Thanks, very kind of you.
On Fri, Mar 30, 2012 at 2:00 AM, Vinayak Borkar <[email protected]> wrote:

> Sorry about the broken link.
>
> Here is one that works.
>
> http://flamingo.ics.uci.edu/pub/sigmod10-vernica.pdf
>
> Vinayak
>
>
> On 3/29/12 1:27 PM, Praveen Kumar K J V S wrote:
>
>> Hi Vinayak,
>>
>> Thanks. If I use a single reducer I might run out of memory in that
>> reducer's JVM.
>>
>> BTW, the URL is not accessible.
>>
>> Thanks,
>> Praveen
>>
>> On Fri, Mar 30, 2012 at 1:52 AM, Vinayak Borkar <[email protected]> wrote:
>>
>>> Hi Praveen,
>>>
>>> The way your problem is stated requires, in the worst case, that all
>>> cities appear at every reducer. The simplest way to do that is to have
>>> one reducer -- but this is a sequential solution and probably not what
>>> you are looking for.
>>>
>>> If you have more visibility into your similarity function you can do
>>> better. Look at http://asterix.ics.uci.edu/pub/sigmod10-vernica-long.pdf
>>> for an approach to a similar problem, set-similarity joins.
>>>
>>> One other approach you could use (if the number of unique cities is
>>> fairly small) is to first run a MapReduce job to compute the distinct
>>> cities (duplicates eliminated). Then do a map-only job where each mapper
>>> uses the distinct list of cities to perform the "similarity join" with
>>> the data in its HDFS block.
>>>
>>> Hope this helps.
>>>
>>> Vinayak
>>>
>>>
>>> On 3/29/12 1:05 PM, Praveen Kumar K J V S wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have already posted my question to the MapReduce users mailing list,
>>>> but alas I did not get any response. Probably I did not convey my
>>>> question correctly, so I thought I would rephrase it and post it on the
>>>> dev list.
>>>>
>>>> Kindly give your suggestions.
>>>>
>>>> I have many files in HDFS, each containing a list of cities. For each
>>>> city in any document I want to find a similar city that appears in any
>>>> of the documents. I have a utility method that gives the level of
>>>> similarity b/w 2 cities, returning a value b/w 0 and 1.
>>>>
>>>> Is there a way of doing this in Hadoop? I have a specific doubt because
>>>> a city might be similar to another city present in some other input
>>>> split that is processed by another mapper. Let's say the odd cities
>>>> (C1, C3, C5) are similar.
>>>>
>>>> Input split 1 has the cities: C1, C2, C3, C4
>>>> Input split 2 has the cities: C1, C2, C5, C6
>>>>
>>>> Since the odd cities (C1, C3, C5) are similar, my mapper 1 o/p is:
>>>>
>>>> C1, C3
>>>> C2, C4
>>>> C3, C1
>>>> C4, C2
>>>>
>>>> Similarly for mapper 2:
>>>>
>>>> C1, C5
>>>> C2, C6
>>>> C5, C1
>>>> C6, C2
>>>>
>>>> Since C1 appears in both splits, at my reducer I finally get C3, C5 for
>>>> key C1. But this does not happen for C3, since it appears in only one
>>>> split.
>>>>
>>>> Thanks,
>>>> Praveen
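
For the archives, here is a rough sketch of the map-only step from the second
suggestion above (first job computes the distinct cities, second job joins each
split against that list). It is only a sketch under a few assumptions: the
distinct-city list fits in memory, the input is one city per line, the path to
the list is passed through a made-up configuration key, and Similarity.score(a,
b) stands in for the existing 0-1 utility method; the 0.8 threshold is
arbitrary.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only "similarity join": each mapper compares the cities in its own
    // split against the distinct-city list produced by the earlier job.
    public class SimilarityJoinMapper
        extends Mapper<LongWritable, Text, Text, Text> {

      private final List<String> distinctCities = new ArrayList<String>();

      @Override
      protected void setup(Context context) throws IOException {
        // Load the (small) distinct-city list from HDFS. The path is passed
        // in through the job configuration under a made-up key.
        Configuration conf = context.getConfiguration();
        Path cityList = new Path(conf.get("similarity.join.city.list"));
        FileSystem fs = cityList.getFileSystem(conf);
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(fs.open(cityList)));
        try {
          String line;
          while ((line = reader.readLine()) != null) {
            distinctCities.add(line.trim());
          }
        } finally {
          reader.close();
        }
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String city = value.toString().trim();
        for (String candidate : distinctCities) {
          // Similarity.score(a, b) is a placeholder for your own utility
          // method returning a value between 0 and 1.
          if (!candidate.equals(city)
              && Similarity.score(city, candidate) >= 0.8) {
            context.write(new Text(city), new Text(candidate));
          }
        }
      }
    }

Since every mapper sees the full distinct-city list, this avoids the problem of
a similar city living in a different input split, at the cost of assuming the
list is small enough to hold in each mapper's memory.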
