Hi Vinayak,

Thanks. If I use a single reducer, I might run out of memory in that reducer's JVM.
BTW, the URL is not accessible.

Thanks,
Praveen

On Fri, Mar 30, 2012 at 1:52 AM, Vinayak Borkar <[email protected]> wrote:
> Hi Praveen,
>
> As your problem is stated, it requires, in the worst case, that all
> cities appear at every reducer. The simplest way to do that is to have one
> reducer, but this is a sequential solution and probably not what you are
> looking for.
>
> If you have more visibility into your similarity function, you can do
> better. Look at
> http://asterix.ics.uci.edu/pub/sigmod10-vernica-long.pdf for an
> approach to a similar problem, set similarity joins.
>
> One other approach you could use (if the number of unique cities is fairly
> small) is to first run a MapReduce job to compute the distinct cities
> (duplicates eliminated). Then run a map-only job where each mapper uses the
> distinct list of cities to perform the "similarity join" with the data in
> its HDFS block.
>
> Hope this helps.
>
> Vinayak
>
> On 3/29/12 1:05 PM, Praveen Kumar K J V S wrote:
>
>> Hi All,
>>
>> I have already posted my question to the MapReduce users mailing list, but
>> alas, I did not get any response. Probably I did not convey my question
>> correctly, so I thought I would rephrase it and post it to the dev list.
>>
>> Kindly give your suggestions.
>>
>> I have many files in HDFS, each containing a list of cities. For each city
>> in any document, I want to find a similar city that appears in any of the
>> documents. I have a utility method that gives the level of similarity
>> between 2 cities, returning a value between 0 and 1.
>>
>> Is there a way of doing this in Hadoop? My specific doubt is that a city
>> might be similar to another city present in some other input split that is
>> processed by another mapper. Let's say the odd cities (C1, C3, C5) are
>> similar.
>>
>> Input Split 1 has the cities: C1, C2, C3, C4
>> Input Split 2 has the cities: C1, C2, C5, C6
>>
>> Say my mapper 1 output is (since the odd cities C1, C3, C5 are similar):
>>
>> C1, C3
>> C2, C4
>> C3, C1
>> C4, C2
>>
>> Similarly for mapper 2:
>> C1, C5
>> C2, C5
>> C5, C1
>> C6, C2
>>
>> Since C1 appears in both splits, my reducer finally gets C3, C5 for key
>> C1. But this does not happen for C3, since it appears in only one split.
>>
>> Thanks,
>> Praveen
