Re: Need help to map a problem to Mapreduce domain

Praveen Kumar K J V S Thu, 29 Mar 2012 13:28:08 -0700

Hi Vinayak,

Thanks. If I use a single reducer, that reducer's JVM might run out of
memory.


BTW, the URL is not accessible.

Thanks,
Praveen

On Fri, Mar 30, 2012 at 1:52 AM, Vinayak Borkar <[email protected]> wrote:

> Hi Praveen,
>
> As your problem is stated, it requires in the worst case that all cities
> appear at every reducer. The simplest way to do so is to have one reducer,
> but this is a sequential solution and probably not what you are looking
> for.
>
> If you have more visibility into your similarity function, you can do
> better. Look at
> http://asterix.ics.uci.edu/pub/sigmod10-vernica-long.pdf
> which tries to solve a similar problem for set similarity joins.
>
> One other approach you could use (if the number of unique cities is fairly
> small) is to first run a MapReduce job to compute the distinct cities
> (with duplicates eliminated). Then run a map-only job where each mapper uses
> the distinct list of cities to perform the "similarity join" with the data
> in its HDFS block.
>
> Hope this helps.
>
> Vinayak
>
>
>
> On 3/29/12 1:05 PM, Praveen Kumar K J V S wrote:
>
>> Hi All,
>>
>> I have already posted my question to the MapReduce users mailing list, but
>> alas I did not get any response. Probably I did not convey my question
>> correctly, so I thought I would rephrase it and post it to the dev
>> list.
>>
>> Kindly give your suggestions.
>>
>> I have many files in HDFS, each containing a list of cities. For each city
>> in any document, I want to find similar cities that appear in any of the
>> documents. I have a utility method that gives the level of similarity
>> between two cities, returning a value between 0 and 1.
>>
>> Is there a way of doing this in Hadoop? I have a specific doubt because a
>> city might be similar to another city present in some other input split
>> that is processed by another mapper. Let's say the odd cities (C1, C3, C5)
>> are similar.
>>
>> Input Split 1 has the cities: C1, C2, C3, C4
>> Input Split 2 has the cities: C1, C2, C5, C6
>>
>> Say my mapper 1 output is as follows, since the odd cities (C1, C3, C5) are similar:
>>
>> C1, C3
>> C2, C4
>> C3, C1
>> C4, C2
>>
>> Similarly for mapper 2:
>> C1, C5
>> C2, C5
>> C5, C1
>> C6, C2
>>
>> Since C1 appears in both splits, at my reducer I finally get C3, C5
>> for key C1. But this does not happen for C3, since it appears in only one
>> split.
>>
>> Thanks,
>> Praveen
>>
>>
>
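
A note on the two-pass approach suggested in the quoted reply above: the idea
can be sketched in plain Python, outside Hadoop, to show why it recovers pairs
that span input splits (for example C3 and C5). This is only an illustration
under stated assumptions, not the posters' code. The similarity() function is a
hypothetical stand-in for the utility method mentioned in the thread (here,
cities whose numbers share the same parity count as similar), and the 0.5
threshold and all function names are made up for the example. In a real job the
distinct list would have to be shipped to every mapper, for instance through
Hadoop's distributed cache.

```python
from itertools import chain


def similarity(a, b):
    """Hypothetical stand-in for the similarity utility (returns 0..1).

    Assumption for this sketch: cities with the same number parity are similar.
    """
    return 1.0 if int(a[1:]) % 2 == int(b[1:]) % 2 else 0.0


def distinct_cities(splits):
    """Pass 1: stands in for a MapReduce job that removes duplicate cities."""
    return sorted(set(chain.from_iterable(splits)))


def map_only_join(split, all_cities, threshold=0.5):
    """Pass 2: stands in for one mapper in a map-only job.

    Each city in this split is compared against the full distinct list,
    so matches living in other splits are still found.
    """
    for city in split:
        matches = [c for c in all_cities
                   if c != city and similarity(city, c) >= threshold]
        yield city, matches


# The two input splits from the original question.
splits = [["C1", "C2", "C3", "C4"], ["C1", "C2", "C5", "C6"]]
cities = distinct_cities(splits)

result = {}
for split in splits:
    for city, matches in map_only_join(split, cities):
        # A city present in several splits is emitted once per split with
        # the same matches; keeping one copy is enough for this sketch.
        result[city] = matches

print(result["C3"])
```

With these inputs, C3 is matched against C5 even though the two never share a
split. The cost is that every mapper holds the whole distinct list in memory,
which is why the reply limits this approach to a fairly small number of unique
cities.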
