Yes. This is what I suspected. I don't know exactly why your map is empty, but it probably has to do with the fact that your mapper is instantiated more times than you think: each map task runs in its own process, so a static field is not shared across them.
Regardless of that, this is a very poor design for this problem. It would be much better to simply use a map-reduce pass to eliminate duplicate elements. The basic idea would be to use the following functions:

    map: <key, value> -> <key, <key, value>>
    collect and reduce: <key, values> -> <null, first(values)>

This will give you an echo of your original file with all duplicates removed. You can then do the processing that you originally planned to do.

I should also point out that if your duplicate records are grouped together in your input data, then this operation will be very efficient, because the collect function will do most of the duplicate elimination even before your data is written to disk.

On 12/31/07 1:17 AM, "helena21" <[EMAIL PROTECTED]> wrote:

>
> Thanks for your response. Just to make my question clear, I want to have a
> HashMap and declare it as follows:
>
> public static class MapClass extends MapReduceBase implements Mapper {
>     private final static LongWritable ONE = new LongWritable(1);
>     private static Map usersMap = null;
>
>     public static Map getUsersMap() {
>         if (usersMap == null) {
>             usersMap = new HashMap();
>         }
>         return usersMap;
>     }
>
>     ........
>
>     public void map(WritableComparable key, Writable value,
>             OutputCollector output, Reporter reporter) throws IOException {
>
>         .......
>         // nKey is an Object
>         // name is a Text
>
>         if (getUsersMap().get(nKey) == null) {
>             output.collect(name, ONE);
>             getUsersMap().put(nKey, data[12]);
>         }
>
>         ......
>     }
>
> The problem is my HashMap (usersMap) is always empty. Now I hope
> my problem is clear.
>
> Thanks,
>
> Helen
>
>
> Ted Dunning-3 wrote:
>>
>> This sounds like there is a little bit of confusion going on here.
>>
>> It is common for people who are starting with Hadoop to be surprised
>> when static fields of the mapper do not get shared across all parallel
>> instances of the map function. This is, of course, because you are
>> running many mappers.
>>
>> Usually when people say what you are saying, the reason is that they are
>> trying to do something like removing duplicate elements. The best way to
>> do that is NOT to try to put state into the map function, but rather to
>> use the reduce and sorting functions to do the work. A good example is
>> trying to find all of the unique words in a set of documents. If you just
>> use a word-counting function, you get what you want (a list of unique
>> words). If you want a list of unique words per day, then you simply have
>> to change the program so that the mapper outputs a key that contains the
>> word and the day, and do the count as before.
>>
>> Remember also that your program may contain several map/reduce steps.
>>
>> Perhaps if you say more about what you are trying to do, it would be
>> easier to help you.
>>
>> On 12/28/07 6:35 AM, "helena21" <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Hi Everybody,
>>>
>>> I want to create an ArrayList that collects some objects from the input
>>> in the mapper class, so that I can use this collection to filter my
>>> input. The problem is that my ArrayList can't hold even one object; its
>>> size is always zero. Please point me to how I can create an ArrayList or
>>> other collection objects. I made it a static object, but the ArrayList
>>> still can't collect any object.
>>>
>>> Thanks
>>> Helen
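
For anyone following along: the duplicate-elimination pass described at the top of this thread (map each record to a pair keyed by the record itself, then have the reducer emit only the first value for each key) can be sketched in plain Java without a Hadoop cluster. The class and method names below are illustrative only, not part of any Hadoop API; the `group` step simulates in memory the sort/shuffle that Hadoop performs between map and reduce.

```java
import java.util.*;

public class DedupSketch {
    // map: <key, value> -> <value, value>
    // The record itself becomes the grouping key, so duplicates collide.
    static List<Map.Entry<String, String>> map(List<String> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String rec : records) {
            out.add(new AbstractMap.SimpleEntry<>(rec, rec));
        }
        return out;
    }

    // Simulated shuffle/sort: group all values under their key,
    // as Hadoop does between the map and reduce phases.
    static Map<String, List<String>> group(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // reduce: <key, values> -> first(values), discarding the duplicates.
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            out.add(values.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        // Prints the input with duplicates removed (sorted, because the
        // shuffle sorts by key): [a, b, c]
        System.out.println(reduce(group(map(input))));
    }
}
```

Note that no state is kept inside the map step at all; the dedup work is done entirely by the grouping and the reducer, which is exactly why this pattern survives being split across many parallel mappers.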