I think this must be the issue. But my guess is that it happens regardless of
cluster size, because I tried changing the maximum map/reduce task
capacity, and it looks like Hadoop does not create more tasks for this
job even when more free slots are available.
On 05/05/2010 19:11, Sean Owen wrote:
I think it's UserVectorToCooccurrenceMapper, which keeps a local count
of how many times each item has been seen. On a small cluster with a
few mappers, each of which sees all items, you'd have a count for every item.
That's still not terrible, but it could take up a fair bit of memory.
One easy solution is to cap its size and throw out low-count entries sometimes.
Just to confirm this is the issue, you could hack in this line:
private void countSeen(Vector userVector) {
  if (indexCounts.size() > 1000000) {
    return;
  }
  ...
That's not a real solution, but an easy way for everyone to test whether
that's the problem. If it is, I can solve it in a more robust way.
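For what it's worth, the "cap its size and throw out low-count entries" idea could be sketched roughly like this. This is not Mahout code; the class and names (CappedCounts, maxEntries, minCount) are illustrative, and a real fix inside UserVectorToCooccurrenceMapper would look different:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch: a counter map that, once it grows past maxEntries, prunes
// entries whose count is below minCount. Rare items contribute little
// to cooccurrence, so dropping them is an acceptable approximation
// that bounds the mapper's memory use.
public class CappedCounts {

  private final int maxEntries;
  private final int minCount;
  private final Map<Integer, Integer> counts = new HashMap<Integer, Integer>();

  public CappedCounts(int maxEntries, int minCount) {
    this.maxEntries = maxEntries;
    this.minCount = minCount;
  }

  public void increment(int itemIndex) {
    Integer current = counts.get(itemIndex);
    counts.put(itemIndex, current == null ? 1 : current + 1);
    if (counts.size() > maxEntries) {
      prune();
    }
  }

  // Drop entries whose count is still below the threshold.
  private void prune() {
    Iterator<Map.Entry<Integer, Integer>> it = counts.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() < minCount) {
        it.remove();
      }
    }
  }

  public int count(int itemIndex) {
    Integer c = counts.get(itemIndex);
    return c == null ? 0 : c;
  }
}
```

The trade-off is that items pruned early lose their partial counts for good, so this is lossy; the hack above (just refusing to count past a cap) is lossy in a different way, which is why Sean calls for a more robust fix.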