I'm sure you understand all implications here so I'll just answer your
questions, inline.

On Thu, Nov 17, 2011 at 9:53 AM, Something Something
<mailinglist...@gmail.com> wrote:
> Is the idea of writing business logic in cleanup method of a Mapper good or
> bad?  We think we can make our Mapper run faster if we keep accumulating
> data in a HashMap in a Mapper, and later in the cleanup() method write it.

You can certainly write it during the cleanup() call. The output
streams are only closed after cleanup() returns, so there's no issue
framework-wise.
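For illustration, the accumulate-then-emit pattern looks roughly like the
sketch below. The Hadoop types are replaced by plain-Java stand-ins so it is
self-contained; real code would extend org.apache.hadoop.mapreduce.Mapper
and call context.write() instead of filling the `output` map, and
WordCountMapper is just a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "accumulate in a HashMap during map(), emit in cleanup()".
// Stand-in for a real Hadoop Mapper; `output` plays the role of Context.write().
class WordCountMapper {
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<String, Long> output = new HashMap<>();

    // Called once per input record: only update the in-memory map, write nothing.
    void map(String line) {
        for (String word : line.split("\\s+")) {
            counts.merge(word, 1L, Long::sum);
        }
    }

    // Called once after the last map() call. Streams are still open at this
    // point, so it is safe to emit every accumulated pair here.
    void cleanup() {
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            output.put(e.getKey(), e.getValue()); // real code: context.write(...)
        }
    }

    Map<String, Long> getOutput() { return output; }
}
```

The trade-off, as noted below, is memory growth and deferring all writes to
the very end of the task.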

> 1)  Does Map/Reduce paradigm guarantee that cleanup will always be called
> before the reducer starts?

Reducers start reducing only after every map task in the job has
completed (they may begin shuffling map output earlier, but reduce()
is not invoked before then). So yes, this is guaranteed: each mapper's
cleanup() runs before any reduce() call.

> 2)  Is cleanup strictly for cleaning up unneeded resources?

Yes, it was provided for that purpose.

> 3)  We understand that the HashMap can grow & that could cause memory
> issues, but hypothetically let's say the memory requirements
> were manageable.

Note that you are also deferring the entire write load until after all
the reads are done. Otherwise, reads and writes interleave roughly 1:1
over the task's lifetime.

P.s. Perhaps try overriding Mapper#run if you'd like complete control
over how a Mapper executes its stages.
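Hadoop's default Mapper#run is essentially a setup / record-loop / cleanup
driver, and overriding it lets you reorder or interleave those stages. The
sketch below mirrors that structure with plain-Java stand-ins (an Iterator
in place of Context.nextKeyValue()); StagedMapper and its method signatures
are illustrative, not the actual Hadoop API, whose run() takes a Context:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java mirror of Mapper#run's shape: setup, loop over records, cleanup.
class StagedMapper {
    final List<String> trace = new ArrayList<>();

    void setup()            { trace.add("setup"); }
    void map(String record) { trace.add("map:" + record); }
    void cleanup()          { trace.add("cleanup"); }

    // Equivalent of overriding Mapper#run: you control when each stage
    // happens, e.g. you could flush an accumulator every N records here.
    void run(Iterator<String> records) {
        setup();
        try {
            while (records.hasNext()) {
                map(records.next());
            }
        } finally {
            cleanup(); // always runs after the last map() call
        }
    }
}
```

The try/finally matches what the framework itself does, which is why
cleanup() is a safe place for a final flush.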

-- 
Harsh J
