I'm sure you understand all the implications here, so I'll just answer your questions inline.
On Thu, Nov 17, 2011 at 9:53 AM, Something Something <mailinglist...@gmail.com> wrote:

> Is the idea of writing business logic in the cleanup method of a Mapper good or
> bad? We think we can make our Mapper run faster if we keep accumulating
> data in a HashMap in a Mapper, and later in the cleanup() method write it.

You can certainly write it during the cleanup() call. Output streams are only closed after cleanup() completes, so there are no issues framework-wise.

> 1) Does the Map/Reduce paradigm guarantee that cleanup will always be called
> before the reducer starts?

Reducers start reducing only after all Map Tasks have completed (at the task level, not per-record). So yes, this is guaranteed.

> 2) Is cleanup strictly for cleaning up unneeded resources?

Yes, that is the purpose it was provided for.

> 3) We understand that the HashMap can grow & that could cause memory
> issues, but hypothetically let's say the memory requirements
> were manageable.

Note that you are also pushing the whole write load to after all the reads; otherwise reads and writes are interleaved almost 1:1.

P.S. Perhaps try overriding Mapper#run if you'd like complete control over how a Mapper executes its stages.

-- 
Harsh J
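For reference, here is a minimal sketch of the pattern being discussed (sometimes called "in-mapper combining": accumulate counts in a HashMap during map() and emit everything once in cleanup()). To keep it self-contained it uses plain-Java stand-ins; `Context`, `WordCountMapper`, and the `run` signature below are hypothetical simplifications that only mirror the shape of Hadoop's Mapper API, not the real org.apache.hadoop classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMapperCombine {
    // Hypothetical stand-in for Hadoop's Mapper.Context: just records writes.
    static class Context {
        final List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        void write(String key, Integer value) {
            emitted.add(Map.entry(key, value));
        }
    }

    // Sketch of a word-count mapper that combines in memory.
    static class WordCountMapper {
        private final Map<String, Integer> counts = new HashMap<>();

        // Called once per input record, as map() would be.
        void map(String line, Context context) {
            for (String word : line.split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }

        // Called once after all records. Per the answer above, output
        // streams are closed only after cleanup() returns, so emitting
        // the accumulated map here is safe.
        void cleanup(Context context) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                context.write(e.getKey(), e.getValue());
            }
        }

        // Mirrors the shape of Hadoop's default Mapper#run: the record
        // loop followed by cleanup(). Overriding run() in a real Mapper
        // is what gives complete control over these stages.
        void run(Iterable<String> records, Context context) {
            for (String line : records) {
                map(line, context);
            }
            cleanup(context);
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        new WordCountMapper().run(List.of("a b a", "b c"), ctx);
        System.out.println(ctx.emitted.size()); // 3 distinct words
    }
}
```

Note the trade-off mentioned above: nothing is written until after every record has been read, so the entire write load lands at the end instead of being interleaved with the reads.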