I have chosen to use Jay's suggestion as a quick workaround and am pleased
to report that it seems to work well on small test inputs.
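
For anyone following along, the idea is roughly: buffer lines in the mapper
as they arrive, then emit the aggregate once the split has been fully read.
Below is a minimal sketch using a plain-Java stand-in for the map/cleanup
lifecycle, not the real Hadoop classes. BufferingLineMapper,
BufferingMapperDemo and the "|" separator are made-up names and choices for
illustration only, and the sketch assumes everything in a split fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mapper's lifecycle; this is NOT the Hadoop API.
class BufferingLineMapper {
    private final List<String> buffered = new ArrayList<String>();
    private final List<String> output = new ArrayList<String>();

    // Called once per input line, playing the role of map().
    void map(long offset, String line) {
        buffered.add(line); // hold the line in memory instead of emitting it
    }

    // Called once after the last line, playing the role of cleanup()/close().
    void cleanup() {
        output.add(String.join("|", buffered)); // emit one aggregated record
        buffered.clear();
    }

    List<String> getOutput() {
        return output;
    }
}

class BufferingMapperDemo {
    // Drives the mapper the way a framework would: map() per line, then
    // cleanup() exactly once at the end of the input.
    static List<String> run(List<String> lines) {
        BufferingLineMapper mapper = new BufferingLineMapper();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line);
            offset += line.length() + 1; // assumes a one-byte line terminator
        }
        mapper.cleanup();
        return mapper.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "c")));
    }
}
```

The aggregate is only meaningful if map() sees the lines in file order,
which is what the question below is about.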

My question now is, are the mappers guaranteed to receive the file's lines
in order?

Browsing the source suggests this is so, but I just want to make sure as my
understanding of Hadoop is insubstantial.

Thank you for your patience in answering my questions.

On 23 April 2012 14:28, Harsh J <ha...@cloudera.com> wrote:

> Jay,
>
> On Mon, Apr 23, 2012 at 6:43 PM, JAX <jayunit...@gmail.com> wrote:
> > Curious: Seems like you could aggregate the results in the mapper as a
> > local variable or list of strings. Is there a way to know that your
> > mapper has just read the LAST line of an input split?
>
> True. Can be one way to do it (unless aggregation of 'records' needs
> to happen live, and you don't wish to store it all in memory).
>
> > Is there a "cleanup" or "finalize" method in mappers that is run at the
> > end of a whole stream read to support these sorts of chunked, in-memory
> > map/reduce operations?
>
> Yes there is. See:
>
> Old API:
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Mapper.html
> (See Closeable's close())
>
> New API:
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup(org.apache.hadoop.mapreduce.Mapper.Context)
>
>
> --
> Harsh J
>
