Curious: it seems like you could aggregate the results in the mapper, in a local variable or a list of strings. Is there a way to know that your mapper has just read the LAST line of an input split?
If so, you could implement your entire solution in the mapper without needing a new input format. Is there a "cleanup" or "finalize" method in mappers that runs at the end of a whole stream read, to support this sort of chunked, in-memory map/reduce operation?

Jay Vyas
MMSB
UCHC

On Apr 23, 2012, at 6:40 AM, Dan Drew <wirefr...@googlemail.com> wrote:

> I require each input file to be processed by each mapper as a whole.
>
> I subclass o.a.h.mapreduce.lib.input.TextInputFormat and override
> isSplitable() to invariably return false.
>
> The job is configured to use this subclass as the input format class via
> setInputFormatClass(). The job runs without error, yet the logs reveal
> files are still processed line by line by the mappers.
>
> Any help would be greatly appreciated,
> Thanks
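
On the end-of-input hook asked about in the reply above: the new-API org.apache.hadoop.mapreduce.Mapper has setup(Context) and cleanup(Context) methods, and cleanup is called once after the last map() call of a task (the old org.apache.hadoop.mapred API has close() for the same purpose). Note that isSplitable() returning false only keeps a file in a single split; the record reader behind TextInputFormat still hands map() one line at a time, which matches the logs described in the quoted message. The sketch below is not Hadoop code. It is a dependency-free model of that lifecycle, meant only to show the "buffer in map(), emit in cleanup()" idea. LifecycleMapper, WholeFileMapper and process are hypothetical names made up for this illustration, and a plain List<String> stands in for the Hadoop Context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WholeSplitSketch {

    /** Hypothetical stand-in for the Hadoop mapper lifecycle: setup, map per record, cleanup. */
    abstract static class LifecycleMapper {
        protected void setup(List<String> out) {}

        protected abstract void map(long offset, String line, List<String> out);

        protected void cleanup(List<String> out) {}

        // Drives the lifecycle: cleanup fires exactly once, after the last record of the split.
        final void run(List<String> split, List<String> out) {
            setup(out);
            try {
                long offset = 0;
                for (String line : split) {
                    map(offset, line, out);
                    offset += line.length() + 1; // rough character offset, for illustration only
                }
            } finally {
                cleanup(out);
            }
        }
    }

    /** Buffers every line in memory and emits a single record once the split is exhausted. */
    static class WholeFileMapper extends LifecycleMapper {
        private final List<String> buffer = new ArrayList<>();

        @Override
        protected void map(long offset, String line, List<String> out) {
            buffer.add(line); // hold the line, emit nothing yet
        }

        @Override
        protected void cleanup(List<String> out) {
            out.add(String.join("\n", buffer)); // one output record for the whole split
        }
    }

    /** Runs the sketch mapper over one simulated split and returns what it emitted. */
    static List<String> process(List<String> split) {
        List<String> out = new ArrayList<>();
        new WholeFileMapper().run(split, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("line one", "line two", "line three")));
    }
}
```

In a real job the same shape would presumably live in a Mapper subclass, with the buffer as a field and the write moved into cleanup(Context). The obvious cost is that the whole split has to fit in the task's heap, so this only suits files of modest size.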