When I run a MapReduce job with a HAR file as input, I see that the input splits refer to 64 MB boundaries inside the part file.
My mappers only know how to process the contents of each logical file inside the HAR file. Is there a way to take the offset range specified by an input split and determine which logical files lie in that range? (How else would one run MapReduce over a HAR file?)

Roshan
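
To make the question concrete, here is a rough sketch of the lookup being asked about. It assumes a list of (name, start offset, length) entries for the logical files in one part file is somehow available. `Entry`, `files_in_split`, and the sample file names and sizes are all hypothetical and are not part of any Hadoop API.

```python
from typing import List, NamedTuple

MB = 1024 * 1024


class Entry(NamedTuple):
    """Hypothetical record for one logical file stored inside a part file."""
    name: str
    start: int   # byte offset of the file within the part file
    length: int  # size of the file in bytes


def files_in_split(entries: List[Entry], split_start: int, split_length: int) -> List[str]:
    """Return names of logical files whose byte range overlaps the split."""
    split_end = split_start + split_length  # exclusive end of the split
    return [
        e.name
        for e in entries
        # Two half-open ranges overlap when each starts before the other ends.
        if e.length > 0 and e.start < split_end and e.start + e.length > split_start
    ]


# Made-up layout: three logical files packed back to back in one part file.
entries = [
    Entry("a.log", 0, 40 * MB),
    Entry("b.log", 40 * MB, 50 * MB),
    Entry("c.log", 90 * MB, 10 * MB),
]

first_split = files_in_split(entries, 0, 64 * MB)
second_split = files_in_split(entries, 64 * MB, 64 * MB)
print(first_split)
print(second_split)
```

In this made-up layout `b.log` straddles the 64 MB boundary, so it shows up in both splits. That is exactly the case the mappers cannot handle, since each one expects a whole logical file.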