Hello Pei,

On Thu, Apr 28, 2011 at 6:58 AM, Pei HE <[email protected]> wrote:
> The key, which TextInputFormat generates, is the bytes offset in the
> file. So, how can I find the global line offset in the mapper?

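This is not achievable in general, since each mapper only sees its own
split and cannot know how many lines precede it. The exception is
fixed-length records, in which case you can divide the byte offset by
the record length to get the line number. Otherwise you could try
pre-building and maintaining an index of line offsets, but looking up
such a structure for every record may get slow.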
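
As a rough sketch of both options (plain Python for illustration only;
the function names are made up and none of this is Hadoop API), the
fixed-length case is a single division, and the index case is a binary
search over a pre-built list of line start offsets. Line numbers here
are zero-based, and the record length is assumed to include the
trailing newline:

```python
from bisect import bisect_left


def line_number_fixed(byte_offset, record_length):
    """Line number for fixed-length records (record_length includes the newline)."""
    if byte_offset % record_length != 0:
        raise ValueError("offset is not on a record boundary")
    return byte_offset // record_length


def build_line_index(data):
    """Return the byte offset at which each line of `data` (bytes) starts."""
    starts = [0]
    for i, b in enumerate(data):
        # A newline starts a new line unless it is the last byte of the file.
        if b == 0x0A and i + 1 < len(data):
            starts.append(i + 1)
    return starts


def line_number_from_index(starts, byte_offset):
    """Binary-search the sorted index for the line that starts at byte_offset."""
    i = bisect_left(starts, byte_offset)
    if i == len(starts) or starts[i] != byte_offset:
        raise ValueError("offset is not the start of a line")
    return i


index = build_line_index(b"a\nbcd\n\nef\n")
print(line_number_fixed(30, 10), line_number_from_index(index, 6))
```

The index has one entry per line, so for large inputs it would have to
be stored somewhere every mapper can read it, which is where the
per-record lookup cost comes from.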

Sometimes it's also alright to process each complete document in a
single mapper instead of letting it be split across several (your
task's input record counter could then be used as the line number).
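
To illustrate the idea (again plain Python with a hypothetical function
name, not Hadoop code): if one task reads a whole file from the start,
a running record count is the line number. In Hadoop this would
presumably require making the input non-splittable, for example by
having `isSplitable` return false in a `FileInputFormat` subclass; that
part is assumed and not shown here.

```python
import io


def map_whole_file(stream):
    """Yield (line_number, line) pairs for one unsplit file, counting from 0."""
    for line_number, line in enumerate(stream):
        # The running record count doubles as the global line number
        # because this task sees the file from its first byte.
        yield line_number, line.rstrip("\n")


for number, text in map_whole_file(io.StringIO("first\nsecond\nthird\n")):
    print(number, text)
```

The trade-off is parallelism: one large file then goes to exactly one
mapper.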

-- 
Harsh J
