Hello Pei,

On Thu, Apr 28, 2011 at 6:58 AM, Pei HE <[email protected]> wrote:
> The key, which TextInputFormat generates, is the bytes offset in the
> file. So, how can I find the global line offset in the mapper?

This is not achievable unless you have fixed-size records, in which case you can divide the byte offset by the record length to get the line number. Otherwise, you can try pre-building and maintaining an index, but looking up such a structure for every record may get slow.

Sometimes it's also fine, as a solution, to process each document whole in a single mapper instead of letting it be split across mappers; your task's input record counter could then be used as the line number.

-- Harsh J
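
To make the first two options above concrete, here is a minimal sketch in plain Python. It is not Hadoop code, and every function name in it is hypothetical. It assumes that the byte offset passed in is the start of a line (which is what the TextInputFormat key gives you), that line numbers are 0-based, and that the index fits in memory. How the index would be built ahead of the job and made available to each mapper is left out.

```python
from bisect import bisect_left


def line_number_fixed(byte_offset, record_length):
    """Fixed-size records: the line number is the offset divided by the
    record length (record_length must include the line terminator)."""
    if byte_offset % record_length != 0:
        raise ValueError("offset is not on a record boundary")
    return byte_offset // record_length


def build_line_index(lines):
    """Variable-size records: scan the input once and remember the byte
    offset at which each line starts. `lines` is any iterable of bytes
    objects, for example a file opened in binary mode."""
    starts = []
    offset = 0
    for line in lines:
        starts.append(offset)
        offset += len(line)
    return starts


def line_number_from_index(starts, byte_offset):
    """Binary-search the pre-built index for the given line-start offset."""
    i = bisect_left(starts, byte_offset)
    if i == len(starts) or starts[i] != byte_offset:
        raise ValueError("offset is not the start of a line")
    return i
```

For example, with 10-byte records, `line_number_fixed(30, 10)` gives line 3. For variable-length lines, the index is built once in a separate pass and each lookup is a binary search, so the per-record cost mentioned above is the search plus whatever it takes to load the index into the task.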
