> Date: Mon, 5 Apr 2010 14:57:09 +0100
> From: [email protected]
> To: [email protected]
> Subject: Get Line Number from InputFormat
> 
> Dear all,
>    TextInputFormat sends <Offset, Line> to the Mapper; however, the 
> offset is sometimes meaningless and confusing. Is it possible to have an 
> InputFormat which outputs <Line NO., Line> to the mapper?
> 
> Thanks a lot.
> 
> Song

Song,

I'm not sure what you want is realistic or even worthwhile.

You have a file, and it's split into chunks of 64MB (the default) or something 
larger, based on your cloud settings.
You have a map job that starts from a specific point in the file, but that 
does not mean that it's starting at a specific line, or that Hadoop will know 
which line in the file that is. (Your records are not always going to be based 
on the end of a line, or on one line per record.)

Does that make sense?
Offset has more meaning than an arbitrary Line NO.
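
To make that concrete, here is a rough sketch in plain Java (no Hadoop 
dependency) of how a reader for one split can find its records. It is a 
simplified illustration, not Hadoop's actual record reader: the names 
SplitOffsets, Rec, readSplit and skipLine are made up for this example, and 
the 8-byte split size is only there to force lines to straddle split 
boundaries. The convention assumed is that a split which does not start at 
byte 0 discards the partial line it lands in, and every split keeps reading 
a line that starts at or before its end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitOffsets {
    /** One (byte offset, line) pair, like the key/value handed to a mapper. */
    static final class Rec {
        final long offset;
        final String line;
        Rec(long offset, String line) { this.offset = offset; this.line = line; }
    }

    /** Returns the index just past the next '\n', or data.length if there is none. */
    static int skipLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') pos++;
        return Math.min(pos + 1, data.length);
    }

    /** Reads the records owned by the split [start, start + length). */
    static List<Rec> readSplit(byte[] data, int start, int length) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        // Not the first split: the line we landed in belongs to the previous split.
        if (start != 0) pos = skipLine(data, pos);
        List<Rec> out = new ArrayList<>();
        // A line that starts at or before 'end' is ours, even if it runs past 'end'.
        while (pos <= end && pos < data.length) {
            int next = skipLine(data, pos);
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            out.add(new Rec(pos, new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8)));
            pos = next;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // tiny on purpose; real splits are tens of MB
        for (int start = 0; start < data.length; start += splitSize) {
            for (Rec r : readSplit(data, start, splitSize)) {
                System.out.println("split@" + start + " -> <" + r.offset + ", " + r.line + ">");
            }
        }
    }
}
```

Each split can work out the byte offset of its records on its own, because 
it knows where it started. To know that "gamma" is line 3, the reader of the 
second split would have to know how many newlines came before byte 8, and 
that information lives in a split it never reads.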

-Mike
                                          