You should implement your own InputSplit to represent the split information (for example, which portion of the input a map task should process). Then implement getSplits() in your InputFormat, which divides the whole input into chunks; each split is handed to one map task. You should also define a RecordReader that reads records from a split. The map task then processes one record at a time.
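For illustration, here is a minimal sketch using the old (0.20) mapred API, consistent with the tutorial link below and the "next" method mentioned in the question. The class names (ImageQuadrantSplit, ImageQuadrantInputFormat) are made up for this example, and the actual TIFF reading/cropping is assumed to happen in the map task, since a compressed .tif cannot be meaningfully split byte-wise; each split here only records which quadrant a map task should produce.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/** One split = one quadrant (0..3) of one input image. */
class ImageQuadrantSplit implements InputSplit {
  private Path file;
  private long length;
  private int quadrant;

  public ImageQuadrantSplit() {}                       // required for Writable
  public ImageQuadrantSplit(Path file, long length, int quadrant) {
    this.file = file; this.length = length; this.quadrant = quadrant;
  }
  public Path getFile()     { return file; }
  public int  getQuadrant() { return quadrant; }

  public long getLength()        { return length; }
  public String[] getLocations() { return new String[0]; }

  public void write(DataOutput out) throws IOException {
    Text.writeString(out, file.toString());
    out.writeLong(length);
    out.writeInt(quadrant);
  }
  public void readFields(DataInput in) throws IOException {
    file = new Path(Text.readString(in));
    length = in.readLong();
    quadrant = in.readInt();
  }
}

/** Produces four splits per image; each map task handles one quadrant. */
public class ImageQuadrantInputFormat extends FileInputFormat<Text, IntWritable> {

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    FileStatus[] files = listStatus(job);
    InputSplit[] splits = new InputSplit[files.length * 4];
    int i = 0;
    for (FileStatus f : files) {
      for (int q = 0; q < 4; q++) {                    // 4 quadrants per image
        splits[i++] = new ImageQuadrantSplit(f.getPath(), f.getLen(), q);
      }
    }
    return splits;
  }

  @Override
  public RecordReader<Text, IntWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final ImageQuadrantSplit s = (ImageQuadrantSplit) split;
    // One record per split: key = image path, value = quadrant index.
    // The map task is assumed to open the image itself (e.g. with a TIFF
    // library) and crop the indicated quadrant.
    return new RecordReader<Text, IntWritable>() {
      private boolean done = false;
      public Text createKey()          { return new Text(); }
      public IntWritable createValue() { return new IntWritable(); }
      public boolean next(Text key, IntWritable value) {
        if (done) return false;
        key.set(s.getFile().toString());
        value.set(s.getQuadrant());
        done = true;
        return true;
      }
      public long getPos()       { return done ? 1 : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close()        {}
    };
  }
}

With something along these lines, each of the four map tasks receives a single record telling it which image and which quadrant to write out; the heavy lifting (decoding and cropping the .tif) stays in the map function rather than in getSplits or next.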
See http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Input

Thanks
Amareshwari

On 12/21/09 2:22 AM, "Cao Kang" <[email protected]> wrote:

Hi, I have spent several days on the customized file input format in Hadoop. Basically, we need to split one giant square-shaped image (.tif) into four square-shaped smaller images. Where does the actual split happen? Should I implement it in the "getSplits" function or in the "next" function? It's quite confusing. Does anyone know, or can anyone provide an example of this? Thanks.
