You should implement your own InputSplit to represent the split information (for example, which portion of the input a map task should process). Then implement getSplits() in your InputFormat, which divides the whole input into chunks; each split is handed to one map task. You should also define a RecordReader that reads records from a split. The map task then processes one record at a time.
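For illustration, here is a minimal sketch using the old (0.20) mapred API, consistent with the tutorial link below and the "next" method mentioned in the question. The class names (ImageQuadrantSplit, ImageQuadrantInputFormat) are made up for this example, and the actual TIFF reading/cropping is assumed to happen in the map task, since a compressed .tif cannot be meaningfully split byte-wise; each split here only records which quadrant a map task should produce.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/** One split = one quadrant (0..3) of one input image. */
class ImageQuadrantSplit implements InputSplit {
  private Path file;
  private long length;
  private int quadrant;

  public ImageQuadrantSplit() {}                       // required for Writable
  public ImageQuadrantSplit(Path file, long length, int quadrant) {
    this.file = file; this.length = length; this.quadrant = quadrant;
  }
  public Path getFile()     { return file; }
  public int  getQuadrant() { return quadrant; }

  public long getLength()        { return length; }
  public String[] getLocations() { return new String[0]; }

  public void write(DataOutput out) throws IOException {
    Text.writeString(out, file.toString());
    out.writeLong(length);
    out.writeInt(quadrant);
  }
  public void readFields(DataInput in) throws IOException {
    file = new Path(Text.readString(in));
    length = in.readLong();
    quadrant = in.readInt();
  }
}

/** Produces four splits per image; each map task handles one quadrant. */
public class ImageQuadrantInputFormat extends FileInputFormat<Text, IntWritable> {

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    FileStatus[] files = listStatus(job);
    InputSplit[] splits = new InputSplit[files.length * 4];
    int i = 0;
    for (FileStatus f : files) {
      for (int q = 0; q < 4; q++) {                    // 4 quadrants per image
        splits[i++] = new ImageQuadrantSplit(f.getPath(), f.getLen(), q);
      }
    }
    return splits;
  }

  @Override
  public RecordReader<Text, IntWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final ImageQuadrantSplit s = (ImageQuadrantSplit) split;
    // One record per split: key = image path, value = quadrant index.
    // The map task is assumed to open the image itself (e.g. with a TIFF
    // library) and crop the indicated quadrant.
    return new RecordReader<Text, IntWritable>() {
      private boolean done = false;
      public Text createKey()          { return new Text(); }
      public IntWritable createValue() { return new IntWritable(); }
      public boolean next(Text key, IntWritable value) {
        if (done) return false;
        key.set(s.getFile().toString());
        value.set(s.getQuadrant());
        done = true;
        return true;
      }
      public long getPos()       { return done ? 1 : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close()        {}
    };
  }
}

With something along these lines, each of the four map tasks receives a single record telling it which image and which quadrant to write out; the heavy lifting (decoding and cropping the .tif) stays in the map function rather than in getSplits or next.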
See http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Input

Thanks
Amareshwari

On 12/21/09 2:22 AM, "Cao Kang" <[email protected]> wrote:

Hi, I have spent several days on the customized file input format in Hadoop. Basically, we need to split one giant square-shaped image (.tif) into four square-shaped smaller images. Where does the actual split happen? Should I implement it in the "getSplits" function or in the "next" function? It's quite confusing. Does anyone know, or can anyone provide an example of this? Thanks.
