Hi Cao,
My answers are inline.

On 12/21/09 8:42 PM, "Cao Kang" <[email protected]> wrote:

Hi Amareshwari,
Thanks for your reply. But another question is, where and how should I define the split boundaries? Should I define them in the FileSplit constructor?

I don't think you can extend FileSplit directly. I think you should write your own split, say ImageSplit, in which you can represent your image fully. For example, FileSplit represents the split using an offset and a length. You may need all four coordinates of your image.

Furthermore, as far as I have seen, all the examples there use LongWritable to represent the offset of that split in the input file. What if the split is not sequential?

Yes. FileSplit represents one contiguous byte range of a file, which is how text data is split.

For example, in the image split, the byte arrays of the sub-images are not sequential in the input image. The split bytes look like this:

|---------------|---------------|
|               |               |
|       1       |       2       |
|               |               |
|---------------|---------------|
|               |               |
|       3       |       4       |
|               |               |
|---------------|---------------|

Each sub-image split will consist of an array. Where and how should this be defined in the InputFormat? Many thanks.

In your InputFormat, you should define the getSplits() method, which returns your ImageSplits.

Thanks
Amareshwari

On Mon, Dec 21, 2009 at 6:37 AM, Amareshwari Sri Ramadasu <[email protected]> wrote:

> You should implement your split to represent the split information. Then
> you should implement getSplits in InputFormat to get the splits from your
> input, which divides the whole input into chunks. Here, each split will be
> given to a map task.
> You should also define a RecordReader, which reads records from the split.
> A map task processes one record at a time.
>
> See
> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Input
>
> Thanks
> Amareshwari
>
> On 12/21/09 2:22 AM, "Cao Kang" <[email protected]> wrote:
>
> Hi,
> I have spent several days on the customized file input format in Hadoop.
> Basically, we need to split one giant square-shaped image (.tif) into four
> smaller square-shaped images. Where does the split really happen? Should I
> implement it in the "getSplits" function or in the "next" function? It's quite
> confusing.
> Does anyone know, or can anyone provide some examples of it? Thanks.
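
To make the ImageSplit idea above a bit more concrete, here is a rough sketch of what such a split could look like. It is only an illustration, not tested against a cluster. The class name ImageSplit comes from the reply above, but the fields (x, y, width, height), the quadrants() helper and the roundTrip() helper are names I made up for the example. To keep it compilable without the Hadoop jars, the class does not extend Hadoop's InputSplit or implement Writable. In a real job it would have to, and it would also need getLength() and getLocations(). The write()/readFields() pair below only mirrors the shape of the Writable methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical split describing one rectangular tile of an image.
 * Assumption: in a real job this would be a Hadoop InputSplit that is also
 * Writable; those types are omitted so the sketch compiles on its own.
 */
final class ImageSplit {
    String path;        // image file this tile belongs to
    int x, y;           // top-left corner of the tile, in pixels
    int width, height;  // tile size, in pixels

    ImageSplit() { }    // no-arg constructor, used when deserializing

    ImageSplit(String path, int x, int y, int width, int height) {
        this.path = path;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // Same shape as Writable.write(DataOutput): serialize every field.
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeInt(x);
        out.writeInt(y);
        out.writeInt(width);
        out.writeInt(height);
    }

    // Same shape as Writable.readFields(DataInput): read fields in the same order.
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        x = in.readInt();
        y = in.readInt();
        width = in.readInt();
        height = in.readInt();
    }

    /** Roughly what getSplits() would compute: tiles 1, 2, 3, 4 in row order. */
    static List<ImageSplit> quadrants(String path, int imageWidth, int imageHeight) {
        int halfW = imageWidth / 2;
        int halfH = imageHeight / 2;
        List<ImageSplit> splits = new ArrayList<>();
        for (int row = 0; row < 2; row++) {
            for (int col = 0; col < 2; col++) {
                // The right and bottom tiles absorb the extra pixel of an odd dimension.
                int w = (col == 0) ? halfW : imageWidth - halfW;
                int h = (row == 0) ? halfH : imageHeight - halfH;
                splits.add(new ImageSplit(path, col * halfW, row * halfH, w, h));
            }
        }
        return splits;
    }

    /** Serializes a split to bytes and reads it back, as the framework would. */
    static ImageSplit roundTrip(ImageSplit split) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            split.write(new DataOutputStream(buffer));
            ImageSplit copy = new ImageSplit();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, getSplits() in your InputFormat only decides the four rectangles and does not touch the pixel data. Your RecordReader would then open the file named in the split and read only the pixels inside that rectangle, so it does not matter that the bytes of a tile are not contiguous in the file.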
