Hi everyone,
I'm using Hadoop 0.20.2 and trying to use MultiFileInputFormat, which I understood
to put each file from the input directory in a SEPARATE split, so that the number
of maps equals the number of input files. Instead, each split I get contains the
paths of multiple input files, so the number of maps is less than the number of
input files. Is this because MultiFileInputFormat is deprecated?
My implementation, myMultiFileInputFormat, contains only the following:
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
            JobConf job, Reporter reporter) {
        return new myRecordReader((MultiFileSplit) split);
    }
Yet in myRecordReader, one split contains, for example, the following:
" /tmp/input/file1:0+300
/tmp/input/file2:0+199 "
instead of each of these files being in its own split.
Why? Any clues?
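
One possible cause, stated as an assumption rather than a confirmed answer: the input format may group files by size into roughly as many splits as the number of map tasks requested for the job, instead of creating one split per file. The toy model below uses plain Java with no Hadoop classes. The class name, the `groupFiles` helper, and the grouping rule are all hypothetical and are not Hadoop's actual code. It only shows how the two files from the split above could end up together when a single split is requested.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: groups file lengths into roughly numSplits chunks by size.
// This is NOT Hadoop's implementation; groupFiles is a hypothetical helper.
class SplitGroupingSketch {

    // Returns, for each split, the indices of the files assigned to it.
    // Assumes numSplits >= 1.
    static List<List<Integer>> groupFiles(long[] lengths, int numSplits) {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        // Target size per split if the bytes were divided evenly.
        double target = (double) total / numSplits;

        List<List<Integer>> splits = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long cumulative = 0;
        for (int i = 0; i < lengths.length; i++) {
            current.add(i);
            cumulative += lengths[i];
            // Close the current split once the running total reaches its share.
            if (cumulative >= target * (splits.size() + 1)) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        // Any files left over form a final split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] lengths = {300, 199}; // file1 and file2 from the split shown above
        System.out.println(groupFiles(lengths, 1)); // prints [[0, 1]]
        System.out.println(groupFiles(lengths, 2)); // prints [[0], [1]]
    }
}
```

If the real behaviour resembles this model, requesting at least as many map tasks as there are input files (for example with `JobConf.setNumMapTasks`) might give one file per split, but I have not verified that.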
Thank you,
Maha