Hi all,

The input of my MapReduce job is two large text files. Each InputSplit
consists of a portion of each of the two files, and the split boundaries are
content dependent, so I have to read the input files to generate the splits.
The problem is that most of the job's time is spent generating these splits;
the Map and Reduce phases actually take less time than that. Is there an
efficient way to generate splits from files? My InputFormat class is based on
FileInputFormat. The getSplits method of FileInputFormat doesn't read the
input files, but that approach is not possible for me because my splits depend
on the content of the files.
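
To make the setup concrete, here is a minimal sketch of the kind of scan that
content-dependent splitting requires. It is an illustration under assumptions,
not the actual job code: it assumes each line starts with a key followed by a
tab, and that a split boundary falls wherever the key changes. The class name
ContentSplitScanner and the method findSplitOffsets are hypothetical, and the
sketch uses plain java.nio rather than the Hadoop FileSystem API.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds byte offsets where a new key group begins.
public class ContentSplitScanner {

    // Assumed rule: a boundary is the start of any line whose key
    // (the text before the first tab) differs from the previous line's key.
    public static List<Long> findSplitOffsets(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        String previousKey = null;
        long lineStart = 0; // byte offset of the line being collected
        long position = 0;  // number of bytes consumed so far

        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                position++;
                if (b == '\n') {
                    previousKey = recordIfNewKey(line, lineStart, previousKey, offsets);
                    line.reset();
                    lineStart = position;
                } else {
                    line.write(b);
                }
            }
            // Handle a final line that has no trailing newline.
            if (line.size() > 0) {
                recordIfNewKey(line, lineStart, previousKey, offsets);
            }
        }
        return offsets;
    }

    // Adds lineStart to offsets when the key changes; returns the current key.
    private static String recordIfNewKey(ByteArrayOutputStream line, long lineStart,
                                         String previousKey, List<Long> offsets) {
        String text = new String(line.toByteArray(), StandardCharsets.UTF_8);
        int tab = text.indexOf('\t');
        String key = tab >= 0 ? text.substring(0, tab) : text;
        if (!key.equals(previousKey)) {
            offsets.add(lineStart);
        }
        return key;
    }
}
```

A scan like this has to read every byte of the input once, on a single
machine, before any map task can start, which is where the time goes.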

Any ideas or comments are appreciated.


      
