Of course, this is only true of TextInputFormat. You could write a CsvInputFormat in which every mapper reads the first line of the file in addition to its assigned split. This would cause a brief delay at the start of the job, as the first wave of mappers all hit the beginning of the file at once, but that delay should be very short, and the convenience of being able to read standard CSV input would be significant.
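To make the idea concrete, here is a rough sketch of the reading logic in plain Python. It is a simulation only and uses no Hadoop API: the function names `read_header` and `read_split` are made up for illustration, and the rule that a line belongs to the split containing its first byte is a simplifying assumption, not a description of how any existing input format is implemented. It also splits on raw newlines, so quoted CSV fields containing line breaks are not handled.

```python
import io


def read_header(stream):
    """Read the first line of the file (the CSV header) from offset 0."""
    stream.seek(0)
    return stream.readline().rstrip(b"\r\n")


def read_split(stream, start, length):
    """Yield (header, record) pairs for the lines that begin in [start, start + length).

    Hypothetical helper: every caller re-reads the header from the top of
    the file, then reads only the records belonging to its own split.
    """
    header = read_header(stream)
    header_end = stream.tell()  # offset of the first byte after the header line
    end = start + length

    if start < header_end:
        # This split covers the header itself, so begin right after it.
        pos = header_end
        stream.seek(pos)
    else:
        # Back up one byte and discard through the next newline. If we
        # landed mid-line, this skips the partial line, which belongs to
        # the previous split.
        stream.seek(start - 1)
        stream.readline()
        pos = stream.tell()

    # A line that starts inside the split is read in full, even if it
    # runs past the end of the split.
    while pos < end:
        line = stream.readline()
        if not line:
            break
        yield header, line.rstrip(b"\r\n")
        pos = stream.tell()


if __name__ == "__main__":
    data = io.BytesIO(b"id,name\n1,ann\n2,bob\n3,cat\n")
    # Three arbitrary byte ranges standing in for the splits given to three mappers.
    for start, length in [(0, 10), (10, 10), (20, 6)]:
        print(start, list(read_split(data, start, length)))
```

Each simulated mapper pays for one extra short read at offset 0, which is the brief start-up cost mentioned above; in exchange, every record arrives together with the column names.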
On Wed, Jul 6, 2011 at 3:11 AM, Sean Owen <[email protected]> wrote:
> If your input is a CSV file with a header line, one mapper will read that
> first chunk with the header line. You don't know which mapper that will be.
> Only one will read it, so no you would not construct a MapReduce app that
> depends on all mappers seeing some header line, because they don't.
