Hi,
1) Where can I find the "main" class of Hadoop, the one that calls the
InputFormat, then the MapperRunner, the ReducerRunner, and the others?
This would help me understand what is held in memory and what is still on
disk, and the exact flow of data between the splits and the mappers.
My problem is this: assuming I have a TextInputFormat and would like to
modify the input in memory before it is read by the RecordReader, where
should I do that?
InputFormat was my first guess, but unfortunately it only defines the
logical splits. So the only way I can think of is to use the RecordReader
to read all the records in a split into another variable (in the format I
want) and then process that variable with the map functions.
But is that efficient? To understand this, I hope someone can answer
Q(1).
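To make the idea concrete, here is a rough sketch in plain Java. It does
not use the real Hadoop API: the Iterator<String> only stands in for a
RecordReader over one split, and the names SplitPreprocessSketch,
bufferAll and rewriteLazily are made up for illustration. It contrasts
buffering the whole split with rewriting one record at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: plain iterators stand in for a Hadoop RecordReader.
public class SplitPreprocessSketch {

    // Eager variant: drain every record of the split into a list first,
    // rewriting each one. Memory use grows with the size of the split.
    static List<String> bufferAll(Iterator<String> reader,
                                  Function<String, String> rewrite) {
        List<String> buffered = new ArrayList<>();
        while (reader.hasNext()) {
            buffered.add(rewrite.apply(reader.next()));
        }
        return buffered;
    }

    // Lazy variant: wrap the reader and rewrite each record only when the
    // consumer (the map function, in my case) asks for it.
    static Iterator<String> rewriteLazily(Iterator<String> reader,
                                          Function<String, String> rewrite) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return reader.hasNext();
            }

            @Override
            public String next() {
                return rewrite.apply(reader.next());
            }
        };
    }

    public static void main(String[] args) {
        // Three made-up lines standing in for the records of one split.
        List<String> split = Arrays.asList("a,1", "b,2", "c,3");
        Function<String, String> rewrite = line -> line.replace(',', '\t');

        // Whole split rewritten and held in memory at once.
        List<String> all = bufferAll(split.iterator(), rewrite);
        for (String record : all) {
            System.out.println(record);
        }

        // One record rewritten at a time.
        Iterator<String> lazy = rewriteLazily(split.iterator(), rewrite);
        while (lazy.hasNext()) {
            System.out.println(lazy.next());
        }
    }
}
```

The eager variant keeps the whole split in memory, which is the part I am
worried about; the lazy variant holds only one record at a time.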
Thank you,
Mark