Mark, I may not have understood your question exactly, but you can do further processing inside your FileInputFormat derivative's RecordReader implementation, just before it fills in the value on a next() call (which is what the MapRunner uses to read records).
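To make the suggestion above concrete, here is a minimal, dependency-free Java sketch of wrapping a record reader so that each value is transformed in memory before the map side sees it. It deliberately avoids Hadoop's classes: `SimpleRecordReader`, `LineReader`, `TransformingReader` and `runMap` are hypothetical stand-ins, loosely modeled on the old-API `RecordReader.next(key, value)` contract and on the read loop in `MapRunner` (keys are left out for brevity). In a real job you would return a similar wrapper around `LineRecordReader` from your `TextInputFormat` subclass's `getRecordReader()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, dependency-free stand-in for Hadoop's old-API RecordReader:
// next() overwrites the caller's reusable value and returns false at end of split.
interface SimpleRecordReader {
    boolean next(StringBuilder value);
}

// Plays the role of LineRecordReader: hands out one line per next() call.
class LineReader implements SimpleRecordReader {
    private final String[] lines;
    private int pos = 0;

    LineReader(String text) {
        this.lines = text.split("\n");
    }

    @Override
    public boolean next(StringBuilder value) {
        if (pos >= lines.length) {
            return false; // end of the (simulated) split
        }
        value.setLength(0);
        value.append(lines[pos++]);
        return true;
    }
}

// Wraps another reader and transforms each value in memory just before
// returning it, so the map function only ever sees processed records.
class TransformingReader implements SimpleRecordReader {
    private final SimpleRecordReader inner;
    private final UnaryOperator<String> transform;

    TransformingReader(SimpleRecordReader inner, UnaryOperator<String> transform) {
        this.inner = inner;
        this.transform = transform;
    }

    @Override
    public boolean next(StringBuilder value) {
        if (!inner.next(value)) {
            return false;
        }
        String processed = transform.apply(value.toString());
        value.setLength(0);
        value.append(processed);
        return true;
    }
}

public class RecordReaderSketch {
    // Mirrors the shape of MapRunner's loop: one reusable value object,
    // filled record by record, each handed straight to "map".
    public static List<String> runMap(SimpleRecordReader reader) {
        List<String> mapped = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        while (reader.next(value)) {
            mapped.add(value.toString()); // stand-in for mapper.map(key, value, ...)
        }
        return mapped;
    }

    public static void main(String[] args) {
        SimpleRecordReader reader =
            new TransformingReader(new LineReader("hello\nworld"), String::toUpperCase);
        System.out.println(runMap(reader)); // prints [HELLO, WORLD]
    }
}
```

Because the wrapper transforms one record per next() call, memory use stays bounded by a single record, whereas first reading every record of the split into another variable holds the whole split in memory before map() runs.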
If you're looking to dig into Hadoop's source code to understand the flow yourself, MapTask.java (its run* methods) is probably what you're looking for.

On Sun, Jun 12, 2011 at 3:25 AM, Mark question <markq2...@gmail.com> wrote:
> Hi,
>
> 1) Where can I find the "main" class of hadoop? The one that calls the
> InputFormat then the MapperRunner and ReducerRunner and others?
>
> This will help me understand what is in memory or still on disk, exact
> flow of data between split and mappers.
>
> My problem is, assuming I have a TextInputFormat and would like to modify
> the input in memory before being read by RecordReader... where shall I do
> that?
>
> InputFormat was my first guess, but unfortunately, it only defines the
> logical splits... So, the only way I can think of is use the recordReader
> to read all the records in split into another variable (with the format I
> want) then process that variable by map functions.
>
> But is that efficient? So, to understand this, I hope someone can give an
> answer to Q(1)
>
> Thank you,
> Mark

--
Harsh J