Dear All,

I have a requirement to move my existing program to the MapReduce framework. Currently:
--- I read the files within a directory and its subdirectories.
--- I process one file at a time.
--- I write all the processed output to a single output file (one output file per folder).

Now, if I have to do this using MapReduce, how should I proceed? I think I need to give one whole file to each mapper, and once all the mappers have finished, a single reducer should write the combined result to a single file (as I think we cannot write in parallel to a single output file).

Please suggest how I can achieve the following (or point me to resources):

a) How can my map function receive one whole file at a time (instead of one line at a time)?
b) Would implementing a custom RecordReader and/or FileInputFormat allow me to read files in subdirectories as well (one file at a time)?

I would appreciate any help.

Thanks,
Bhaskar Ghosh
Hyderabad, India
http://www.google.com/profiles/bjgindia
"Ignorance is Bliss... Knowledge never brings Peace!!!"
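
For reference, the sequential flow described above can be sketched in plain Java roughly as follows. This is only an illustrative sketch and not the actual program: the class name `FolderJob`, the method names, and the per-file logic in `processFile` (which just reports the file name and its size in bytes) are all hypothetical placeholders. It also assumes that "one output file per folder" means one output file for the whole input directory tree.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FolderJob {

    // Plays the role of the "map" step: receives one whole file, returns its result.
    // Placeholder logic (hypothetical): the file name and its size in bytes.
    static String processFile(Path file) throws IOException {
        byte[] content = Files.readAllBytes(file);
        return file.getFileName() + "\t" + content.length;
    }

    // Lists every regular file under root, including subdirectories, in sorted order.
    static List<Path> listFilesRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Plays the role of the single "reduce" step: one writer, one output file.
    // The output file is assumed to live outside inputDir.
    static void run(Path inputDir, Path outputFile) throws IOException {
        List<String> results = new ArrayList<>();
        for (Path file : listFilesRecursively(inputDir)) {
            results.add(processFile(file)); // one file at a time
        }
        Files.write(outputFile, results, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FolderJob <inputDir> <outputFile>
        run(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

In MapReduce terms, `processFile` corresponds to what each map call would do with one whole file, and the single write in `run` corresponds to what a lone reducer would do.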