Hello,

The map/reduce tutorials in the Hadoop source are great for getting started. Are 
there any similar tutorials for more advanced use cases, especially complicated 
ones that involve subclassing RecordReader, InputFormat, and the like? 

In particular, I want to write a job that computes the cartesian product of a 
file with itself, i.e. it takes each row in the file and compares it against 
every other row. My first pass involved writing a NonSplittableInputFormat and 
a RecordReader that composes two LineRecordReaders, one outerReader and one 
innerReader. The reader returns the two rows merged into one record to the map 
task, which does the comparison. 
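For anyone unfamiliar with the pattern, here is a minimal plain-Java sketch of the outer/inner composition I mean, with in-memory lists standing in for the two LineRecordReaders (no Hadoop dependencies; CartesianSketch and cartesianPairs are hypothetical names, not part of any API):

```java
import java.util.ArrayList;
import java.util.List;

public class CartesianSketch {
    // Emits every ordered pair of distinct rows, merged with a tab --
    // mimicking a RecordReader whose inner reader re-scans the whole
    // file for each record the outer reader produces. The merged record
    // would be the value handed to the map task.
    public static List<String> cartesianPairs(List<String> rows) {
        List<String> merged = new ArrayList<>();
        for (int outer = 0; outer < rows.size(); outer++) {     // outerReader position
            for (int inner = 0; inner < rows.size(); inner++) { // innerReader re-scan
                if (outer == inner) continue;                   // skip row vs. itself
                merged.add(rows.get(outer) + "\t" + rows.get(inner));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> pairs = cartesianPairs(List.of("a", "b", "c"));
        System.out.println(pairs.size()); // 3 rows -> 6 ordered pairs
        System.out.println(pairs.get(0)); // a	b
    }
}
```

Note the quadratic blow-up: n rows produce n*(n-1) merged records, which is part of why I suspect there is a better way.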

It seems there must be a better way to do this. Additionally, no matter how 
many map tasks I configure, only one map task gets created and assigned by the 
job tracker, presumably because the non-splittable format yields a single split 
for the single input file. Any ideas on a better approach? Has anyone done 
anything similar?

Thanks!
