Feng,


> Actually I have already partially implemented a MapReduce WebDB
> writer. I don't know whether anyone else is working on this now.

Well, not working on it myself, but I have spent some days reading the code and playing around.
I'm very interested in helping as well, though I'm sure I'm a few steps behind you.

> Comments?

In general I do not clearly understand the idea behind a "master" and the MapredWebDBCommitter.
Isn't this handled by the jobtracker and the job itself?
If you browse the Grep example, you can see that the grep job itself runs a grepJob and then a sortJob, so it is possible to manage 'flows' from within the job itself.
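To illustrate what I mean by managing the flow from the job itself, here is a toy sketch of the Grep-style pattern. The names (runJob, updateJob, sortJob) are hypothetical stand-ins, not the real API; runJob is a placeholder for submitting a configured job to the jobtracker:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Grep-style pattern: the driver class itself
 *  chains two sub-jobs in sequence, so the 'flow' needs no separate
 *  master process -- the jobtracker handles each submitted job. */
public class ChainedJobSketch {

    static List<String> log = new ArrayList<>();

    /** Stand-in for submitting a job and waiting for it; records what ran. */
    static void runJob(String name) {
        log.add(name);
    }

    public static void main(String[] args) {
        runJob("updateJob"); // first pass, like Grep's grepJob
        runJob("sortJob");   // second pass, like Grep's sortJob
        System.out.println(log);
    }
}
```

The point is only the structure: the second job starts after the first finishes, entirely under the driver's control.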


Wouldn't it make sense to do the MapReduce webdb similarly?
As mentioned, I have just played around and may have missed something, but I was thinking of doing it like this:


* create an InputFormat for the segment file(s).
* write a mapper that creates several small, unsorted webdbs.
* write a combiner that merges these small webdbs with the existing webdb into a temporary webdb.
* write a reducer that sorts and merges the entries of the temporary webdb.
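To make the intent of the steps above concrete, here is a toy, in-memory walk-through (hypothetical names and plain Java maps, not the real Nutch API or data structures), with the webdb reduced to a URL-to-entry map:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy in-memory sketch of the proposed flow:
 *  mapper output -> combiner merge -> sorted reduce. */
public class WebDbFlowSketch {

    /** Combiner step: fold the small, unsorted webdbs into the existing one,
     *  producing a temporary, still-unsorted webdb. */
    static Map<String, String> combine(Map<String, String> existing,
                                       List<Map<String, String>> miniDbs) {
        Map<String, String> temp = new HashMap<>(existing);
        for (Map<String, String> m : miniDbs) temp.putAll(m);
        return temp;
    }

    /** Reducer step: sort and merge the temp entries by key (URL). */
    static SortedMap<String, String> reduce(Map<String, String> temp) {
        return new TreeMap<>(temp);
    }

    public static void main(String[] args) {
        // Mapper step: a segment record becomes a small, unsorted webdb.
        Map<String, String> mini = new HashMap<>();
        mini.put("http://b.example/", "page-b");

        Map<String, String> existing = new HashMap<>();
        existing.put("http://a.example/", "page-a");

        SortedMap<String, String> webdb =
            reduce(combine(existing, List.of(mini)));
        System.out.println(webdb.keySet()); // URLs come out in sorted order
    }
}
```

Of course the real thing would work on the on-disk webdb formats; this only shows which stage does what.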


As mentioned, I may have missed something, but since the job itself acts as a kind of master, the processing can be managed from the job.
And since all files would be written into a unique ndf folder, it may not be necessary to have any kind of id.


Anyway, I would love to see the code you mentioned, to understand your ideas.

Stefan


