Hi all,
Some time ago I started using some of Nutch's classes (like the
WebDBWriter and WebDBReader) in a project of mine; the Nutch version
at that time was 0.7.
Now I see the code has changed a lot, presumably because of the move
to MapReduce.
Could someone please tell me which classes/packages now cover what
was once called the "webdb"?

What I'm still trying to understand is why the old PagesByMD5 archive
used a Page as the index key instead of only the MD5Hash, the way
PagesByURL used only the UTF8 (the URL of the page).

Thanks in advance.
-- 
Francesco
