Hi all, some time ago I started using some of Nutch's classes (like the WebDBWriter and Reader) in a project of mine; the Nutch version at that time was 0.7. Now I see the code has changed a lot, due to the MapReduce strategy, I suppose. Could someone please tell me which classes/packages are involved in what was once called the "webdb"?
What I'm still trying to understand is why the old PagesByMD5 archive used a Page as the index key instead of only the MD5Hash, the way PagesByURL used only the UTF8 (the URL of the page) as its key.

Thanks in advance.

-- Francesco
