At a glance, it seems that the DMOZ code in org.apache.nutch.db.WebDBInjector should (or at least could) be taken out and put somewhere else. The DMOZ code is really just one use of WebDBInjector, not an essential part of it, and in theory there could be lots of different injectors (e.g. URLs from a DB, links from del.icio.us or furl.net, RSS feeds, recently updated blogs).
The benefit of doing this is minor, of course, and might just be a matter of taste, but if people want, I'll enter a change request and attach a diff of the code changes. [BTW: what are the right args to diff when submitting a code change?]
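
To make the idea a bit more concrete, here is a rough sketch of the kind of split I mean. None of these names exist in Nutch: UrlSource, FixedListSource and SketchInjector are made up for illustration, and a plain Set stands in for the WebDB. A DMOZ reader would then be just one more UrlSource next to a DB reader, an RSS reader, and so on.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical: anything that can supply URLs to inject
// (a DMOZ dump, a database, an RSS feed, ...).
interface UrlSource {
    Iterable<String> urls();
}

// Hypothetical source backed by a fixed list; a DMOZ-style source
// would parse its dump file here instead.
final class FixedListSource implements UrlSource {
    private final List<String> urls;

    FixedListSource(String... urls) {
        this.urls = Arrays.asList(urls);
    }

    @Override
    public Iterable<String> urls() {
        return urls;
    }
}

// Hypothetical injector core: it knows nothing about where URLs come from.
final class SketchInjector {
    // Stand-in for the WebDB; keeps insertion order and drops duplicates.
    private final Set<String> db = new LinkedHashSet<>();

    // Adds every URL from the source and returns how many were new.
    int inject(UrlSource source) {
        int added = 0;
        for (String url : source.urls()) {
            if (db.add(url)) {
                added++;
            }
        }
        return added;
    }

    Set<String> injected() {
        return db;
    }
}
```

With something like this, calling inject(new FixedListSource(...)) is all the injector ever sees, so adding a new kind of source would not touch the injector class at all.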
