Andrzej Bialecki wrote:
> You can download the patch from here:
>
> http://www.getopt.org/nutch/20050507.patch

I have not yet had a chance to try this. Below are some quick comments from reading the patch. Overall, I think this is great stuff.


1. Why does an HTMLMetaTags object need to be passed to Parser.parse()? This seems to cross an abstraction boundary, since the Parser interface is meant to be format- and protocol-independent. Couldn't this meta information be stored in getParseData().getMetadata() instead?

2. I still have some concerns about the transient nature of ParseStatus. Wouldn't it be inexpensive to add it to ParseData? What if, for example, the db update tool needed some aspect of the parse status?
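For point 1, a minimal sketch of the alternative, assuming the parse result carries a format-neutral metadata map: the HTML parser records its meta-tag findings as plain string properties, so the generic parse(...) signature never mentions an HTML-specific type. All names here (SketchParseData, SketchParser, SketchHtmlParser, the "robots.noindex"/"robots.nofollow" keys) are hypothetical and are not Nutch's actual API.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for the parse result under discussion;
// not Nutch's actual ParseData class.
class SketchParseData {
    private final Properties metadata = new Properties();

    // Format-neutral key/value store that any parser may fill in.
    Properties getMetadata() {
        return metadata;
    }
}

interface SketchParser {
    // No HTML- or protocol-specific types appear in this signature.
    SketchParseData parse(String content);
}

class SketchHtmlParser implements SketchParser {
    // Deliberately naive pattern for <meta name="robots" content="...">;
    // a real HTML parser would inspect the DOM instead.
    private static final Pattern ROBOTS = Pattern.compile(
        "<meta\\s+name=\"robots\"\\s+content=\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    public SketchParseData parse(String content) {
        SketchParseData data = new SketchParseData();
        String directives = "";
        Matcher m = ROBOTS.matcher(content);
        if (m.find()) {
            directives = m.group(1).toLowerCase();
        }
        // HTML-only details are recorded as plain properties instead of
        // being handed to callers through the Parser interface.
        data.getMetadata().setProperty("robots.noindex",
            String.valueOf(directives.contains("noindex")));
        data.getMetadata().setProperty("robots.nofollow",
            String.valueOf(directives.contains("nofollow")));
        return data;
    }
}
```

A downstream consumer (an indexing step, say) could then check getMetadata().getProperty("robots.noindex") without the Parser interface having to know anything about HTML.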
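To put the cost argument in point 2 in concrete terms, here is a minimal sketch assuming a Writable-style write(DataOutput)/read(DataInput) pair: persisting a status code and message alongside the other parse data adds only a few bytes per record. The class names, fields, and status codes are hypothetical, not Nutch's actual ParseStatus or ParseData.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical status record; not Nutch's actual ParseStatus.
class SketchParseStatus {
    static final byte SUCCESS = 1;
    static final byte FAILED = 2;

    final byte code;
    final String message;

    SketchParseStatus(byte code, String message) {
        this.code = code;
        this.message = message;
    }

    void write(DataOutput out) throws IOException {
        out.writeByte(code);    // 1 byte
        out.writeUTF(message);  // 2-byte length prefix + modified UTF-8 bytes
    }

    static SketchParseStatus read(DataInput in) throws IOException {
        byte code = in.readByte();
        String message = in.readUTF();
        return new SketchParseStatus(code, message);
    }
}

// Hypothetical parse-data record that persists its status instead of
// keeping it transient.
class SketchParseDataWithStatus {
    final String title;
    final SketchParseStatus status;

    SketchParseDataWithStatus(String title, SketchParseStatus status) {
        this.title = title;
        this.status = status;
    }

    void write(DataOutput out) throws IOException {
        status.write(out);      // stored with the rest of the record
        out.writeUTF(title);
    }

    static SketchParseDataWithStatus read(DataInput in) throws IOException {
        SketchParseStatus status = SketchParseStatus.read(in);
        String title = in.readUTF();
        return new SketchParseDataWithStatus(title, status);
    }
}
```

With an empty message the status costs three bytes (one for the code, two for the writeUTF length prefix), and a tool that reads the record back, such as a db updater, could branch on status.code directly.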

Cheers,

Doug
