Nathan Gass created NUTCH-1495:
----------------------------------
Summary: -normalize and -filter for updatedb command in nutch 2.x
Key: NUTCH-1495
URL: https://issues.apache.org/jira/browse/NUTCH-1495
Project: Nutch
Issue Type: Improvement
Affects Versions: 2.2
Reporter: Nathan Gass
AFAIS, in Nutch 1.x you could change your URL filters and normalizers during the
crawl and then update the db using crawldb -normalize -filter. There does not
seem to be a way to achieve the same in Nutch 2.x.
Anyway, I went ahead and tried to implement -normalize and -filter for the
Nutch 2.x updatedb command. I have no experience with any of the technologies
involved, including Java, so please check the attached code carefully before
using it. I would be very interested to hear whether this is the right approach,
and to get any other comments.