Nathan Gass created NUTCH-1495:
----------------------------------

             Summary: -normalize and -filter for updatedb command in nutch 2.x
                 Key: NUTCH-1495
                 URL: https://issues.apache.org/jira/browse/NUTCH-1495
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 2.2
            Reporter: Nathan Gass


AFAICS, in Nutch 1.x you could change your URL filters and normalizers during the 
crawl and then update the db using crawldb -normalize -filter. There does not seem 
to be a way to achieve the same in Nutch 2.x.

Anyway, I went ahead and tried to implement -normalize and -filter for the 
Nutch 2.x updatedb command. I have no experience with any of the technologies 
involved, including Java, so please check the attached code carefully before 
using it. I would be very interested to hear whether this is the right approach, 
and any other comments are welcome.


