[ https://issues.apache.org/jira/browse/NUTCH-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1495:
----------------------------------------

    Fix Version/s: 2.2

> -normalize and -filter for updatedb command in nutch 2.x
> --------------------------------------------------------
>
>                 Key: NUTCH-1495
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1495
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Nathan Gass
>             Fix For: 2.2
>
>         Attachments: patch-updatedb-normalize-filter-2012-11-09.txt,
>                      patch-updatedb-normalize-filter-2012-11-13.txt
>
>
> AFAIS in nutch 1.x you could change your URL filters and normalizers during
> the crawl, and update the db using crawldb -normalize -filter. There does not
> seem to be a way to achieve the same in nutch 2.x.
> Anyway, I went ahead and tried to implement -normalize and -filter for the
> nutch 2.x updatedb command. I have no experience with any of the
> technologies used, including Java, so please check the attached code carefully
> before using it. I'm very interested to hear whether this is the right approach,
> and in any other comments.