[ https://issues.apache.org/jira/browse/NUTCH-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yossi Tamari updated NUTCH-2644:
--------------------------------
    Comment: was deleted

(was: [~wastl-nagel], Isn't this a much wider issue? For example, I think it also applies to CrawlDbMerger in line 183.
)

> CrawlDbReader -dump ignores filter options
> ------------------------------------------
>
>                 Key: NUTCH-2644
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2644
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
>
>
> The CrawlDbReader ignores the filter options -status and -expr when dumping a
> crawldb:
> {noformat}
> % bin/nutch readdb crawldb/ -dump cdb.dump -status 'db_fetched' -expr 'status == "db_fetched"'
> ...
> % grep '^Status:' cdb.dump/part-r-00000 | sort | uniq -c
>      10 Status: 1 (db_unfetched)
>      28 Status: 2 (db_fetched)
>       1 Status: 3 (db_gone)
>       1 Status: 4 (db_redir_temp)
>       3 Status: 7 (db_duplicate)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)