[ https://issues.apache.org/jira/browse/NUTCH-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578770#action_12578770 ]

Andrzej Bialecki commented on NUTCH-612:
-----------------------------------------

Patch committed to trunk rev. 637114. Thank you!

> URL filtering is always disabled in Generator when invoked by Crawl
> -------------------------------------------------------------------
>
>                 Key: NUTCH-612
>                 URL: https://issues.apache.org/jira/browse/NUTCH-612
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.0.0
>            Reporter: Susam Pal
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-612v0.1.patch
>
>
> When a crawl is done using the 'bin/nutch crawl' command, no filtering is
> done in Generator even if 'crawl.generate.filter' is set to true in the
> configuration file.
> The problem is that in the Generator's generate method, the following code
> unconditionally sets the filter value of the job to whatever is passed to it:-
> {code}job.setBoolean(CRAWL_GENERATE_FILTER, filter);{code}
> The code in Crawl.java always passes this as false.
> This has been fixed by exposing an overloaded generate method which takes
> only the 5 arguments that Crawl needs to set. This overloaded method reads
> the configuration and sets the filter value appropriately.
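
For readers who do not have the patch at hand, the fix described in the issue can be sketched roughly as follows. This is an illustration only, not the committed code: the class name GeneratorFilterSketch, the nested Conf class (a stand-in for the real job configuration), the parameter names, and the default of true for 'crawl.generate.filter' are all assumptions, and the full overload is simplified to a single extra argument.

```java
import java.util.HashMap;
import java.util.Map;

public class GeneratorFilterSketch {
    static final String CRAWL_GENERATE_FILTER = "crawl.generate.filter";

    /** Hypothetical stand-in for the job configuration; not the Hadoop class. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        void setBoolean(String key, boolean value) {
            props.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    private final Conf conf;

    GeneratorFilterSketch(Conf conf) {
        this.conf = conf;
    }

    /**
     * Five-argument overload for callers such as Crawl. The filter flag is
     * read from the configuration instead of being supplied by the caller.
     * The default of true is an assumption made for this sketch.
     */
    boolean generate(String dbDir, String segments, int numLists, long topN, long curTime) {
        boolean filter = conf.getBoolean(CRAWL_GENERATE_FILTER, true);
        return generate(dbDir, segments, numLists, topN, curTime, filter);
    }

    /**
     * Full overload: the explicit flag is written to the job unconditionally,
     * as in the line quoted in the issue. Returns the value the job would see.
     */
    boolean generate(String dbDir, String segments, int numLists, long topN, long curTime,
                     boolean filter) {
        Conf job = new Conf();
        job.setBoolean(CRAWL_GENERATE_FILTER, filter);
        return job.getBoolean(CRAWL_GENERATE_FILTER, false);
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.setBoolean(CRAWL_GENERATE_FILTER, true);
        GeneratorFilterSketch generator = new GeneratorFilterSketch(conf);

        // Behaviour reported in the issue: the caller hard-codes false.
        System.out.println("explicit false:   "
            + generator.generate("crawldb", "segments", 1, 10L, 0L, false));
        // Behaviour after the fix: the short overload honours the configuration.
        System.out.println("from config:      "
            + generator.generate("crawldb", "segments", 1, 10L, 0L));
    }
}
```

The point of the design is that a caller which has no opinion about filtering never passes a flag at all, so it cannot silently override what the configuration file says.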