[ https://issues.apache.org/jira/browse/NUTCH-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578770#action_12578770 ]

Andrzej Bialecki  commented on NUTCH-612:
-----------------------------------------

Patch committed to trunk rev. 637114. Thank you!

> URL filtering is always disabled in Generator when invoked by Crawl
> -------------------------------------------------------------------
>
>                 Key: NUTCH-612
>                 URL: https://issues.apache.org/jira/browse/NUTCH-612
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.0.0
>            Reporter: Susam Pal
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-612v0.1.patch
>
>
> When a crawl is run with the 'bin/nutch crawl' command, the Generator does 
> no URL filtering, even if 'crawl.generate.filter' is set to true in the 
> configuration file.
> The problem is that the Generator's generate method unconditionally sets 
> the job's filter value to whatever argument is passed to it:
> {code}job.setBoolean(CRAWL_GENERATE_FILTER, filter);{code}
> Crawl.java always passes false for this argument, so the configured value 
> is overwritten.
> This has been fixed by exposing an overloaded generate method that takes 
> only the five arguments Crawl needs to set. This overload reads the 
> configuration and sets the filter value accordingly.
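
To illustrate the overload pattern described above, here is a minimal Java sketch. It does not use the real Nutch or Hadoop APIs: `GeneratorSketch`, its parameter list, the `lastJobFilter` field, and the `Map`-based configuration are hypothetical stand-ins, and treating a missing 'crawl.generate.filter' key as `true` is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Nutch's Generator; NOT the real API.
class GeneratorSketch {
    static final String CRAWL_GENERATE_FILTER = "crawl.generate.filter";

    // Hypothetical stand-in for a Hadoop configuration: a plain String map.
    private final Map<String, String> conf;

    // Records the filter value the most recent "job" was configured with.
    boolean lastJobFilter;

    GeneratorSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    // Full method: the caller chooses the filter value explicitly and it
    // overwrites whatever the configuration says (the behaviour in the issue).
    // dbDir, segments, numLists, topN and curTime are unused in this sketch.
    void generate(String dbDir, String segments, int numLists, long topN,
                  long curTime, boolean filter) {
        Map<String, String> job = new HashMap<>(conf);
        job.put(CRAWL_GENERATE_FILTER, Boolean.toString(filter));
        lastJobFilter = Boolean.parseBoolean(job.get(CRAWL_GENERATE_FILTER));
    }

    // Overload taking only five arguments: the filter value is read from the
    // configuration (assumed default: true) instead of being forced to false.
    void generate(String dbDir, String segments, int numLists, long topN,
                  long curTime) {
        boolean filter = Boolean.parseBoolean(
                conf.getOrDefault(CRAWL_GENERATE_FILTER, "true"));
        generate(dbDir, segments, numLists, topN, curTime, filter);
    }
}
```

With this split, callers that need to force a value can still use the six-argument method, while a crawl driver that calls the five-argument overload no longer overrides the user's 'crawl.generate.filter' setting.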

-- 
This message is automatically generated by JIRA.