[ https://issues.apache.org/jira/browse/NUTCH-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  closed NUTCH-612.
-----------------------------------

    Resolution: Fixed
      Assignee: Andrzej Bialecki 

> URL filtering is always disabled in Generator when invoked by Crawl
> -------------------------------------------------------------------
>
>                 Key: NUTCH-612
>                 URL: https://issues.apache.org/jira/browse/NUTCH-612
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.0.0
>            Reporter: Susam Pal
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-612v0.1.patch
>
>
> When a crawl is run with the 'bin/nutch crawl' command, Generator performs 
> no URL filtering, even if 'crawl.generate.filter' is set to true in the 
> configuration file.
> The problem is that the following code in Generator's generate method 
> unconditionally sets the job's filter value to whatever is passed in:
> {code}job.setBoolean(CRAWL_GENERATE_FILTER, filter);{code}
> The code in Crawl.java always passes false for this argument, so the 
> configured value is overwritten.
> This has been fixed by exposing an overloaded generate method that takes 
> only the 5 arguments that Crawl needs to set. The overloaded method reads 
> 'crawl.generate.filter' from the configuration and sets the filter value 
> accordingly.
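
The override described above, and the shape of the fix, can be sketched roughly as follows. This is only an illustration, not the attached patch: GeneratorSketch is a hypothetical class, a plain Map stands in for the job configuration, the real generate method takes more arguments than are shown here, and the fallback of "true" when the property is unset is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Generator; a Map replaces the real job configuration.
class GeneratorSketch {
    static final String CRAWL_GENERATE_FILTER = "crawl.generate.filter";

    // Mirrors the reported behaviour: the caller's flag always overwrites
    // whatever the configuration file said.
    static Map<String, String> generate(Map<String, String> conf, boolean filter) {
        Map<String, String> job = new HashMap<>(conf);
        job.put(CRAWL_GENERATE_FILTER, Boolean.toString(filter));
        return job;
    }

    // Mirrors the fix: an overload without the flag reads the configured
    // value and passes it on. The "true" fallback is an assumption.
    static Map<String, String> generate(Map<String, String> conf) {
        boolean filter = Boolean.parseBoolean(
                conf.getOrDefault(CRAWL_GENERATE_FILTER, "true"));
        return generate(conf, filter);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CRAWL_GENERATE_FILTER, "true");

        // Old call path: a hard-coded false wins over the configuration.
        System.out.println(generate(conf, false).get(CRAWL_GENERATE_FILTER));
        // New call path: the configured value is honoured.
        System.out.println(generate(conf).get(CRAWL_GENERATE_FILTER));
    }
}
```

In this sketch a caller such as Crawl would use the overload without the flag, so the setting from the configuration file is the one that reaches the job.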

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
