[ https://issues.apache.org/jira/browse/NUTCH-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371641#comment-14371641 ]

Luis Lopez commented on NUTCH-1971:
-----------------------------------

Thanks Sebastian. I took a look at the patch and it does fix this issue; however,
it has not been applied yet, at least not in the released 1.9 version (or in trunk):

https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/crawl/CrawlDbFilter.java
https://github.com/apache/nutch/blob/branch-1.9/src/java/org/apache/nutch/crawl/CrawlDbFilter.java

And yes, more information in the command-line help would also be useful.

> The crawldb.url.filters property is not present in any configuration file
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-1971
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1971
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Luis Lopez
>              Labels: configuration, crawldb, nutch-default.xml
>
> In CrawlDbFilter.java there is a line that reads a boolean determining whether
> the URL filters are applied:
>   public static final String URL_FILTERING = "crawldb.url.filters";
> However, this property is not present in nutch-default.xml. Currently the only
> way to set this value is the -filter parameter on the command line.
> The same applies to:
>   public static final String URL_NORMALIZING = "crawldb.url.normalizers";
>   public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";
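The behaviour described in the issue, where a key missing from the configuration silently falls back to a hard-coded default, can be sketched as follows. This is a self-contained illustration, not the Nutch source: it uses java.util.Properties in place of Hadoop's Configuration, the class name CrawlDbFilterSettings is hypothetical, and the fallback values (false, false, "crawldb") are assumptions about what CrawlDbFilter uses when a property is absent.

```java
import java.util.Properties;

// Hypothetical illustration of the configuration lookup discussed in NUTCH-1971.
// Not the Nutch implementation: java.util.Properties stands in for Hadoop's Configuration.
public class CrawlDbFilterSettings {
    // Property names as quoted in the issue.
    public static final String URL_FILTERING = "crawldb.url.filters";
    public static final String URL_NORMALIZING = "crawldb.url.normalizers";
    public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";

    final boolean urlFiltering;
    final boolean urlNormalizing;
    final String normalizerScope;

    CrawlDbFilterSettings(Properties conf) {
        // Assumed defaults: used only when the key is missing, e.g. because
        // nutch-default.xml does not declare it.
        urlFiltering = Boolean.parseBoolean(conf.getProperty(URL_FILTERING, "false"));
        urlNormalizing = Boolean.parseBoolean(conf.getProperty(URL_NORMALIZING, "false"));
        normalizerScope = conf.getProperty(URL_NORMALIZING_SCOPE, "crawldb");
    }

    public static void main(String[] args) {
        // No entries loaded: every value comes from the hard-coded fallback.
        Properties conf = new Properties();
        CrawlDbFilterSettings defaults = new CrawlDbFilterSettings(conf);
        System.out.println(defaults.urlFiltering + " " + defaults.urlNormalizing
                + " " + defaults.normalizerScope);

        // Setting the key explicitly (assumed to be what a -filter option ends up doing)
        // is the only way to change the result.
        conf.setProperty(URL_FILTERING, "true");
        System.out.println(new CrawlDbFilterSettings(conf).urlFiltering);
    }
}
```

Declaring the three properties in nutch-default.xml would make these defaults visible and documented instead of being buried in the code.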



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
