[ 
https://issues.apache.org/jira/browse/NUTCH-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433118#comment-16433118
 ] 

ASF GitHub Bot commented on NUTCH-2539:
---------------------------------------

lewismc closed pull request #300: NUTCH-2539
URL: https://github.com/apache/nutch/pull/300

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/conf/nutch-default.xml b/conf/nutch-default.xml
index 87c405883..688da5e40 100644
--- a/conf/nutch-default.xml
+++ b/conf/nutch-default.xml
@@ -548,15 +548,21 @@
 </property>
 
 <property>
-    <name>db.url.normalizers</name>
+    <name>crawldb.url.normalizers</name>
     <value>false</value>
-    <description>Normalize urls when updating crawldb</description>
+    <description>
+       !Temporary, can be overwritten with the command line!
+       Normalize urls when updating crawldb
+    </description>
 </property>
 
 <property>
-    <name>db.url.filters</name>
+    <name>crawldb.url.filters</name>
     <value>false</value>
-    <description>Filter urls when updating crawldb</description>
+    <description>
+       !Temporary, can be overwritten with the command line!
+       Filter urls when updating crawldb
+    </description>
 </property>
 
 <property>


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Not correct naming of db.url.filters and db.url.normalizers in 
> nutch-default.xml
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-2539
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2539
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.15
>            Reporter: Semyon Semyonov
>            Priority: Major
>
> There is a mismatch between the configuration and the code.
> In the code, CrawlDbFilter (lines 41-43) defines:
> > public static final String URL_FILTERING = "crawldb.url.filters";
> > public static final String URL_NORMALIZING = "crawldb.url.normalizers";
> > public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";
>  
> In nutch-default.xml
> > <property>
> > <name>db.url.normalizers</name>
> > <value>false</value>
> > <description>Normalize urls when updating crawldb</description>
> > </property>
> >
> > <property>
> > <name>db.url.filters</name>
> > <value>false</value>
> > <description>Filter urls when updating crawldb</description>
> > </property>
> These property names should match the constants used in the code.
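
To illustrate why the mismatch matters, here is a minimal sketch. It assumes a
plain java.util.Properties object as a stand-in for the Hadoop Configuration
that Nutch actually uses; the class name ConfigKeyMismatch and the helper
getBoolean are hypothetical. Only the two key constants are taken from the
issue description above. It shows that a value set under the old name is never
seen by code that looks up the new name.

```java
import java.util.Properties;

final class ConfigKeyMismatch {
    // Key constants as quoted from CrawlDbFilter in the issue description.
    public static final String URL_FILTERING = "crawldb.url.filters";
    public static final String URL_NORMALIZING = "crawldb.url.normalizers";

    // Hypothetical helper: read a boolean, falling back to a default
    // when the key is absent (a stand-in for a real configuration lookup).
    public static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // The user enables filtering under the name documented in the old
        // nutch-default.xml. The code looks up a different key, so the
        // setting is silently ignored and the default (false) is returned.
        conf.setProperty("db.url.filters", "true");
        System.out.println("old name only: " + getBoolean(conf, URL_FILTERING, false));

        // With the name used by the code, the setting takes effect.
        conf.setProperty(URL_FILTERING, "true");
        System.out.println("matching name: " + getBoolean(conf, URL_FILTERING, false));
    }
}
```

Under these assumptions, the first lookup falls back to the default and the
second picks up the configured value, which is the behavior the rename in
nutch-default.xml is meant to restore.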



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
