Gents,

To enable URL filters and normalizers in CrawlDBUpdate, the following three
config settings may be used:
 
In CrawlDbFilter, lines 41-43:
  public static final String URL_FILTERING = "crawldb.url.filters";
  public static final String URL_NORMALIZING = "crawldb.url.normalizers";
  public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";


However, nutch-default uses different names for these properties:
<property>
    <name>db.url.normalizers</name>
    <value>false</value>
    <description>Normalize urls when updating crawldb</description>
</property>

<property>
    <name>db.url.filters</name>
    <value>false</value>
    <description>Filter urls when updating crawldb</description>
</property>


Obviously, that is why the URL normalizers/filters don't work: the code looks
up keys that differ from the ones defined in nutch-default.
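
To illustrate the mismatch, here is a minimal sketch that uses
java.util.Properties as a stand-in for the Nutch configuration. The class
KeyMismatchDemo and the helper getBoolean are made up for this example, and the
fallback default of false is an assumption, not taken from the Nutch source.
Only the two key strings come from the code and config quoted above.

```java
import java.util.Properties;

class KeyMismatchDemo {
    // Key the code reads (as quoted from CrawlDbFilter above).
    static final String CODE_KEY = "crawldb.url.filters";
    // Key the config file defines (as quoted from nutch-default above).
    static final String CONFIG_KEY = "db.url.filters";

    // Hypothetical helper: boolean lookup that falls back to a default
    // when the key is absent (the default value is an assumption).
    static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The user enables filtering under the name found in the config file.
        conf.setProperty(CONFIG_KEY, "true");

        // The lookup under the code's key misses and yields the default.
        System.out.println(getBoolean(conf, CODE_KEY, false));   // prints false
        // The lookup under the config's key sees the user's setting.
        System.out.println(getBoolean(conf, CONFIG_KEY, false)); // prints true
    }
}
```

In this sketch the setting made under db.url.filters is never seen by a lookup
on crawldb.url.filters, so filtering would stay off regardless of what the user
configures.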

Should I change the CrawlDbFilter code to the following?
  public static final String URL_FILTERING = "db.url.filters";
  public static final String URL_NORMALIZING = "db.url.normalizers";
  public static final String URL_NORMALIZING_SCOPE = "db.url.normalizers.scope";

Semyon.
