Yossi Tamari created NUTCH-2414:
-----------------------------------

             Summary: Allow LanguageIndexingFilter to actually filter documents by language.
                 Key: NUTCH-2414
                 URL: https://issues.apache.org/jira/browse/NUTCH-2414
             Project: Nutch
          Issue Type: Improvement
          Components: plugin
    Affects Versions: 1.13
            Reporter: Yossi Tamari
            Priority: Minor


It is often useful to index only pages in selected languages (e.g. only those 
languages that we intend to search in). At first glance it seems that this is 
what LanguageIndexingFilter does, but currently all the filter does is add the 
detected language as a field to the index; it never excludes a document.
We can add a configuration property to LanguageIndexingFilter that lists the 
languages to index, so that pages in any other language are skipped.
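
To make the proposal concrete, here is a minimal sketch of the allow-list check, written as plain Java with no Nutch dependencies. The class name LanguageAllowList, the method shouldIndex and the comma-separated value format are assumptions made for illustration only; none of them exist in Nutch today. An empty or missing value is treated as "index everything", so the current behaviour would be preserved by default.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Nutch. */
final class LanguageAllowList {

    // Lower-cased language codes that may be indexed; empty means "no restriction".
    private final Set<String> allowed;

    /** Parses a comma-separated property value such as "en, de" (assumed format). */
    LanguageAllowList(String propertyValue) {
        Set<String> codes = new HashSet<>();
        if (propertyValue != null) {
            for (String code : propertyValue.split(",")) {
                String trimmed = code.trim().toLowerCase(Locale.ROOT);
                if (!trimmed.isEmpty()) {
                    codes.add(trimmed);
                }
            }
        }
        this.allowed = Collections.unmodifiableSet(codes);
    }

    /** Returns true if a document in the given language should be indexed. */
    boolean shouldIndex(String detectedLanguage) {
        if (allowed.isEmpty()) {
            return true; // property unset: keep indexing everything
        }
        if (detectedLanguage == null) {
            return false; // restriction active but language unknown: skip
        }
        return allowed.contains(detectedLanguage.trim().toLowerCase(Locale.ROOT));
    }
}
```

In the filter itself, the idea would be to call a check like shouldIndex with the detected language and discard the document when it returns false, presumably by returning null from the filter method, which is how I understand indexing filters signal that a document should be dropped. Whether documents with an unknown language should be skipped or kept is an open design choice; the sketch skips them.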



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
