Yossi Tamari created NUTCH-2414:
-----------------------------------
Summary: Allow LanguageIndexingFilter to actually filter documents by language.
Key: NUTCH-2414
URL: https://issues.apache.org/jira/browse/NUTCH-2414
Project: Nutch
Issue Type: Improvement
Components: plugin
Affects Versions: 1.13
Reporter: Yossi Tamari
Priority: Minor
It is often useful to index only pages in selected languages (e.g. only the
languages that we intend to search in). At first glance it seems that
LanguageIndexingFilter does this, but currently all the filter does is add the
language as a field to the index.
We can add a configuration property to LanguageIndexingFilter that lists the
languages to index, so that documents in any other language are not indexed.
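A rough sketch of the allow-list check this proposal implies, written as
standalone Java so it can be tried without a Nutch checkout. The class name
LanguageAllowList, the method shouldIndex, and the comma-separated property
format are all assumptions made for illustration; none of them is taken from
the existing plugin code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Nutch: decides whether a language may be indexed. */
final class LanguageAllowList {
    // Lower-cased language codes that may be indexed; empty means "no restriction".
    private final Set<String> allowed;

    /**
     * @param propertyValue assumed format: comma-separated language codes such as
     *                      "en,de"; null, blank, or only commas means allow every language
     */
    LanguageAllowList(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            allowed = Collections.emptySet();
        } else {
            allowed = Arrays.stream(propertyValue.split(","))
                .map(code -> code.trim().toLowerCase(Locale.ROOT))
                .filter(code -> !code.isEmpty())
                .collect(Collectors.toSet());
        }
    }

    /** Returns true if a document with the given language code should be indexed. */
    boolean shouldIndex(String language) {
        if (allowed.isEmpty()) {
            return true;   // property unset: keep the current index-everything behaviour
        }
        if (language == null) {
            return false;  // a restriction is configured but the language is unknown
        }
        return allowed.contains(language.trim().toLowerCase(Locale.ROOT));
    }
}
```

Inside the plugin, the filter could presumably consult such a check after it has
determined the document language and return null from filter() when the check
fails, since returning null is how a Nutch 1.x IndexingFilter signals that a
document should be discarded. Leaving the property empty keeps the current
behaviour, so existing configurations would not change.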