[ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144264#comment-16144264 ]
Jorge Luis Betancourt Gonzalez commented on NUTCH-2414:
-------------------------------------------------------
+1 This would also help to deprecate the {{mimetype-filter}} plugin and
avoid having the responsibility for allowing or blocking documents (from
being indexed) scattered across several plugins.
> Allow LanguageIndexingFilter to actually filter documents by language.
> ----------------------------------------------------------------------
>
> Key: NUTCH-2414
> URL: https://issues.apache.org/jira/browse/NUTCH-2414
> Project: Nutch
> Issue Type: Improvement
> Components: plugin
> Affects Versions: 1.13
> Reporter: Yossi Tamari
> Priority: Minor
>
> It is often useful to only index pages in select languages (e.g. only those
> languages that we intend to search in). At first glance it seems that this is
> done by LanguageIndexingFilter, but currently all the filter does is add the
> language as a field to the index.
> We can add a configuration property to LanguageIndexingFilter that will allow
> it to only index languages specified in this property.
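
A minimal sketch of the check such a property could drive, written as plain Java with no Nutch dependency. The class name {{LanguageAllowList}}, its method names, and the comma-separated property format are assumptions made for illustration; the issue does not specify them. An empty or missing property value is treated here as "index every language", which would keep the current behaviour as the default.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative allow-list check; hypothetical, not part of Nutch. */
final class LanguageAllowList {
    // Normalised (trimmed, lower-case) language codes that may be indexed.
    private final Set<String> allowed = new HashSet<>();

    /**
     * @param propertyValue comma-separated language codes such as "en, fr"
     *                      (assumed format); null or blank allows everything
     */
    LanguageAllowList(String propertyValue) {
        if (propertyValue == null) {
            return;
        }
        for (String code : propertyValue.split(",")) {
            String normalised = code.trim().toLowerCase(Locale.ROOT);
            if (!normalised.isEmpty()) {
                allowed.add(normalised);
            }
        }
    }

    /** Returns true when a document in the detected language should be indexed. */
    boolean shouldIndex(String detectedLanguage) {
        if (allowed.isEmpty()) {
            return true;  // no restriction configured
        }
        if (detectedLanguage == null) {
            return false; // a restriction is set but the language is unknown
        }
        return allowed.contains(detectedLanguage.trim().toLowerCase(Locale.ROOT));
    }
}
```

For example, {{new LanguageAllowList("en, fr").shouldIndex("de")}} returns false. Inside an indexing filter, a false result would presumably translate into returning null from the filter method so that the document is skipped. Rejecting documents whose language could not be detected is a design choice of this sketch and might instead deserve its own setting.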