+1 This way one could have a very focused crawl/search

On Mon, Aug 28, 2017 at 10:08 PM, Jorge Luis Betancourt Gonzalez (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144264#comment-16144264 ]
>
> Jorge Luis Betancourt Gonzalez commented on NUTCH-2414:
> -------------------------------------------------------
>
> +1 This would also help to deprecate the {{mimetype-filter}} plugin
> and avoid having the responsibility of indexing/allowing/blocking documents
> (from being indexed) scattered across several plugins
>
> > Allow LanguageIndexingFilter to actually filter documents by language.
> > ----------------------------------------------------------------------
> >
> > Key: NUTCH-2414
> > URL: https://issues.apache.org/jira/browse/NUTCH-2414
> > Project: Nutch
> > Issue Type: Improvement
> > Components: plugin
> > Affects Versions: 1.13
> > Reporter: Yossi Tamari
> > Priority: Minor
> >
> > It is often useful to only index pages in select languages (e.g. only
> > those languages that we intend to search in). At first glance it seems that
> > this is done by LanguageIndexingFilter, but currently all the filter does
> > is add the language as a field to the index.
> > We can add a configuration property to LanguageIndexingFilter that will
> > allow it to only index languages specified in this property.
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)

