[ https://issues.apache.org/jira/browse/LUCENE-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Woodward resolved LUCENE-7444.
-----------------------------------
       Resolution: Fixed
         Assignee: Alan Woodward
    Fix Version/s: master (8.0)

> Remove English stopwords default from StandardAnalyzer in Lucene-Core
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-7444
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7444
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/other, modules/analysis
>    Affects Versions: 6.2
>            Reporter: Uwe Schindler
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: master (8.0)
>
>         Attachments: LUCENE-7444.patch
>
>
> Yonik said on LUCENE-7318:
> {quote}
> bq. I think it would make a good default for most Lucene users, and we should
> graduate it from the analyzers module into core, and make it the default for
> IndexWriter.
> This "StandardAnalyzer" is specific to English, as it removes English
> stopwords.
> That seems to be an odd choice now for a few reasons:
> - It was argued in the past (rather vehemently) that Solr should not prefer
> English in its default "text" field
> - AFAIK, removing stopwords is no longer considered best practice.
> Given that removal of English stopwords is the only thing that really makes
> this analyzer English-centric (and given the negative impact that can have on
> other languages), it seems like the stopword filter should be removed from
> StandardAnalyzer.
> {quote}
> When trying to fix the backwards incompatibility issues in LUCENE-7318, it
> looks like most of the unrelated code moved from the analysis module to core
> (and changing package names!!!! :( ) was related to word list loading,
> CharArraySets, and superclasses of StopFilter. If we follow Yonik's
> suggestion, we can revert all those changes. I agree with him, a "universal"
> analyzer should not have any language-specific stop-words.
> The other thing is LowercaseFilter, but I'd suggest to simply add a clone of
> it to Lucene core and leave the analysis module self-contained.
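
The concern about the "negative impact on other languages" can be illustrated with a minimal sketch in plain Java. It does not use the Lucene API: `StopwordSketch`, `analyze`, and the short `ENGLISH_STOP_WORDS` set are hypothetical names introduced for this example, and the set is only an illustrative handful of common English stop words, not Lucene's actual list. The sketch lowercases tokens and optionally drops stop words, then runs the same German phrase through both configurations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // Illustrative subset of English stop words (an assumption for this
    // example, not the list shipped with Lucene).
    static final Set<String> ENGLISH_STOP_WORDS =
        Set.of("a", "an", "and", "in", "is", "no", "of", "the", "to", "was", "will");

    // Splits on non-letter/non-digit characters, lowercases each token,
    // and drops any token contained in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String german = "Was will er in Berlin";
        // With the English list, the German words "was", "will", "in" vanish.
        System.out.println(analyze(german, ENGLISH_STOP_WORDS));
        // With an empty stop set, every token survives.
        System.out.println(analyze(german, Set.of()));
    }
}
```

With the English list applied, the German words "was" (what), "will" (wants), and "in" are discarded even though they carry meaning in that language; with an empty stop set the analysis stays language-neutral, which is the behaviour the issue argues a "universal" analyzer should have by default.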