[ https://issues.apache.org/jira/browse/SOLR-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117084#comment-13117084 ]
Jan Høydahl commented on SOLR-2792:
-----------------------------------

Robert,

bq. in the part reading the dictionary, we should avoid the String.toLowerCase() without any locale here, at least use String.toLowerCase(Locale.ENGLISH) for consistency?

Yep, for consistency it's probably better to lowercase using an explicit locale than system default. I tested with my name, and Locale.ENGLISH converts Ø->ø, so I'm happy :)

bq. shouldn't we case fold the affixes too? however, i'm guessing most of these are already in lowercase.

The way it works now is that we case fold the input word *after* affixes are applied, before comparing with dictionary words. So if either input word or affixes are not lower-case they will both be. We could add a test for it to make sure..

bq. are we "merging" dictionary entries here (I think we should in this lower-casing mode?)

No, we are not, meaning, I guess, that Foo/B would overwrite foo/A in your example? Would you like to take a stab at the merging code?

> Allow case insensitive Hunspell stemming
> ----------------------------------------
>
>                 Key: SOLR-2792
>                 URL: https://issues.apache.org/jira/browse/SOLR-2792
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.5, 4.0
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>         Attachments: SOLR-2792.patch, SOLR-2792.patch, SOLR-2792.patch
>
>
> Same as http://code.google.com/p/lucene-hunspell/issues/detail?id=3
> Hunspell dictionaries are by nature case sensitive. The Hunspell stemmer thus
> needs an option to allow case insensitive matching of the dictionaries.
> Imagine a query for "microsofts". It will never be stemmed to the dictionary
> word "Microsoft" because of the case difference. This problem cannot be fixed
> by putting LowercaseFilter before Hunspell.
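
A minimal Java sketch of two points from the comment above: lowercasing with an explicit locale instead of the system default, and merging dictionary entries that collide after lowercasing rather than letting the later one overwrite the earlier. The names CaseFoldingSketch, fold and mergeEntries are hypothetical and do not come from the attached patch. The sketch also assumes plain "word/FLAGS" lines with single-character flags, which is a simplification of what real Hunspell dictionaries allow.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the code from SOLR-2792.patch.
class CaseFoldingSketch {

    // Lowercase with a fixed locale so the result does not depend on the
    // JVM's default locale.
    static String fold(String word) {
        return word.toLowerCase(Locale.ENGLISH);
    }

    // Parse "word/FLAGS" lines, fold each word, and union the flags of
    // entries that map to the same folded word (assumes one char per flag).
    static Map<String, Set<Character>> mergeEntries(Iterable<String> lines) {
        Map<String, Set<Character>> merged = new LinkedHashMap<>();
        for (String line : lines) {
            int slash = line.indexOf('/');
            String word = slash < 0 ? line : line.substring(0, slash);
            String flags = slash < 0 ? "" : line.substring(slash + 1);
            Set<Character> target =
                merged.computeIfAbsent(fold(word), k -> new TreeSet<>());
            for (char flag : flags.toCharArray()) {
                target.add(flag);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // U+00D8 (capital O with stroke) folds to U+00F8 under Locale.ENGLISH.
        System.out.println(fold("H\u00D8YDAHL"));

        // Under Turkish rules "I" lowercases to dotless i (U+0131), which is
        // why relying on the default locale gives inconsistent results.
        String turkish = "I".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("\u0131"));

        // "foo/A" and "Foo/B" end up as one entry carrying both flags.
        System.out.println(mergeEntries(List.of("foo/A", "Foo/B", "Microsoft")));
    }
}
```

With the input foo/A and Foo/B, the map keeps a single key "foo" whose flag set contains both A and B, which is the merging behaviour asked about in the last quoted question.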