[ 
https://issues.apache.org/jira/browse/SOLR-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117084#comment-13117084
 ] 

Jan Høydahl commented on SOLR-2792:
-----------------------------------

Robert,

bq. in the part reading the dictionary, we should avoid the 
String.toLowerCase() without any locale here, at least use 
String.toLowerCase(Locale.ENGLISH) for consistency?
Yep, for consistency it's probably better to lowercase using an explicit locale 
than the system default. I tested with my name, and Locale.ENGLISH converts 
Ø->ø, so I'm happy :)
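
A minimal sketch of why the explicit locale matters. The class and method names are made up for illustration and are not taken from the patch. With Locale.ENGLISH the result is the same on every machine, whereas a Turkish locale (used here as a stand-in for an unlucky system default) lowercases 'I' to a dotless 'ı'.

```java
import java.util.Locale;

// Hypothetical illustration, not code from SOLR-2792.patch.
class LocaleLowercaseSketch {

    // Lowercase with an explicit locale so the result does not depend
    // on the JVM's default locale.
    static String fold(String word) {
        return word.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Non-ASCII letters are still folded: Ø becomes ø.
        System.out.println(fold("Høydahl"));
        System.out.println(fold("Ø"));

        // Under Turkish rules 'I' lowercases to dotless 'ı' (U+0131),
        // which would no longer match a dictionary entry "title".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("TITLE".toLowerCase(turkish));
    }
}
```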

bq. shouldn't we case fold the affixes too? however, i'm guessing most of these 
are already in lowercase.
The way it works now is that we case fold the input word *after* affixes are 
applied, before comparing with dictionary words. So if either the input word 
or the affixes are not lower-case, both will be by the time we compare. We 
could add a test for it to make sure.
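
The order of operations described above can be sketched roughly as follows. This is a simplified stand-in that only handles plain suffix stripping. The class, the method and its parameters are hypothetical and do not reflect the actual stemmer code, and the dictionary is assumed to have been lowercased when it was loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not the HunspellStemmer implementation.
class CaseFoldAfterAffixSketch {

    // Strip each matching suffix first, then lowercase the remaining
    // candidate stem, and only then compare it with the dictionary.
    static List<String> stems(String word, List<String> suffixes,
                              Set<String> lowercasedDictionary) {
        List<String> result = new ArrayList<>();
        for (String suffix : suffixes) {
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String candidate =
                    word.substring(0, word.length() - suffix.length());
                // Case folding happens after the affix has been removed.
                String folded = candidate.toLowerCase(Locale.ENGLISH);
                if (lowercasedDictionary.contains(folded)) {
                    result.add(folded);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
            stems("Microsofts", List.of("s"), Set.of("microsoft")));
    }
}
```

Note that the suffix match itself is case sensitive in this sketch, so it does not cover an upper-case affix applied to a lower-case word.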

bq. are we "merging" dictionary entries here (I think we should in this 
lower-casing mode?)
No, we are not, which I guess means that Foo/B would overwrite foo/A in your 
example. Would you like to take a stab at the merging code?
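
One possible shape for such merging, as a rough sketch only. It assumes single-character flags and the plain "word/FLAGS" line form, and all names are invented for illustration. Entries that differ only by case end up under one lowercased key with the union of their flags, instead of the later entry replacing the earlier one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not code from SOLR-2792.patch.
class MergeEntriesSketch {

    // Read "word/FLAGS" lines and merge entries whose words are equal
    // after lowercasing. Assumes each flag is a single character.
    static Map<String, Set<Character>> load(List<String> lines) {
        Map<String, Set<Character>> entries = new HashMap<>();
        for (String line : lines) {
            int slash = line.indexOf('/');
            String word = slash < 0 ? line : line.substring(0, slash);
            String flags = slash < 0 ? "" : line.substring(slash + 1);
            String key = word.toLowerCase(Locale.ENGLISH);
            // Reuse the existing flag set for this key if there is one.
            Set<Character> merged =
                entries.computeIfAbsent(key, k -> new TreeSet<>());
            for (char flag : flags.toCharArray()) {
                merged.add(flag);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // foo/A and Foo/B collapse into one entry carrying both flags.
        System.out.println(load(List.of("foo/A", "Foo/B")));
    }
}
```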
                
> Allow case insensitive Hunspell stemming
> ----------------------------------------
>
>                 Key: SOLR-2792
>                 URL: https://issues.apache.org/jira/browse/SOLR-2792
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.5, 4.0
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>         Attachments: SOLR-2792.patch, SOLR-2792.patch, SOLR-2792.patch
>
>
> Same as http://code.google.com/p/lucene-hunspell/issues/detail?id=3
> Hunspell dictionaries are by nature case sensitive. The Hunspell stemmer thus 
> needs an option to allow case insensitive matching of the dictionaries.
> Imagine a query for "microsofts". It will never be stemmed to the dictionary 
> word "Microsoft" because of the case difference. This problem cannot be fixed 
> by putting LowercaseFilter before Hunspell.

