[ https://issues.apache.org/jira/browse/SOLR-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-4452:
------------------------------

    Attachment: SOLR-4452.patch

New patch with testcase
                
> Hunspell stemmer should not merge duplicate dictionary entries
> --------------------------------------------------------------
>
>                 Key: SOLR-4452
>                 URL: https://issues.apache.org/jira/browse/SOLR-4452
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>            Reporter: Jan Høydahl
>              Labels: hunspell
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4452.patch, SOLR-4452.patch
>
>
> Hunspell dictionaries are of the form
> {noformat}
> lucene/ABC
> mahout/X
> {noformat}
> Each word is listed once with its base form, and the flags after the / define 
> allowed prefixes and suffixes.
> In HunspellDictionary's parsing logic, if the same base word should appear 
> multiple times in the file, the flags from the duplicate entry are added to 
> the flags from the existing entry.
> However, HunSpellStemFilterFactory accepts a comma-separated list of 
> dictionary files, the idea being that you can keep your own custom 
> extensions separate and not need to modify the "standard" ones, which may 
> change upstream once in a while. Because duplicate entries are merged, this 
> feature currently works only for NEW words, not for overriding existing 
> entries from the first dictionary.
> I would like to change this behavior so that the last line read overwrites 
> any previous one. This will fix the custom dictionary issue and also fix 
> original dictionaries that are unintentionally wrong, where someone added a 
> word definition at the end without realizing there was already another one.
> For the en_UK.dic there are no duplicates. For en_US.dic there is one 
> duplicate, so I argue this behavior is a bug and not a feature dictionary 
> authors depend upon.
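
To illustrate the difference between the merging behavior described above and
the proposed "last entry wins" behavior, here is a minimal sketch. It is not
the actual HunspellDictionary code: the class name DuplicateEntrySketch, the
load method and its mergeDuplicates switch are hypothetical, flags are kept as
a plain string, and the word-count header line of a real .dic file is ignored.
The third input line stands in for an entry coming from a custom dictionary
that is read after the standard one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the real HunspellDictionary parser.
final class DuplicateEntrySketch {

    /**
     * Parses "word/FLAGS" lines into a word -> flags map.
     * mergeDuplicates = true  : flags of a repeated word are appended (behavior described in the issue).
     * mergeDuplicates = false : the last line read replaces any earlier one (proposed behavior).
     */
    public static Map<String, String> load(List<String> lines, boolean mergeDuplicates) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            int slash = line.indexOf('/');
            String word = slash < 0 ? line : line.substring(0, slash);
            String flags = slash < 0 ? "" : line.substring(slash + 1);
            if (mergeDuplicates && entries.containsKey(word)) {
                // Duplicate entry: keep the old flags and add the new ones.
                flags = entries.get(word) + flags;
            }
            entries.put(word, flags);
        }
        return entries;
    }

    public static void main(String[] args) {
        // "Standard" entries followed by a custom entry that redefines "lucene".
        List<String> lines = Arrays.asList("lucene/ABC", "mahout/X", "lucene/D");
        System.out.println(load(lines, true));   // {lucene=ABCD, mahout=X}
        System.out.println(load(lines, false));  // {lucene=D, mahout=X}
    }
}
```

With merging, the custom entry can only add flags to "lucene"; with
overwriting, it can also take flags away, which is what a custom override
dictionary needs.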

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

