[
https://issues.apache.org/jira/browse/SOLR-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578228#comment-13578228
]
Jan Høydahl commented on SOLR-4452:
-----------------------------------
I think this is ready. I will commit in a day or two unless anyone shows that
the current merging is a needed "feature", not a bug.
> Hunspell stemmer should not merge duplicate dictionary entries
> --------------------------------------------------------------
>
> Key: SOLR-4452
> URL: https://issues.apache.org/jira/browse/SOLR-4452
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Reporter: Jan Høydahl
> Labels: hunspell
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4452.patch, SOLR-4452.patch
>
>
> Hunspell dictionaries have the form
> {noformat}
> lucene/ABC
> mahout/X
> {noformat}
> Each word is listed once in its base form, and the flags after the / define
> the allowed prefixes and suffixes.
> In HunspellDictionary's parsing logic, if the same base word appears
> multiple times in the file, the flags from the duplicate entry are added to
> the flags of the existing entry.
> However, HunspellStemFilterFactory allows a comma-separated list of
> dictionary files to be passed in, the idea being that you can keep your own
> custom extensions without modifying the "standard" dictionaries, which may
> change upstream once in a while. This feature currently works only for new
> words, not for overriding existing entries from the first dictionary.
> I would like to change this behavior so that the last line read overwrites
> any previous one. This will fix the custom dictionary issue and also correct
> original dictionaries that are unintentionally wrong because someone added a
> word definition at the end without realizing that one already existed.
> en_UK.dic contains no duplicates and en_US.dic contains only one, so I argue
> that this behavior is a bug and not a feature that dictionary authors depend
> upon.
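
To make the difference concrete, here is a minimal Java sketch of the two duplicate-handling policies described above. It is not the HunspellDictionary code: the class and method names are hypothetical, flags are treated as a plain string after the `/`, and the third sample line (`lucene/D`) is an invented custom override used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Solr or Lucene.
public class DuplicateEntryPolicy {

    /** Merging policy: flags of a repeated word are appended to the existing flags. */
    public static Map<String, String> loadMerging(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = split(line);
            // If the word is already present, concatenate old and new flags.
            entries.merge(parts[0], parts[1], String::concat);
        }
        return entries;
    }

    /** Overwriting policy: the last line read for a word replaces any earlier one. */
    public static Map<String, String> loadOverwriting(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = split(line);
            entries.put(parts[0], parts[1]);
        }
        return entries;
    }

    /** Splits "word/FLAGS" into {word, flags}; a line without '/' has empty flags. */
    private static String[] split(String line) {
        int slash = line.indexOf('/');
        return slash < 0
            ? new String[] {line, ""}
            : new String[] {line.substring(0, slash), line.substring(slash + 1)};
    }

    public static void main(String[] args) {
        // Two "standard" entries followed by a custom entry that redefines lucene.
        List<String> lines = List.of("lucene/ABC", "mahout/X", "lucene/D");
        System.out.println(loadMerging(lines));     // {lucene=ABCD, mahout=X}
        System.out.println(loadOverwriting(lines)); // {lucene=D, mahout=X}
    }
}
```

With merging, the custom entry can only add flags to `lucene`. With overwriting, it replaces the earlier definition, which is what a custom extension dictionary needs in order to override an entry.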