[ https://issues.apache.org/jira/browse/SOLR-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Høydahl updated SOLR-4452:
------------------------------
    Attachment: SOLR-4452.patch

New patch with test case.

> Hunspell stemmer should not merge duplicate dictionary entries
> --------------------------------------------------------------
>
>                 Key: SOLR-4452
>                 URL: https://issues.apache.org/jira/browse/SOLR-4452
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>            Reporter: Jan Høydahl
>              Labels: hunspell
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4452.patch, SOLR-4452.patch
>
>
> Hunspell dictionaries have the form
> {noformat}
> lucene/ABC
> mahout/X
> {noformat}
> Each word is listed once with its base form, and the flags after the / define
> the allowed prefixes and suffixes.
> In HunspellDictionary's parsing logic, if the same base word appears
> multiple times in the file, the flags from the duplicate entry are added to
> the flags from the existing entry.
> However, HunspellStemFilterFactory allows a comma-separated list of
> dictionary files to be passed in, the idea being that you can keep your own
> custom extensions and not need to modify the "standard" ones, which may change
> upstream once in a while. This feature currently works only for NEW words, not
> for overriding existing entries from the first dictionary.
> I would like to change this behavior so that the last line read overwrites any
> previous one. This will both fix the custom dictionary issue and also fix
> unintentionally wrong original dictionaries, where someone added a word
> definition at the end without realizing there was another one already.
> For the en_UK.dic there are no duplicates. For en_US.dic there is one
> duplicate, so I argue this behavior is a bug and not a feature dictionary
> authors depend upon.
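To make the difference between the two behaviors concrete, here is a minimal sketch in Python. It is not the HunspellDictionary code and not the attached patch: the function names (`parse_entry`, `load_merging`, `load_overwriting`) and the `custom` entries are made up for illustration, flags are assumed to be appended by simple concatenation, and the word-count header line that real .dic files start with is ignored.

```python
def parse_entry(line):
    """Split 'word/FLAGS' into (word, flags); an entry without '/' has no flags."""
    word, _, flags = line.strip().partition("/")
    return word, flags


def load_merging(*dictionaries):
    """Behavior described as current: flags of a duplicate entry are appended."""
    entries = {}
    for lines in dictionaries:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            word, flags = parse_entry(line)
            entries[word] = entries.get(word, "") + flags
    return entries


def load_overwriting(*dictionaries):
    """Proposed behavior: the last entry read for a word replaces earlier ones."""
    entries = {}
    for lines in dictionaries:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            word, flags = parse_entry(line)
            entries[word] = flags
    return entries


# "Standard" dictionary from the description, plus a hypothetical custom one
# that tries to redefine the flags for "lucene" and adds a new word.
standard = ["lucene/ABC", "mahout/X"]
custom = ["lucene/D", "solr/Y"]

merged = load_merging(standard, custom)
overwritten = load_overwriting(standard, custom)

print(merged["lucene"])       # flags from both files
print(overwritten["lucene"])  # flags from the custom file only
```

With merging, the custom file can only add flags to "lucene" and can never take away A, B or C. With overwriting, the custom entry fully replaces the standard one, which is the override use case described above.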