[
https://issues.apache.org/jira/browse/SOLR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739815#comment-13739815
]
Robert Muir commented on SOLR-5153:
-----------------------------------
For German diacritics, you may want to use the
GermanNormalizationFilterFactory (I think it's in the text_de fieldtype
in the Solr example).
{code}
/**
* Normalizes German characters according to the heuristics
* of the <a href="http://snowball.tartarus.org/algorithms/german2/stemmer.html">
* German2 snowball algorithm</a>.
* It allows for the fact that ä, ö and ü are sometimes written as ae, oe and ue.
* <p>
* <ul>
* <li> 'ß' is replaced by 'ss'
* <li> 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
* <li> 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
* <li> 'ue' is replaced by 'u', when not following a vowel or q.
* </ul>
* <p>
* This is useful if you want this normalization without using
* the German2 stemmer, or perhaps no stemming at all.
*/
{code}
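To make the four rules in that Javadoc concrete, here is a minimal Java sketch of them. It is not the Lucene filter's code: the class name GermanNormalizationSketch and its helper are hypothetical, the input is assumed to be lowercased already, and treating y and the umlauts as "vowels" for the 'ue' rule is my assumption.

```java
public final class GermanNormalizationSketch {
    // Assumption: "vowel" means a, e, i, o, u, y plus the umlauts; q is added
    // because the Javadoc names it explicitly for the 'ue' rule.
    private static final String VOWELS_AND_Q = "aeiouyq\u00e4\u00f6\u00fc";

    private GermanNormalizationSketch() {}

    /** Applies the four listed rules to a token that is already lowercased. */
    public static String normalize(String token) {
        StringBuilder out = new StringBuilder(token.length() + 2);
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case '\u00df': out.append("ss"); break; // sharp s -> ss
                case '\u00e4': out.append('a'); break;  // a-umlaut -> a
                case '\u00f6': out.append('o'); break;  // o-umlaut -> o
                case '\u00fc': out.append('u'); break;  // u-umlaut -> u
                case 'e':
                    // Drop the 'e' only when it completes ae, oe or an allowed ue.
                    if (!closesDigraph(token, i)) {
                        out.append('e');
                    }
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    /** True if the 'e' at index i completes ae/oe, or ue not preceded by a vowel or q. */
    private static boolean closesDigraph(String token, int i) {
        if (i == 0) {
            return false;
        }
        char prev = token.charAt(i - 1);
        if (prev == 'a' || prev == 'o') {
            return true;
        }
        if (prev != 'u') {
            return false;
        }
        // 'ue' counts only at the start of the token or after a non-vowel, non-q.
        return i == 1 || VOWELS_AND_Q.indexOf(token.charAt(i - 2)) < 0;
    }
}
```

With this sketch, normalize("mueller") and normalize("m\u00fcller") both give "muller", while "quelle" and "bauer" come back unchanged because the 'ue' follows q or a vowel.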
> CollationKeyFilter returns unexpected output
> --------------------------------------------
>
> Key: SOLR-5153
> URL: https://issues.apache.org/jira/browse/SOLR-5153
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other
> Affects Versions: 4.3
> Environment: Mac OS X
> Reporter: Maciej Niemczyk
>
> Given the default setup and the example from the Solr wiki:
> http://wiki.apache.org/solr/UnicodeCollation
> the Solr analysis page reports strange output for the CollationKeyFilter (CKF).
> Settings:
> {code}
> <fieldType name="germanText" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
>   </analyzer>
> </fieldType>
> <field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>
> <copyField source="title" dest="germanText"/>
> {code}
> Input:
> {code}
> Peter
> {code}
> Output:
> {code}
> WT: Peter [50 65 74 65 72]
> CKF: 1䀖瀅䀃᐀ [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]
> {code}
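For context on the CKF line above: a collation key is a binary sort key, not readable text, so unreadable output for "Peter" is not surprising in itself. The sketch below uses the JDK's java.text.Collator with the same language and strength as the fieldType; the class name CollationKeySketch and its helpers are hypothetical. It prints only the raw key bytes and does not reproduce how Solr encodes them into a term, so its output is not expected to match the bytes quoted in the issue.

```java
import java.text.Collator;
import java.util.Locale;

public final class CollationKeySketch {
    private CollationKeySketch() {}

    /** German collator at primary strength, mirroring language="de" strength="primary". */
    public static Collator germanPrimary() {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    /** Formats bytes as space-separated hex, similar to the analysis page output. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collator collator = germanPrimary();
        // The raw key is an opaque byte sequence used only for comparisons.
        byte[] key = collator.getCollationKey("Peter").toByteArray();
        System.out.println("key bytes: " + hex(key));
        // At primary strength, case differences are ignored.
        System.out.println("Peter == peter: " + (collator.compare("Peter", "peter") == 0));
    }
}
```

The point of the sketch is that keys are meant to be compared, not read: two inputs that differ only in case produce keys that compare as equal at primary strength.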