[ https://issues.apache.org/jira/browse/SOLR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Niemczyk updated SOLR-5153:
----------------------------------

    Description: 
Given the default setup and the example from the Solr wiki (http://wiki.apache.org/solr/UnicodeCollation), the Solr analysis page reports strange output for the CKF (CollationKeyFilter).
Settings:
{code}
<fieldType name="germanText" class="solr.TextField">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
        </analyzer>
</fieldType>

<field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="germanText"/>
{code}
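For context, here is a minimal Java sketch of what language="de" strength="primary" asks for. It assumes, as the CollationKeyFilterFactory documentation describes, that the filter derives its keys from a JDK java.text.Collator for that locale. The class and method names (GermanPrimaryKey, germanPrimary, keyBytes, hex) are hypothetical, and the bytes it prints are the raw JDK collation key, not the char-encoded term Solr indexes, so they are not expected to match the CKF raw_bytes byte for byte.

{code}
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class GermanPrimaryKey {
    // Hypothetical helper: a collator roughly matching language="de" strength="primary"
    // (assumes the filter delegates to java.text.Collator; no country/variant is set).
    static Collator germanPrimary() {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Raw JDK collation key bytes for a single token.
    static byte[] keyBytes(String token) {
        return germanPrimary().getCollationKey(token).toByteArray();
    }

    // Formats bytes in the "[50 65 74 ...]" style used by the analysis page.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        Collator collator = germanPrimary();
        CollationKey upper = collator.getCollationKey("Peter");
        CollationKey lower = collator.getCollationKey("peter");
        // At primary strength case differences are ignored, so these keys compare equal.
        System.out.println("Peter vs peter: " + upper.compareTo(lower));
        System.out.println("key bytes for Peter: " + hex(keyBytes("Peter")));
    }
}
{code}

The point of primary strength is that "Peter" and "peter" produce equal keys, so the indexed term is an opaque sort key rather than a copy of the input text.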

Input:
{code}
Peter
{code}

Output:
{code}
WT:  Peter [50 65 74 65 72]
CKF: 1䀖瀅䀃᐀ [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]
{code}
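The CKF text is easier to read once its raw_bytes are decoded as UTF-8. The JDK-only sketch below (class and method names are hypothetical) turns the reported bytes back into chars: nine in total, namely 1, U+4016, a form feed (U+000C), U+7005, U+4003, U+1400 and three U+0000 chars, where the form feed and the NULs are invisible on the analysis page. Lucene's CollationKeyFilter documentation says the binary collation key is encoded with IndexableBinaryStringTools so it can be stored as an index term, which is consistent with the term looking like arbitrary CJK and symbol characters.

{code}
import java.nio.charset.StandardCharsets;

public class RawBytesDecode {
    // The CKF raw_bytes exactly as reported above.
    static final int[] CKF_RAW = {
        0x31, 0xe4, 0x80, 0x96, 0x0c, 0xe7, 0x80, 0x85,
        0xe4, 0x80, 0x83, 0xe1, 0x90, 0x80, 0x00, 0x00, 0x00
    };

    // Decodes the reported UTF-8 bytes back into the term's chars.
    static String decode(int[] raw) {
        byte[] bytes = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) {
            bytes[i] = (byte) raw[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Lists each char as U+XXXX so control chars (form feed, NUL) become visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String term = decode(CKF_RAW);
        // prints: 9 chars: U+0031 U+4016 U+000C U+7005 U+4003 U+1400 U+0000 U+0000 U+0000
        System.out.println(term.length() + " chars: " + codePoints(term));
    }
}
{code}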

  was:
Given the default setup and the example from the Solr wiki (http://wiki.apache.org/solr/UnicodeCollation), the Solr analysis page reports strange output for the CKF (CollationKeyFilter).
Settings:
{code}
<fieldType name="germanText" class="solr.TextField">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
        </analyzer>
</fieldType>

<field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="germanText"/>
{code}

Output:
{code}
WT
  text:      Peter
  raw_bytes: [50 65 74 65 72]
  start:     0
  end:       5
  position:  1
  type:      word

CKF
  text:      1䀖瀅䀃᐀
  raw_bytes: [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]
  position:  1
  start:     0
  end:       5
  type:      word
{code}

    
> CollationKeyFilter returns unexpected output
> --------------------------------------------
>
>                 Key: SOLR-5153
>                 URL: https://issues.apache.org/jira/browse/SOLR-5153
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 4.3
>         Environment: Mac OS X
>            Reporter: Maciej Niemczyk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
