[ 
https://issues.apache.org/jira/browse/SOLR-12655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604160#comment-16604160
 ] 

Jim Ferenczi commented on SOLR-12655:
-------------------------------------

[~y100421] we use the mecab-ko-dic-2.0.3-20170922 version for the build. 
mecab-ko-dic-2.0.1-20150920 has a different list of POS tags (the UNT tag is 
not present in 2.0.3), and some POS tags have a different id, so you'll need 
to modify the source to fix the build. If you add UNT to the list of POS tags 
and change line 35 of the UnknownDictionaryBuilder to:

{code:java}
private static final String NGRAM_DICTIONARY_ENTRY = 
"NGRAM,1801,3561,3668,SY,*,*,*,*,*,*,*";
{code}

... the build should work. We need this entry to annotate the ngrams that we 
add when a word is not recognized, but the leftId and rightId for the SY POS 
tag have changed between 2.0.1 and 2.0.3. We could apply this switch 
automatically, but can you explain why you need to use the old version of the 
dictionary instead of the new one?
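
For reference, here is a minimal sketch of how the leading fields of that entry line up. It assumes the usual MeCab CSV column order (surface, left id, right id, word cost, then the POS tag and remaining features). The class name UnknownEntryFields and its parse method are hypothetical and are not part of Lucene; the values are the ones quoted above for the 2.0.1 dictionary.

```java
// Hypothetical helper, not part of Lucene: splits a dictionary entry
// into its leading fields so the ids can be compared between versions.
final class UnknownEntryFields {
    final String surface;
    final int leftId;
    final int rightId;
    final int wordCost;
    final String posTag;

    private UnknownEntryFields(String surface, int leftId, int rightId,
                               int wordCost, String posTag) {
        this.surface = surface;
        this.leftId = leftId;
        this.rightId = rightId;
        this.wordCost = wordCost;
        this.posTag = posTag;
    }

    // Assumes the MeCab CSV column order:
    // surface, left id, right id, word cost, POS tag, other features...
    static UnknownEntryFields parse(String line) {
        String[] cols = line.split(",");
        if (cols.length < 5) {
            throw new IllegalArgumentException(
                "expected at least 5 columns: " + line);
        }
        return new UnknownEntryFields(
            cols[0],
            Integer.parseInt(cols[1]),
            Integer.parseInt(cols[2]),
            Integer.parseInt(cols[3]),
            cols[4]);
    }

    public static void main(String[] args) {
        // Entry quoted in the comment for mecab-ko-dic-2.0.1-20150920.
        UnknownEntryFields e =
            parse("NGRAM,1801,3561,3668,SY,*,*,*,*,*,*,*");
        System.out.println(e.surface + " posTag=" + e.posTag
            + " leftId=" + e.leftId + " rightId=" + e.rightId
            + " wordCost=" + e.wordCost);
    }
}
```

Running the same parse over the SY rows of both dictionary versions would be one way to see which ids differ before editing the builder.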


> Add Korean analyzer JAR file (NORI) and schema.xml example to Solr
> ------------------------------------------------------------------
>
>                 Key: SOLR-12655
>                 URL: https://issues.apache.org/jira/browse/SOLR-12655
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Build, Schema and Analysis
>    Affects Versions: 7.4
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-12655.patch, image-2018-09-05-17-42-09-983.png, 
> screenshot-1.png
>
>
> In Lucene 7.4 we added the NORI analyzer for Korean. In contrast to Kuromoji, 
> the JAR file is missing from the distribution (analyzers-kuromoji is part 
> of the main Solr distribution). We should also add an updated/new "text_ko" 
> field to the default schema.
> See also SOLR-12255 about the documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
