[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

Tomoko Uchida (JIRA) Sun, 11 Aug 2019 00:08:02 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904584#comment-16904584
 ]


Tomoko Uchida commented on SOLR-13593:
--------------------------------------

The ICU factories' "name" argument was renamed to "form" on the master branch, 
so these factories can now be looked up by their SPI names (with the "form" 
attribute specifying the normalization form), like this:
{code:xml}
    <fieldType name="text_ws_icucf" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter name="icuNormalizer2" form="nfkc"/>
        <tokenizer name="whitespace"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_ws_icutf" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer name="whitespace"/>
        <filter name="icuNormalizer2" form="nfkc"/>
      </analyzer>
    </fieldType>
{code}
The corresponding field types using "class" are:
{code:xml}
    <fieldType name="text_ws_icucf" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.ICUNormalizer2CharFilterFactory" form="nfkc"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_ws_icutf" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ICUNormalizer2FilterFactory" form="nfkc" mode="compose"/>
      </analyzer>
    </fieldType>
{code}
This works for me, and the branch passed the entire test suite. I will merge 
all the changes to the master branch soon.
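
For anyone converting an existing schema from "class" to "name", the rewrite 
could be sketched roughly as below. This is only an illustration in Python, 
not part of the patch: the helper class_to_name and the SPI_NAMES table are 
hypothetical, and the table contains only the factories mentioned in this 
issue. The mapping is spelled out explicitly because the SPI name comes from 
each factory's static "NAME" field and is not assumed to be derivable from the 
class name.

```python
import xml.etree.ElementTree as ET

# Explicit (element tag, class attribute) -> SPI name table, taken only from
# the examples in this issue. It is an assumption-free lookup, not a naming
# rule: the authoritative SPI name is each factory's static NAME field.
SPI_NAMES = {
    ("tokenizer", "solr.WhitespaceTokenizerFactory"): "whitespace",
    ("filter", "solr.KeywordMarkerFilterFactory"): "keywordMarker",
    ("filter", "solr.PorterStemFilterFactory"): "porterStem",
    ("charFilter", "solr.ICUNormalizer2CharFilterFactory"): "icuNormalizer2",
    ("filter", "solr.ICUNormalizer2FilterFactory"): "icuNormalizer2",
}


def class_to_name(field_type_xml: str) -> str:
    """Hypothetical helper: swap class="..." for name="..." on analyzer components."""
    root = ET.fromstring(field_type_xml)
    for analyzer in root.iter("analyzer"):
        for component in analyzer:
            key = (component.tag, component.get("class"))
            if key not in SPI_NAMES:
                continue  # unknown factory: leave the element untouched
            # Rebuild the attributes so "name" comes first and the remaining
            # attributes (form, protected, ...) keep their original order.
            others = [(k, v) for k, v in component.attrib.items() if k != "class"]
            component.attrib.clear()
            component.set("name", SPI_NAMES[key])
            for k, v in others:
                component.set(k, v)
    return ET.tostring(root, encoding="unicode")
```

Only the children of <analyzer> are rewritten, so the class="solr.TextField" 
attribute on the field type itself stays as it is.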

> Allow to specify analyzer components by their SPI names in schema definition
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-13593
>                 URL: https://issues.apache.org/jira/browse/SOLR-13593
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Tomoko Uchida
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Each analysis factory now has an explicitly documented SPI name, which is 
> stored in the static "NAME" field (LUCENE-8778).
>  Solr uses the factories' simple class names in the schema definition (like 
> class="solr.WhitespaceTokenizerFactory"), but we should also be able to use 
> the more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     <filter class="solr.PorterStemFilterFactory" />
>   </analyzer>
> </fieldtype>
> {code}
> would be
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer name="whitespace"/>
>     <filter name="keywordMarker" protected="protwords.txt" />
>     <filter name="porterStem" />
>   </analyzer>
> </fieldtype>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
