[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

Tomoko Uchida (JIRA) Wed, 03 Jul 2019 23:51:36 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878369#comment-16878369
 ]


Tomoko Uchida commented on SOLR-13593:
--------------------------------------

I've opened a draft pull request: 
[https://github.com/apache/lucene-solr/pull/761]. (Not yet tested.) I'm new to 
Solr schema handling, so please feel free to comment if I missed something.

With this change, SPI names are accepted both when loading the bundled managed-schema and when calling the REST API.

managed-schema example:
{code:xml}
<fieldType name="text_fa_spi" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- for ZWNJ -->
        <charFilter spi="persian"/>
        <tokenizer spi="standard"/>
        <filter spi="lowercase"/>
        <filter spi="arabicNormalization"/>
        <filter spi="persianNormalization"/>
        <filter spi="stop" ignoreCase="true" words="lang/stopwords_fa.txt" />
    </analyzer>
</fieldType>
{code}
REST API example:
{code:bash}
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
     "name":"myNewTxtField",
     "class":"solr.TextField",
     "positionIncrementGap":"100",
     "analyzer" : {
        "charFilters":[{
           "spi":"htmlStrip"
        }],
        "tokenizer":{
           "spi":"whitespace" },
        "filters":[{
           "spi":"lowercase"
        }]}}
}' http://localhost:8983/solr/techproducts/schema
{code}
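
For anyone who wants to try the REST call from a script, here is a minimal sketch that builds the same request body as the curl example above. The helper name build_add_field_type and its parameters are made up for illustration, and the "spi" key is simply what the draft pull request accepts at the moment, so it may change during review.

```python
import json


def build_add_field_type(name, char_filters=(), tokenizer="standard", filters=()):
    """Build an add-field-type command that refers to analysis components by SPI name.

    Hypothetical helper: the "spi" key mirrors the draft pull request and is an
    assumption, not a released Solr API.
    """

    def component(spec):
        # A component is either a bare SPI name ("lowercase") or a
        # (name, params) pair such as ("stop", {"ignoreCase": "true"}).
        if isinstance(spec, str):
            return {"spi": spec}
        spi_name, params = spec
        return {"spi": spi_name, **params}

    analyzer = {}
    if char_filters:
        analyzer["charFilters"] = [component(c) for c in char_filters]
    analyzer["tokenizer"] = component(tokenizer)
    if filters:
        analyzer["filters"] = [component(f) for f in filters]

    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": analyzer,
        }
    }


payload = build_add_field_type(
    "myNewTxtField",
    char_filters=["htmlStrip"],
    tokenizer="whitespace",
    filters=["lowercase"],
)
# The printed JSON can be sent with curl --data-binary as in the example above.
print(json.dumps(payload, indent=2))
```

The printed JSON is the body that would be POSTed to the /schema endpoint of the collection, exactly as in the curl command.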

> Allow to specify analyzer components by their SPI names in schema definition
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-13593
>                 URL: https://issues.apache.org/jira/browse/SOLR-13593
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>            Reporter: Tomoko Uchida
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Each analysis factory now has an explicitly documented SPI name, which is stored 
> in the static "NAME" field (LUCENE-8778).
>  Solr uses the factories' simple class names in schema definitions (like 
> class="solr.WhitespaceTokenizerFactory"), but we should also be able to use 
> the more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     <filter class="solr.PorterStemFilterFactory" />
>   </analyzer>
> </fieldtype>
> {code}
> would be
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer name="whitespace"/>
>     <filter name="keywordMarker" protected="protwords.txt" />
>     <filter name="porterStem" />
>   </analyzer>
> </fieldtype>
> {code}
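
The before/after rewrite in the description can be sketched as a small script. This is only an illustration of the intended mapping, not part of the patch: the function name use_spi_names is hypothetical, and the lookup table contains just the three class-to-name pairs that appear in the example (the real names come from each factory's static NAME field).

```python
import xml.etree.ElementTree as ET

# Class-to-SPI-name pairs copied from the example in the issue description.
# A real tool would read each factory's NAME field instead of hard-coding them.
CLASS_TO_NAME = {
    "solr.WhitespaceTokenizerFactory": "whitespace",
    "solr.KeywordMarkerFilterFactory": "keywordMarker",
    "solr.PorterStemFilterFactory": "porterStem",
}

ORIGINAL = """
<fieldtype name="myfieldtype" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldtype>
"""


def use_spi_names(field_type_xml, mapping=CLASS_TO_NAME):
    """Replace class="solr.XxxFactory" with name="xxx" on analyzer components."""
    root = ET.fromstring(field_type_xml)
    for analyzer in root.iter("analyzer"):
        for component in analyzer:
            cls = component.attrib.get("class")
            if cls not in mapping:
                continue  # unknown factory: leave the class attribute untouched
            # Keep the other attributes (e.g. protected="...") after the name.
            rest = {k: v for k, v in component.attrib.items() if k != "class"}
            component.attrib.clear()
            component.set("name", mapping[cls])
            component.attrib.update(rest)
    return ET.tostring(root, encoding="unicode")


converted = use_spi_names(ORIGINAL)
print(converted)
```

Only the children of an analyzer element are touched, so the class="solr.TextField" attribute on the field type itself stays as it is.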



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

