[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

Tomoko Uchida (JIRA) Sat, 10 Aug 2019 07:00:17 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904430#comment-16904430
 ]


Tomoko Uchida commented on SOLR-13593:
--------------------------------------

While running the full test suite, I encountered a TokenFilterFactory that 
takes a "name" argument: 
[https://lucene.apache.org/core/8_2_0/analyzers-icu/org/apache/lucene/analysis/icu/ICUNormalizer2FilterFactory.html]

A field type definition that uses this filter looks like this:
{code:xml}
  <fieldType name="text_icunormalizer2" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf" mode="compose"/>
    </analyzer>
  </fieldType>
{code}
This is, of course, incompatible with the changes proposed here, because the 
"name" attribute would then be read as the factory's SPI name.

There are a few options:

 1. Allow "class" and "name" to be used together as is (only when "name" is not 
an SPI name), and use "class" to look up the factory in that case.
 2. Forbid the "name" argument in factories and rename the existing "name" 
arguments to something else.
 3. Rethink the attribute name used to look up factories, because "name" is 
already taken.

I don't like option 1: it seems too confusing and makes it impossible to 
discard the "class" attribute in future releases. I also don't think we should 
take option 3 just because of a few anomalous classes.
 Option 2 makes sense to me. Can we fix the "name" args in existing factories 
(maybe another LUCENE issue is needed) before proceeding? We may also need to 
delay exposing this feature until Solr 9.0 because it breaks backwards 
compatibility.
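
For illustration, option 2 might end up looking like the definition below. 
This is only a sketch: the replacement argument name "form" is hypothetical 
(no replacement has been decided), and "icuNormalizer2" is assumed to be the 
SPI name of ICUNormalizer2FilterFactory.
{code:xml}
  <fieldType name="text_icunormalizer2" class="solr.TextField">
    <analyzer>
      <tokenizer name="whitespace"/>
      <!-- "name" selects the factory by its SPI name (assumed: icuNormalizer2).
           The normalization form moves to a hypothetical "form" argument. -->
      <filter name="icuNormalizer2" form="nfkc_cf" mode="compose"/>
    </analyzer>
  </fieldType>
{code}
With the argument renamed, "name" would have a single meaning in every 
tokenizer and filter element, so the "class" attribute could eventually be 
dropped.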

[~thetaphi]: Do you have any ideas about that?

> Allow to specify analyzer components by their SPI names in schema definition
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-13593
>                 URL: https://issues.apache.org/jira/browse/SOLR-13593
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Tomoko Uchida
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Each analysis factory now has an explicitly documented SPI name, which is 
> stored in the static "NAME" field (LUCENE-8778).
> Solr uses the factories' simple class names in schema definitions (like 
> class="solr.WhitespaceTokenizerFactory"), but we should also be able to use 
> the more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     <filter class="solr.PorterStemFilterFactory" />
>   </analyzer>
> </fieldtype>
> {code}
> would be
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer name="whitespace"/>
>     <filter name="keywordMarker" protected="protwords.txt" />
>     <filter name="porterStem" />
>   </analyzer>
> </fieldtype>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

