[jira] [Updated] (SOLR-14434) Add documentation for adding multiterm analyzers in Schema API

Trey Grainger (Jira) Thu, 23 Apr 2020 23:10:43 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Trey Grainger updated SOLR-14434:
---------------------------------
    Description: 
Originally this was filed as a bug report, but upon further inspection I 
realized the usage was just undocumented and just a result of inconsistent 
property name (casing) between the XML and JSON. Changing this to a Jira to add 
documentation so others don't run into this issue in the future.

--------------

In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
allows for specific control over analysis for things like wildcard terms, 
prefix queries, range queries, etc. For example, the following would cause the 
wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of 
"{{hats*}}", and thus match on the indexed version of "{{hat}}".
{code:java}
  <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" 
positionIncrementGap="100" termOffsets="true" termVectors="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" 
ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>{code}
In the  

 

  was:
In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
allows for specific control over analysis for things like wildcard terms, 
prefix queries, range queries, etc. For example, the following would cause the 
wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of 
"{{hats*}}", and thus match on the indexed version of "{{hat}}".
{code:java}
  <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" 
positionIncrementGap="100" termOffsets="true" termVectors="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" 
ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>{code}
This works fine if using a non-managed schema (i.e. {{schema.xml}} file) OR if 
you use managed schema (i.e. {{managed-schema}} file) and push your schema 
directly to Zookeeper. However, the {{multiterm}} analyzers are not being 
persisted (only {{index}} and {{query}} analyzers are) if you pass them in 
through the API.

 


> Add documentation for adding multiterm analyzers in Schema API
> --------------------------------------------------------------
>
>                 Key: SOLR-14434
>                 URL: https://issues.apache.org/jira/browse/SOLR-14434
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>    Affects Versions: 8.0, 8.1, 8.2, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Trey Grainger
>            Priority: Major
>
> Originally this was filed as a bug report, but upon further inspection I 
> realized the usage was just undocumented and just a result of inconsistent 
> property name (casing) between the XML and JSON. Changing this to a Jira to 
> add documentation so others don't run into this issue in the future.
> --------------
> In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
> explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
> allows for specific control over analysis for things like wildcard terms, 
> prefix queries, range queries, etc. For example, the following would cause 
> the wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of 
> "{{hats*}}", and thus match on the indexed version of "{{hat}}".
> {code:java}
>   <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" 
> positionIncrementGap="100" termOffsets="true" termVectors="true">
>     <analyzer type="index">
>       <tokenizer class="solr.ClassicTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.ClassicTokenizerFactory"/>
>       <filter class="solr.SynonymGraphFilterFactory" expand="true" 
> ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="multiterm">
>       <tokenizer class="solr.ClassicTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>   </fieldType>{code}
> In the  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (SOLR-14434) Add documentation for adding multiterm analyzers in Schema API

Reply via email to