[jira] [Updated] (SOLR-14434) Add documentation for adding multiterm analyzers in Schema API

Trey Grainger (Jira) Thu, 23 Apr 2020 23:17:44 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Trey Grainger updated SOLR-14434:
---------------------------------
    Description: 
Originally this was filed as a bug report, but upon further inspection I 
realized the usage was simply undocumented: the behavior comes from inconsistent 
property naming (casing) between the XML and JSON formats. Changing this to a 
Jira to add documentation so others don't run into this issue in the future.

We also need to document that the "analysis/field" API ignores {{multiterm}} 
analysis and thus doesn't reflect how incoming multi-term queries (wildcard, 
prefix, range) are actually analyzed. This has been an annoying quirk for years 
and I think it would be worth fixing, but for now we should at least document it.
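
To see the quirk for yourself, you can ask the "analysis/field" handler how it analyzes a wildcard query and compare that with what the query actually matches. The sketch below only builds the request URL. The base URL and collection name are hypothetical placeholders, and {{multiterm_test}} is the field type defined in this description. The {{analysis.fieldtype}} and {{analysis.query}} parameters are the standard field analysis parameters.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, collection, field_type, query_text):
    """Build a URL for the analysis/field handler for a given query string."""
    params = {
        "analysis.fieldtype": field_type,  # analyze using this fieldType
        "analysis.query": query_text,      # text to run through query-time analysis
        "wt": "json",
    }
    return f"{base_url}/{collection}/analysis/field?{urlencode(params)}"


# Hypothetical host and collection; substitute your own.
url = field_analysis_url(
    "http://localhost:8983/solr", "mycollection", "multiterm_test", "hats*"
)
print(url)
```

Per the note above, the response is expected to show the {{query}} analyzer chain rather than the {{multiterm}} one, so it should not be read as what a wildcard query really goes through.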

--------------

In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
gives specific control over the analysis of multi-term queries such as wildcard, 
prefix, and range queries. For example, the following definition would cause the 
wildcard query "{{hats*}}" to be stemmed to "{{hat*}}" rather than left as 
"{{hats*}}", and thus match the indexed form "{{hat}}".
{code:xml}
  <fieldType class="solr.TextField" multiValued="true" name="multiterm_test"
             positionIncrementGap="100" termOffsets="true" termVectors="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>{code}
In the XML schema this analyzer is called "{{multiterm}}", whereas it's 
"{{multiTerm}}" in the JSON API. This isn't documented anywhere and cost me a 
bunch of time debugging through the code until I finally found what was going 
on. Using this ticket to add better documentation on the usage of this feature 
and the gotchas around it.
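
For comparison, here is a minimal sketch of what the equivalent Schema API request body might look like, built as a Python dict. The {{indexAnalyzer}} and {{queryAnalyzer}} keys are the documented Schema API names. The {{multiTermAnalyzer}} key is an assumption, inferred from the camel-case "{{multiTerm}}" naming described above, and should be verified against your Solr version. The helper and constant names are illustrative only.

```python
import json


def analyzer(filters, tokenizer="solr.ClassicTokenizerFactory"):
    """Build one analyzer definition in the Schema API JSON shape."""
    return {"tokenizer": {"class": tokenizer}, "filters": filters}


LOWER = {"class": "solr.LowerCaseFilterFactory"}
STEM = {"class": "solr.EnglishMinimalStemFilterFactory"}
SYNONYMS = {
    "class": "solr.SynonymGraphFilterFactory",
    "expand": "true",
    "ignoreCase": "true",
    "synonyms": "synonyms.txt",
}

payload = {
    "add-field-type": {
        "name": "multiterm_test",
        "class": "solr.TextField",
        "multiValued": True,
        "positionIncrementGap": "100",
        "termOffsets": True,
        "termVectors": True,
        "indexAnalyzer": analyzer([LOWER, STEM]),
        "queryAnalyzer": analyzer([SYNONYMS, LOWER, STEM]),
        # ASSUMPTION: camel-case key, unlike type="multiterm" in the XML schema.
        "multiTermAnalyzer": analyzer([LOWER, STEM]),
    }
}

print(json.dumps(payload, indent=2))
```

The resulting JSON would be POSTed to the collection's {{/schema}} endpoint. The point of the sketch is the casing difference: an all-lowercase key modeled on the XML {{type="multiterm"}} attribute is the mistake this ticket is meant to help people avoid.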

 

  was:
Originally this was filed as a bug report, but upon further inspection I 
realized the usage was just undocumented and just a result of inconsistent 
property name (casing) between the XML and JSON. Changing this to a Jira to add 
documentation so others don't run into this issue in the future.

--------------

In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
allows for specific control over analysis for things like wildcard terms, 
prefix queries, range queries, etc. For example, the following would cause the 
wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of 
"{{hats*}}", and thus match on the indexed version of "{{hat}}".
{code:java}
  <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" 
positionIncrementGap="100" termOffsets="true" termVectors="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" 
ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>{code}
In the  

 


> Add documentation for adding multiterm analyzers in Schema API
> --------------------------------------------------------------
>
>                 Key: SOLR-14434
>                 URL: https://issues.apache.org/jira/browse/SOLR-14434
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>    Affects Versions: 8.0, 8.1, 8.2, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Trey Grainger
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
