[ https://issues.apache.org/jira/browse/SOLR-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Trey Grainger updated SOLR-14434:
---------------------------------
    Description:
Originally this was filed as a bug report, but upon further inspection I realized the usage was simply undocumented and the result of inconsistent property naming (casing) between the XML and JSON. Changing this to a Jira to add documentation so others don't run into this issue in the future.

We also need to document that the "analysis/field" API ignores {{multiterm}} analysis and thus doesn't reflect the full nature of incoming queries. This has been an annoying quirk for years and I think it would be worth fixing, but for now we should at least document it.

--------------

In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This allows for specific control over analysis for things like wildcard terms, prefix queries, range queries, etc. For example, the following would cause the wildcard query "{{hats*}}" to be stemmed to "{{hat*}}" rather than left as "{{hats*}}", and thus match on the indexed version of "{{hat}}".
{code:xml}
<fieldType class="solr.TextField" multiValued="true" name="multiterm_test" positionIncrementGap="100" termOffsets="true" termVectors="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>{code}
In the XML version this analyzer is called "{{multiterm}}", whereas it's "{{multiTerm}}" in the JSON API. This isn't in the documentation anywhere, and it cost me a bunch of time debugging through the code until I finally found what was going on. Using this ticket to add better documentation on the usage of this feature and the gotchas around it.

  was:
Originally this was filed as a bug report, but upon further inspection I realized the usage was just undocumented and just a result of inconsistent property name (casing) between the XML and JSON. Changing this to a Jira to add documentation so others don't run into this issue in the future.

--------------

In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This allows for specific control over analysis for things like wildcard terms, prefix queries, range queries, etc. For example, the following would cause the wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of "{{hats*}}", and thus match on the indexed version of "{{hat}}".
{code:xml}
<fieldType class="solr.TextField" multiValued="true" name="multiterm_test" positionIncrementGap="100" termOffsets="true" termVectors="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>{code}
In the


> Add documentation for adding multiterm analyzers in Schema API
> --------------------------------------------------------------
>
>                 Key: SOLR-14434
>                 URL: https://issues.apache.org/jira/browse/SOLR-14434
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: Schema and Analysis
>    Affects Versions: 8.0, 8.1, 8.2, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Trey Grainger
>            Priority: Major
>
> Originally this was filed as a bug report, but upon further inspection I
> realized the usage was simply undocumented and the result of inconsistent
> property naming (casing) between the XML and JSON. Changing this to a Jira to
> add documentation so others don't run into this issue in the future.
> We also need to document that the "analysis/field" API ignores {{multiterm}}
> analysis and thus doesn't reflect the full nature of incoming queries. This
> has been an annoying quirk for years and I think it would be worth fixing, but
> for now we should at least document it.
> --------------
> In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an
> explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This
> allows for specific control over analysis for things like wildcard terms,
> prefix queries, range queries, etc. For example, the following would cause
> the wildcard query "{{hats*}}" to be stemmed to "{{hat*}}" rather than left as
> "{{hats*}}", and thus match on the indexed version of "{{hat}}".
> {code:xml}
> <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" positionIncrementGap="100" termOffsets="true" termVectors="true">
>   <analyzer type="index">
>     <tokenizer class="solr.ClassicTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.ClassicTokenizerFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="multiterm">
>     <tokenizer class="solr.ClassicTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>   </analyzer>
> </fieldType>{code}
> In the XML version this analyzer is called "{{multiterm}}", whereas it's
> "{{multiTerm}}" in the JSON API. This isn't in the documentation anywhere, and
> it cost me a bunch of time debugging through the code until I finally found
> what was going on. Using this ticket to add better documentation on the usage
> of this feature and the gotchas around it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)