[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oliver Schihin updated SOLR-2921:
---------------------------------
Comment: was deleted
(was: Am I off topic, or is ICUCollationKeyFilterFactory a candidate, as well?)
> Make any Filters, Tokenizers and CharFilters implement
> MultiTermAwareComponent if they should
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-2921
> URL: https://issues.apache.org/jira/browse/SOLR-2921
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Environment: All
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch,
> SOLR-2921-3x.patch, SOLR-2921-trunk.patch, SOLR-2921_rest.patch
>
>
> SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr
> to automatically assemble a "multiterm" analyzer that does the right thing
> vis-a-vis transforming the individual terms of a multi-term query at query
> time. Examples are: lower casing, folding accents, etc. Currently
> (27-Nov-2011), the following classes implement MultiTermAwareComponent:
> * ASCIIFoldingFilterFactory
> * LowerCaseFilterFactory
> * LowerCaseTokenizerFactory
> * MappingCharFilterFactory
> * PersianCharFilterFactory
> When users put any of the above in their query analyzer, Solr will "do the
> right thing" at query time and the perennial question users have, "why didn't
> my wildcard query automatically lower-case (or accent fold or....) my terms?"
> will be gone. Die question die!
> But taking a quick look, for instance, at the various FilterFactories that
> exist, there are a number of possibilities that *might* be good candidates
> for implementing MultiTermAwareComponent. But I really don't understand the
> correct behavior here well enough to know whether these should implement the
> interface or not. And this doesn't include other CharFilters or Tokenizers.
> Actually implementing the interface is often trivial; see the classes above
> for examples. Note that LowerCaseTokenizerFactory returns a *Filter* as its
> multi-term component, which is the right thing in this case.
> Here is a quick cull of the Filters that, just from their names, might be
> candidates. If anyone wants to take any of them on, that would be great. If
> all you can do is provide test cases, I could probably do the code part, just
> let me know.
> * ArabicNormalizationFilterFactory
> * GreekLowerCaseFilterFactory
> * HindiNormalizationFilterFactory
> * ICUFoldingFilterFactory
> * ICUNormalizer2FilterFactory
> * ICUTransformFilterFactory
> * IndicNormalizationFilterFactory
> * ISOLatin1AccentFilterFactory
> * PersianNormalizationFilterFactory
> * RussianLowerCaseFilterFactory
> * TurkishLowerCaseFilterFactory
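
For anyone weighing whether a factory on the list above should implement the interface, here is a rough, self-contained sketch of the pattern the description talks about. It does not use the real Solr classes: MultiTermAware, TermFilter, the three factories and analyzeMultiTerm are all hypothetical stand-ins invented for illustration, and a "filter" here is just a function on a single term. The idea it tries to show is that only components that opt in get applied to a wildcard term, and that a tokenizer factory can hand back a filter as its multi-term component.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class MultiTermSketch {

    /** Hypothetical stand-in for a token filter: rewrites one term. */
    interface TermFilter extends UnaryOperator<String> {}

    /** Hypothetical, simplified stand-in for the SOLR-2438 marker interface. */
    interface MultiTermAware {
        TermFilter getMultiTermComponent();
    }

    /** Lower-casing is safe on a single term, so the factory opts in. */
    static class LowerCaseFactory implements MultiTermAware {
        TermFilter create() { return t -> t.toLowerCase(Locale.ROOT); }
        public TermFilter getMultiTermComponent() { return create(); }
    }

    /** A tokenizer-like factory that returns a *filter* for multi-term use. */
    static class LowerCasingTokenizerFactory implements MultiTermAware {
        public TermFilter getMultiTermComponent() { return new LowerCaseFactory().create(); }
    }

    /** Accent folding via NFD decomposition, dropping combining marks. */
    static class AccentFoldingFactory implements MultiTermAware {
        TermFilter create() {
            return t -> Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        public TermFilter getMultiTermComponent() { return create(); }
    }

    /** Does NOT opt in: stripping a suffix could mangle a wildcard prefix. */
    static class SuffixStrippingFactory {
        TermFilter create() { return t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t; }
    }

    /** Applies only the opted-in components of the query chain to the term. */
    static String analyzeMultiTerm(List<?> queryChain, String term) {
        String out = term;
        for (Object factory : queryChain) {
            if (factory instanceof MultiTermAware) {
                out = ((MultiTermAware) factory).getMultiTermComponent().apply(out);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> chain = Arrays.asList(
                new LowerCaseFactory(), new AccentFoldingFactory(), new SuffixStrippingFactory());
        // "CAF\u00C9S*" is "CAFÉS*"; the suffix stripper is skipped for the wildcard term.
        System.out.println(analyzeMultiTerm(chain, "CAF\u00C9S*"));
    }
}
```

Under those assumptions, the question to ask of each candidate is whether its transformation is still meaningful when applied to one partial term such as a wildcard prefix. Normalizers and case folders usually pass that test, while anything that splits, stems or drops tokens probably does not.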