Hi all,

I’d like to start a discussion about adding case-insensitive matching
support to TermsInSetQuery. Currently, Elasticsearch’s terms query, which
maps to TermsInSetQuery in Lucene, does not support case insensitivity.
This limitation has led to user requests for case-insensitive matching in
Elasticsearch (e.g., this issue
<https://github.com/elastic/elasticsearch/issues/71520>).

Problem Statement

   - Unlike Elasticsearch's term query, which supports case_insensitive: true,
   the terms query (TermsInSetQuery in Lucene) does not, meaning users must
   normalize their data (e.g., lowercase it) at index time.
   - This affects use cases like email lookups, usernames, and other
   identifiers, where the original case must be preserved in the index but
   searches must remain case insensitive.

Proposed Solution

   - Extend TermsInSetQuery to optionally apply a normalizer (e.g.,
   LowerCaseFilter) to the query terms before executing lookups.
   - Alternatively, introduce a new query type (e.g.,
   CaseInsensitiveTermsQuery) to handle this efficiently.
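
To make the first option concrete, here is a minimal sketch in plain Java of normalizing the query terms once and collecting them into a sorted, de-duplicated set before any lookup happens. It deliberately avoids Lucene classes so that it runs on its own; the class name NormalizedTermsSketch and the helper normalizeTerms are hypothetical and not part of any existing API. It also assumes the terms being compared against are normalized the same way.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

public class NormalizedTermsSketch {

    // Hypothetical helper (not a Lucene API): lowercases each term exactly once
    // and collects the results into a sorted set, which also drops duplicates
    // that differ only by case.
    static SortedSet<String> normalizeTerms(Collection<String> terms) {
        SortedSet<String> normalized = new TreeSet<>();
        for (String term : terms) {
            // Locale.ROOT keeps the result independent of the JVM's default locale.
            normalized.add(term.toLowerCase(Locale.ROOT));
        }
        return normalized;
    }

    public static void main(String[] args) {
        List<String> input =
            List.of("Alice@Example.com", "ALICE@example.com", "bob@example.com");
        // Prints: [alice@example.com, bob@example.com]
        System.out.println(normalizeTerms(input));
    }
}
```

The resulting set is what would be handed to the lookup, so the cost of normalization is linear in the number of input terms.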

Considerations

   - The previous discussion in Elasticsearch mentioned concerns about
   query expansion if case normalization required rewriting into a
   BooleanQuery.
   - A possible mitigation is applying normalization only once per term
   before execution.
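
As a rough illustration of the expansion concern, the sketch below counts how many case variants a single term would have if a query had to enumerate every upper/lower combination as a separate clause. This is only one way such expansion could arise, and I am assuming it here for illustration; the class CaseVariantCount and the method caseVariants are hypothetical names, and only ASCII letters are considered.

```java
public class CaseVariantCount {

    // Hypothetical helper: each ASCII letter can be upper or lower case,
    // so every letter doubles the number of variants. Digits and
    // punctuation have a single form and do not contribute.
    static long caseVariants(String term) {
        long variants = 1;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (asciiLetter) {
                // Throws ArithmeticException instead of silently overflowing.
                variants = Math.multiplyExact(variants, 2L);
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(caseVariants("admin"));  // 5 letters -> 32
        System.out.println(caseVariants("user42")); // 4 letters -> 16
    }
}
```

Under that assumption, a list of 1,000 five-letter terms would expand to 32,000 clauses, whereas normalizing each term once keeps the count at 1,000.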

Would the team be open to discussing this further? If this approach makes
sense, I’d be happy to explore implementation details and submit a proof of
concept.

Thanks,
Will Dickerson
