Hi all,

I’d like to start a discussion about adding case-insensitive matching support to TermsInSetQuery.

Currently, Elasticsearch’s terms query, which maps to TermsInSetQuery in Lucene, does not support case insensitivity. This limitation has led to user requests for case-insensitive matching in Elasticsearch (e.g., this issue <https://github.com/elastic/elasticsearch/issues/71520>).

Problem Statement

- Unlike Elasticsearch’s term query, which supports case_insensitive: true, the terms query (and the TermsInSetQuery behind it) has no such option, meaning users must normalize their data at index time.
- This affects use cases like email lookups, usernames, and other identifiers, where the original case must be preserved but searches must remain case insensitive.

Proposed Solution

- Extend TermsInSetQuery to optionally apply a normalizer (e.g., LowerCaseFilter) before executing lookups.
- Alternatively, introduce a new query type (e.g., CaseInsensitiveTermsQuery) to handle this efficiently.

Considerations

- The previous discussion in Elasticsearch mentioned concerns about query expansion if case-insensitive matching required rewriting into a BooleanQuery.
- A possible mitigation is to normalize each term only once, before execution.

Would the team be open to discussing this further? If this approach makes sense, I’d be happy to explore implementation details and submit a proof of concept.

Thanks,
Will Dickerson
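
P.S. To make the "normalize once per term" idea a bit more concrete, here is a rough sketch in plain Java. It deliberately uses no Lucene classes: CaseInsensitiveTermsSketch, normalizeTerms, and lookup are hypothetical names, the Map is only a stand-in for an index whose keys were lowercased the same way, and Locale.ROOT lowercasing is a simplification of whatever normalizer would actually be used. It is meant only to show that folding case before the lookup keeps the term set from growing.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: none of these names exist in Lucene or Elasticsearch.
final class CaseInsensitiveTermsSketch {

    // Lowercase each query term exactly once and deduplicate, so "Bob" and
    // "BOB" collapse into a single lookup instead of expanding the query.
    // Assumption: Locale.ROOT lowercasing stands in for the real normalizer.
    static SortedSet<String> normalizeTerms(Collection<String> terms) {
        SortedSet<String> normalized = new TreeSet<>();
        for (String term : terms) {
            normalized.add(term.toLowerCase(Locale.ROOT));
        }
        return normalized;
    }

    // Stand-in for the index: maps an already-lowercased key to the original,
    // case-preserved value. Assumes the index side was normalized the same way.
    static SortedSet<String> lookup(Map<String, String> index, Collection<String> queryTerms) {
        SortedSet<String> hits = new TreeSet<>();
        for (String term : normalizeTerms(queryTerms)) {
            String original = index.get(term);
            if (original != null) {
                hits.add(original);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> index = Map.of(
            "alice@example.com", "Alice@Example.com",
            "bob@example.com", "bob@example.com");
        List<String> query = List.of(
            "ALICE@EXAMPLE.COM", "alice@example.com", "carol@example.com");

        // Three input terms become two normalized terms; one of them matches.
        System.out.println(normalizeTerms(query));
        System.out.println(lookup(index, query));
    }
}
```

In a real implementation the normalized set would presumably be what gets handed to TermsInSetQuery, so the number of terms never exceeds the number the user supplied and no BooleanQuery expansion is needed. The catch is that the field (or a sub-field) has to be indexed with the same normalization.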