Dawid Weiss created LUCENE-10220:
------------------------------------

             Summary: Add a utility method to get IntervalSource from analyzed 
text (or token stream)
                 Key: LUCENE-10220
                 URL: https://issues.apache.org/jira/browse/LUCENE-10220
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Dawid Weiss
            Assignee: Dawid Weiss


The Intervals class has a number of utility methods that provide an IntervalSource 
for tokens, phrases, etc. But it's missing an important bit: an interval source 
matching the tokens that result from running some string through a full analysis 
chain. This corresponds to what actually resides in the index and is hard to 
predict from the outside.
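
To illustrate why the indexed tokens are hard to predict, here is a minimal, 
Lucene-free sketch. Everything in it is an assumption made for illustration: the 
class AnalyzedTextSketch, its toy "analysis chain" (split on non-alphanumerics, 
lowercase, drop a hypothetical stop word list) and the string rendering of the 
resulting interval expression. It is not the proposed implementation. In Lucene 
itself the tokens would come from an Analyzer's TokenStream, and stop word 
removal would also leave position gaps, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for an analysis chain; illustrative only, NOT Lucene code. */
public class AnalyzedTextSketch {
  // Hypothetical stop word list, chosen only for this example.
  private static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

  /** Splits on non-alphanumeric characters, lowercases, drops stop words. */
  public static List<String> analyze(String text) {
    List<String> tokens = new ArrayList<>();
    for (String raw : text.split("[^\\p{Alnum}]+")) {
      String token = raw.toLowerCase(Locale.ROOT);
      if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
        tokens.add(token);
      }
    }
    return tokens;
  }

  /** Renders the analyzed tokens as an ordered sequence of term sources. */
  public static String orderedIntervalExpression(String text) {
    List<String> terms = new ArrayList<>();
    for (String token : analyze(text)) {
      terms.add("term(" + token + ")");
    }
    return "ordered(" + String.join(", ", terms) + ")";
  }

  public static void main(String[] args) {
    // The raw string has three words; the toy chain also yields three
    // tokens, but not the same ones: "The" is dropped, "Wi-Fi" is split.
    System.out.println(orderedIntervalExpression("The Wi-Fi Router"));
  }
}
```

A caller who built the interval source by hand from the raw words would look 
for "The", "Wi-Fi" and "Router", none of which exist as terms under this toy 
chain. Deriving the terms from the same chain that produced the index avoids 
that mismatch.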

This is an important omission in Intervals as a utility class.

I borrowed the implementation from the then-ASL-licensed Elasticsearch code at: 

[https://github.com/elastic/elasticsearch/blob/7.10/server/src/main/java/org/elasticsearch/index/query/IntervalBuilder.java#L54-L106]

I modified it slightly to fit the static-method-based Lucene API and added a 
small test that showcases how this method can be used in practice (and why it's 
hard to accomplish the same result with existing methods).

The only thing I'm not sure about is how to attribute Elasticsearch properly - in 
the notice file, perhaps?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
