Working on Intervals QP https://github.com/apache/solr/pull/4582 Stay tuned. Feedback is welcome!
On Fri, Sep 12, 2025 at 10:43 PM Mikhail Khludnev <[email protected]> wrote:

> I've checked the surround parser. Turns out it lacks braces support.
> I've also added a reproducer for the nested spans issue, which intervals
> are able to handle:
>
> https://github.com/mkhludnev/solr-flexible-qparser/blob/860e17c16153b1d3ef337f099b0d9f572620e9b1/src/test/java/org/apache/solr/flexibleqp/TestCompeteWithSpans.java#L49
>
> On Tue, Sep 9, 2025 at 1:12 PM Mikhail Khludnev <[email protected]> wrote:
>
>> Right. complexphrase is not an option for nesting.
>> I'm wondering if you encounter
>> https://issues.apache.org/jira/browse/LUCENE-7398
>> Let us know please if you do.
>> I'm interested in whether intervals are an option for such cases.
>>
>> On Mon, Sep 8, 2025 at 6:31 PM Matt Kuiper <[email protected]> wrote:
>>
>>> Thanks for the feedback!
>>>
>>> Mikhail - I did not see the complex query parser supporting proximity
>>> between 2 phrases, however the XmlQParser might via spans. Thanks for
>>> the tip!
>>>
>>> Gus - we currently use the Surround query parser for proximity between
>>> two terms. Do you know of a means to use it for proximity between
>>> phrases? This would be ideal as we have a search client tool already
>>> using this syntax.
>>>
>>> Dave - This type of approach might work for us (possibly like the
>>> complex query parser), where it is not exactly finding proximity between
>>> two phrases but verifying that all the words within two phrases are
>>> within a proximity range. As you say, this could keep stop words that
>>> may still be in the index from blocking a match.
>>>
>>> Matt
>>>
>>> On Mon, Sep 8, 2025 at 7:29 AM Dave <[email protected]> wrote:
>>>
>>> > There are other clever ways to do it too, using the within parameter,
>>> > and other things I don’t remember off the top of my head, but I gave a
>>> > presentation a few years ago that utilized it.
>>> > It uses more raw Solr parameters: you take in a phrase, tokenize it,
>>> > and find documents that contain the words of that phrase even when
>>> > other words sit between them. You restrict the results to documents
>>> > that have all the words in the phrase within that number of words plus
>>> > 2 or 3, to take care of stop words that may show up. For example,
>>> > “red house hill” would still find “red house on top of the hill”, with
>>> > the words within a proximity of about 7 of each other.
>>> >
>>> > > On Sep 7, 2025, at 7:15 PM, Gus Heck <[email protected]> wrote:
>>> > >
>>> > > Or
>>> > >
>>> > > https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#surround-query-parser
>>> > >
>>> > >> On Sun, Sep 7, 2025 at 4:32 PM Mikhail Khludnev <[email protected]> wrote:
>>> > >>
>>> > >> Hi
>>> > >> I might be missing a point. But the ways to create spans in Solr are:
>>> > >>
>>> > >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
>>> > >>
>>> > >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
>>> > >>
>>> > >>> On Fri, Sep 5, 2025 at 6:32 PM mtn search <[email protected]> wrote:
>>> > >>>
>>> > >>> I may have found what I am running up against - if ChatGPT is
>>> > >>> correct on diagnosis?
>>> > >>>
>>> > >>> *My sample query*
>>> > >>> /select?debug=true&indent=true&q={!lucene}spanNear(
>>> > >>> spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
>>> > >>> spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
>>> > >>> 10,false)
>>> > >>>
>>> > >>> *Text from the body field of a message that is returned by the
>>> > >>> spanNear query above (I believe incorrectly)*
>>> > >>> "separate device there will not be any load on the email servers"
>>> > >>>
>>> > >>> *Same text through analyzer*
>>> > >>> text      raw_bytes                  start  end
>>> > >>> separate  [73 65 70 61 72 61 74 65]  5      13
>>> > >>> device    [64 65 76 69 63 65]        14     20
>>> > >>> there     [74 68 65 72 65]           21     26
>>> > >>> will      [77 69 6c 6c]              27     31
>>> > >>> not       [6e 6f 74]                 32     35
>>> > >>> be        [62 65]                    36     38
>>> > >>> any       [61 6e 79]                 39     42
>>> > >>> load      [6c 6f 61 64]              43     47
>>> > >>> on        [6f 6e]                    48     50
>>> > >>> the       [74 68 65]                 51     54
>>> > >>> email     [65 6d 61 69 6c]           55     60
>>> > >>> server    [73 65 72 76 65 72]        61     68
>>> > >>>
>>> > >>> *ChatGPT assessment*
>>> > >>>
>>> > >>> Now, let’s check the spans:
>>> > >>>
>>> > >>> - Inner spanNear(separate, email, 0, true) is *not* going to match
>>> > >>>   directly, because email isn’t right after separate.
>>> > >>> - But Lucene is allowed to *reposition* the spans when used as
>>> > >>>   children of the outer spanNear.
>>> > >>>   Each child span doesn’t need to be contiguous unless it resolves
>>> > >>>   to a valid match somewhere in the text.
>>> > >>>
>>> > >>> *Conclusion:* This last line may explain why the message above was
>>> > >>> returned by the query above, although the match appears to be
>>> > >>> incorrect. While the words/tokens in the query are in the message,
>>> > >>> they do not honor the proximity specified. But apparently child
>>> > >>> spans do not have to honor the proximity rules specified. AI
>>> > >>> suggested this query for proximity; I am now concluding it is not a
>>> > >>> valid approach.
>>> > >>>
>>> > >>> I am not seeing a Solr/Lucene HTTP query approach for a proximity
>>> > >>> search between phrases, other than possibly using the Lucene Java
>>> > >>> API for more control.
>>> > >>>
>>> > >>> If others have found a workable solution, please let me know.
>>> > >>>
>>> > >>> Thanks,
>>> > >>> Matt
>>> > >>>
>>> > >>>> On Thu, Sep 4, 2025 at 3:26 PM mtn search <[email protected]> wrote:
>>> > >>>>
>>> > >>>> Also, I am using the SolrAdmin Analysis UI to verify how Solr is
>>> > >>>> tokenizing the messages, and manually verifying the positions
>>> > >>>> between tokens.
>>> > >>>>
>>> > >>>> Debug view of the query side:
>>> > >>>> For query:
>>> > >>>> "*params*":{
>>> > >>>>   "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>>> > >>>>   "df":"body",
>>> > >>>>   "debug":"true",
>>> > >>>>   "indent":"true",
>>> > >>>>   "q.op":"OR",
>>> > >>>>   "wt":"json"}},
>>> > >>>>
>>> > >>>> It seems odd that in the parsed query the "body" field name is
>>> > >>>> prepended to the value 5 and the text true.
>>> > >>>> "*debug*":{
>>> > >>>>   "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>>> > >>>>   "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>>> > >>>>   "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
>>> > >>>>   "*parsedquery_toString*":*"body:spannearquery *(body:body (body:money body:question)* (body:5 body:true*))",
>>> > >>>>   "explain":{
>>> > >>>>
>>> > >>>> On Thu, Sep 4, 2025 at 12:04 PM mtn search <[email protected]> wrote:
>>> > >>>>
>>> > >>>>> Thanks Tim! Yes, I have tried a variety of values and am aware of
>>> > >>>>> ordering vs non-ordering. I am getting more results than expected,
>>> > >>>>> and some that do not match the proximity criteria. So when I set
>>> > >>>>> it to a small value like 2, I was expecting to see the result
>>> > >>>>> count drop significantly, as many would not match the criteria.
>>> > >>>>> Unfortunately, the count does not drop. Looks like a fundamental
>>> > >>>>> problem with how I am using the syntax. Still researching, and
>>> > >>>>> open to suggestions.
>>> > >>>>>
>>> > >>>>> Matt
>>> > >>>>>
>>> > >>>>> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <[email protected]> wrote:
>>> > >>>>>
>>> > >>>>>> Usually the span and proximity problems are off-by-one issues.
>>> > >>>>>> Specifically, the order of the tokens will change the distance
>>> > >>>>>> calculation. I do not have an example off the top of my head.
>>> > >>>>>> But when I was doing this, I usually started with a larger span
>>> > >>>>>> and brought it down through looking at results.
>>> > >>>>>>
>>> > >>>>>> This is the case for the old 5~"phrase words" syntax.
>>> > >>>>>>
>>> > >>>>>> As an aside, "Not working" is taken by me to mean you are not
>>> > >>>>>> getting results but the query passes parse.
>>> > >>>>>> Not working could mean a lot more in this context. So I am
>>> > >>>>>> suggesting, instead of 2, try 10.
>>> > >>>>>>
>>> > >>>>>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <[email protected]> wrote:
>>> > >>>>>>
>>> > >>>>>>> Hello,
>>> > >>>>>>>
>>> > >>>>>>> Looking for guidance on approaches to implement a proximity
>>> > >>>>>>> search between phrases.
>>> > >>>>>>>
>>> > >>>>>>> Initially tried:
>>> > >>>>>>>
>>> > >>>>>>> "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
>>> > >>>>>>> spanTerm(body: record),0,true), spanNear(spanTerm(body:new),spanTerm(body:
>>> > >>>>>>> information),0,true) , 2N,false)",
>>> > >>>>>>> "defType":"lucene",
>>> > >>>>>>> "df":"body",
>>> > >>>>>>>
>>> > >>>>>>> However then simplified to just two terms:
>>> > >>>>>>>
>>> > >>>>>>> "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
>>> > >>>>>>> "defType":"lucene",
>>> > >>>>>>> "df":"body",
>>> > >>>>>>>
>>> > >>>>>>> Neither is working. Any tips? Currently on Solr 9.4, but will
>>> > >>>>>>> likely need to run for some time on a Solr 6 instance.
>>> > >>>>>>>
>>> > >>>>>>> Thanks,
>>> > >>>>>>> Matt
>>> > >>
>>> > >> --
>>> > >> Sincerely yours
>>> > >> Mikhail Khludnev
>>> > >
>>> > > --
>>> > > http://www.needhamsoftware.com (work)
>>> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
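
For anyone checking results by hand, as was done with the Analysis UI earlier in the thread, here is a minimal Python sketch of what "proximity between two phrases" is expected to mean over analyzed token positions. It is only an illustration, not how Lucene spans or intervals are implemented. The function names `phrase_spans` and `phrases_near` are made up for this example, and the token list is the analyzed text of the sample message.

```python
from typing import List, Tuple


def phrase_spans(tokens: List[str], phrase: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token positions where `phrase` occurs contiguously, in order."""
    n = len(phrase)
    return [
        (i, i + n - 1)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase
    ]


def phrases_near(tokens: List[str], first: List[str], second: List[str], max_gap: int) -> bool:
    """True if an occurrence of `first` and one of `second` do not overlap and
    have at most `max_gap` tokens between them, in either order."""
    for s1, e1 in phrase_spans(tokens, first):
        for s2, e2 in phrase_spans(tokens, second):
            if e1 < s2:
                gap = s2 - e1 - 1  # first phrase comes before second
            elif e2 < s1:
                gap = s1 - e2 - 1  # second phrase comes before first
            else:
                continue  # overlapping occurrences are ignored
            if gap <= max_gap:
                return True
    return False


# Analyzed tokens of the sample message ("servers" is stemmed to "server").
tokens = "separate device there will not be any load on the email server".split()

# Neither "separate email" nor "will be" occurs as an exact phrase, so no match.
print(phrases_near(tokens, ["separate", "email"], ["will", "be"], 10))  # False

# "will not be" and "email server" both occur, with 4 tokens between them.
print(phrases_near(tokens, ["will", "not", "be"], ["email", "server"], 10))  # True
```

By this definition the sample message should not be returned for the nested spanNear query, since neither inner phrase occurs contiguously in the text.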
