I *think* it would be possible to write an IntervalsSource implementation that 
took opening and closing tags and did the right thing here. As you say, a 
standard `containing` will try to minimise its intervals, but you could write 
something that matched an opening tag with its corresponding closing tag by 
counting how many other opening tags appear before the next closing tag. You’d 
need to do some caching to handle the look-ahead aspect, but I don’t think that 
would be too tricky.
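
To make the matching idea concrete, here is a rough sketch of just the pairing logic, outside of Lucene. It assumes the tags are indexed as ordinary tokens, so each opening and closing tag has a position, and that those positions are already available as sorted arrays. `TagMatcher` and `matchTags` are hypothetical names, not Lucene API, and a real IntervalsSource would have to do this lazily over the underlying iterators rather than over arrays.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Lucene: pairs tag positions by nesting depth.
class TagMatcher {

    // opens and closes are token positions in ascending order, and no position
    // appears in both arrays. Returns {open, close} pairs ordered by close position.
    static List<int[]> matchTags(int[] opens, int[] closes) {
        List<int[]> spans = new ArrayList<>();
        Deque<Integer> pending = new ArrayDeque<>(); // opening tags still awaiting a close
        int i = 0;
        for (int close : closes) {
            // Look ahead: stack every opening tag that occurs before this closing tag.
            while (i < opens.length && opens[i] < close) {
                pending.push(opens[i++]);
            }
            // The most recent unmatched opening tag belongs to this closing tag.
            // A closing tag with no pending opening tag is skipped.
            if (!pending.isEmpty()) {
                spans.add(new int[] {pending.pop(), close});
            }
        }
        return spans; // opening tags left in 'pending' never closed and are dropped
    }

    public static void main(String[] args) {
        // <tag>foo <tag>bug</tag> bar</tag>  ->  positions 0..6, one per token
        for (int[] span : matchTags(new int[] {0, 2}, new int[] {4, 6})) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

For the nested example from your mail this pairs the inner tags as 2..4 and the outer tags as 0..6, so only the outer span covers both foo (position 1) and bar (position 5). The stack of pending opening tags is the state that would need caching to handle the look-ahead.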

It’s a fun idea to think about, I’ll see if I can come up with something over 
the weekend :)

> On 6 May 2022, at 10:22, Mikhail Khludnev <m...@apache.org> wrote:
> 
> Hi Devs!
> 
> I found intervals quite nice and natural for retrieving scoped data (thanks, 
> Alan!): 
> <tag>foo stuff bar</tag>
> I.containing(I.ordered(I.term("<tag>"), I.term("</tag>")), 
>                       I.unordered(I.term("bar"), I.term("foo")));
> It works like a charm until it encounters ill-nested tags:
> <tag>foo <tag>bug</tag> bar</tag>
> Due to intrinsic minimization it picks the internal tag. I feel like plain 
> intervals based on positions lack tag scoping information. 
> Do you know any approaches for retrieving XML in Lucene?  
> 
> -- 
> Sincerely yours
> Mikhail Khludnev


