Hey all,
I have been working on an initial implementation of intra-segment search
concurrency for Lucene.

My goal is to introduce the ability to concurrently search partitions of
the same segment (think of a force-merged segment, for instance) in a way
that's as transparent as possible to users. This way we can ideally
decouple search concurrency from the index geometry, with the least impact
on users. As part of my initial step, I decided to not tackle deduplicating
work that happens globally per segment, which every partition would repeat
on its own. This is certainly an important area to improve upon, yet I am
hoping that we can treat it as a follow-up, mostly because there is enough
work to do even without addressing that.
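
To make the idea concrete, here is a minimal sketch of what splitting one segment into partitions could look like. It is an illustration only, not the code in the PR: the class SegmentPartitionSketch, the DocRange record and the partition method are hypothetical names, and I am assuming a partition is simply a contiguous, half-open range of doc IDs within the segment.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPartitionSketch {

    /** Hypothetical half-open doc ID range [minDocId, maxDocId) within a single segment. */
    record DocRange(int minDocId, int maxDocId) {}

    /**
     * Splits the doc ID space [0, maxDoc) of one segment into at most
     * numPartitions contiguous ranges whose sizes differ by at most one.
     * Assumption: partitions are plain doc ID ranges; the real PR may differ.
     */
    static List<DocRange> partition(int maxDoc, int numPartitions) {
        if (maxDoc < 0 || numPartitions <= 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0 and numPartitions > 0");
        }
        List<DocRange> ranges = new ArrayList<>();
        int base = maxDoc / numPartitions;      // minimum size of each partition
        int remainder = maxDoc % numPartitions; // the first 'remainder' partitions get one extra doc
        int start = 0;
        for (int i = 0; i < numPartitions && start < maxDoc; i++) {
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new DocRange(start, start + size));
            start += size;
        }
        return ranges;
    }
}
```

Each range could then be handed to a separate search task. Any work that is global to the segment would still be repeated once per range, which is the duplication mentioned above.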

After quite a few iterations, I have just marked my PR ready for review:
https://github.com/apache/lucene/pull/13542. Tests are finally green. I
wrote a rather detailed description on the PR itself that includes the
problems I encountered, how I addressed them, and the way forward that I am
proposing. There are still a couple of rough edges, and we need to align on
API terminology. Mostly, what do we call a partition of a segment?
Existing leaf slices are partitions of an index. We are now introducing
partitions of segments that can be searched independently. I called them
LeafReaderContextPartition, but I am not particularly attached to this
specific name and open to feedback. This new terminology is only applied to
the IndexSearcher#search method (not called directly by users though) and
the IndexSearcher slice-related methods. Otherwise, users who just call
search hopefully don't need to know what a segment partition is.

I'd love to collect enough feedback to agree on a path forward and get this
merged for Lucene 10, as it requires some API-breaking changes as well as
changes in internal behaviour.


Looking forward to your feedback.

Cheers
Luca
