Hi Luca,

This is very exciting! I haven’t followed the dev process very closely so far, so this may already have been looked at and dismissed as unworkable for various reasons, but I’m wondering if we definitely need a new abstraction for a LeafReaderContext partition? Could we instead find a way to make IndexReader.leaves() return a view over the various segments that splits large segments into multiple LeafReaderContexts with different subsets of the docId space marked as deleted?
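
To make the idea a bit more concrete, here is a rough sketch of the docId bookkeeping such a view would need. It is plain Java with no Lucene classes, and `SegmentPartitionSketch`, `DocRange`, `split` and `maxDocsPerPartition` are all made-up names for illustration only. Each range stands in for a live-docs mask under which every doc outside the range is treated as deleted; how that would actually be wired into LeafReaderContext is left open.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPartitionSketch {

    /** Hypothetical stand-in for a live-docs mask: only docs in [minDoc, maxDoc) are visible. */
    static final class DocRange {
        final int minDoc; // inclusive
        final int maxDoc; // exclusive

        DocRange(int minDoc, int maxDoc) {
            this.minDoc = minDoc;
            this.maxDoc = maxDoc;
        }

        /** Every docId outside this range is treated as if it were deleted. */
        boolean isLive(int docId) {
            return docId >= minDoc && docId < maxDoc;
        }

        int size() {
            return maxDoc - minDoc;
        }
    }

    /**
     * Splits the docId space [0, segmentMaxDoc) into contiguous, non-overlapping ranges
     * of at most maxDocsPerPartition docs each. Small segments yield a single range.
     */
    static List<DocRange> split(int segmentMaxDoc, int maxDocsPerPartition) {
        if (segmentMaxDoc < 0 || maxDocsPerPartition <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        List<DocRange> ranges = new ArrayList<>();
        if (segmentMaxDoc <= maxDocsPerPartition) {
            ranges.add(new DocRange(0, segmentMaxDoc));
            return ranges;
        }
        // Ceiling division in long arithmetic to avoid int overflow.
        int numPartitions =
            (int) (((long) segmentMaxDoc + maxDocsPerPartition - 1) / maxDocsPerPartition);
        for (int i = 0; i < numPartitions; i++) {
            // Spread the remainder evenly so partition sizes differ by at most one doc.
            int start = (int) ((long) segmentMaxDoc * i / numPartitions);
            int end = (int) ((long) segmentMaxDoc * (i + 1) / numPartitions);
            ranges.add(new DocRange(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (DocRange r : split(10, 4)) {
            System.out.println("[" + r.minDoc + ", " + r.maxDoc + ")");
        }
    }
}
```

With `split(10, 4)` this yields the ranges [0, 3), [3, 6) and [6, 10), so each docId is live in exactly one of them. A segment that already has real deletions would presumably need its existing live docs intersected with the range, which the sketch does not attempt.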

I suppose we could lose some optimisations in count() implementations, but maybe it would be possible to check up-front if the count() for a segment returns -1 and only do the split in that case.

- Alan

> On 29 Jul 2024, at 22:45, Luca Cavanna <java...@apache.org> wrote:
>
> Hey all,
> I have been working on an initial implementation of intra-segment search
> concurrency for Lucene.
>
> My goal is to introduce the ability to concurrently search partitions of the
> same segment, think of a force-merged segment for instance, in a way that's
> as transparent as possible to users. This way we can ideally decouple search
> concurrency from the index geometry, with the least impact on users. As part
> of my initial step, I decided to not tackle deduplicating work that happens
> globally per segment, which every partition would repeat on its own. This is
> certainly an important area to improve upon, yet I am hoping that we can
> treat it as a follow-up, mostly because there is enough work to do even
> without addressing that.
>
> After quite a few iterations, I have just marked my PR ready for review:
> https://github.com/apache/lucene/pull/13542. Tests are finally green. I wrote
> a rather detailed description on the PR itself that includes the problems I
> encountered, how I addressed them, and the way forward that I am proposing.
> There are still a couple of rough edges, and needed alignment on terminology
> API-wise. Mostly, what do we call a partition of a segment? Existing leaf
> slices are partitions of an index. We are now introducing partitions of
> segments that can be searched independently. I called them
> LeafReaderContextPartition, but I am not particularly attached to this
> specific name and open to feedback. This new terminology is only applied to
> the IndexSearcher#search method (not called directly by users though) and the
> IndexSearcher slices related methods. Otherwise, users that just call search
> don't need to necessarily know what a segment partition is, hopefully.
>
> I'd love to collect enough feedback to agree on a path forward and get this
> merged for Lucene 10, as it requires some API breaking changes as well as
> changes in internal behaviour.
>
> Looking forward to your feedback
>
> Cheers
> Luca