Re: Lucene 10.0 and 9.12 blockers

Luca Cavanna Thu, 12 Sep 2024 01:33:41 -0700

Thanks for raising this Michael.

I see a risk as well around removing the deprecated search(Query,
Collector) method by using a sequential collector manager. Previously, we
would happily execute sequentially despite an executor being provided to
the searcher (because search(Query, Collector) bypasses concurrency
entirely). After the migration, we would execute concurrently, hence the
collector manager would throw an exception as soon as more than one slice
gets searched. The sequential collector manager can be used in tests to
overcome the lack of a collector manager, but that only moves the problem
to the user side: what are users of these collectors going to do given they
need a collector manager to call search, and we don't provide one? Their
only option would be to not set an executor to their searcher. Maybe it is
a possibility to clearly document that but is it acceptable to require
users to have a non concurrent index searcher instance around to use for
non concurrent queries? There's only a few cases of this fortunately.
There's also a couple of cases like QueryUtils and JoinUtil where we have
utility methods that call search internally and accept an external
searcher. Those searchers may have an executor set to them, and the only
safe way to migrate these is to add concurrent collector managers support
or explicitly state that the provided search shouldn't have an executor set
to it. Another option if we are not happy with the workaround I mentioned,
is to consider leaving search(Query, Collector) deprecated in Lucene 10 and
removing it in Lucene 11. It is a shame because we are not far off, but I
am not sure that this warrants delaying the release.
I am not entirely sure how this aligns with the risk you mentioned, which
cases of SimpleCollector are you referring to specifically?


Regarding your second concern around intra-segment concurrency: while I had
to adapt a couple of tests to be intra-segment ready as they made wrong
assumptions, we are now leveraging intra-segment concurrency in all tests
(provided that the searcher is created using LuceneTestCase#newSearcher),
besides when DrillSideways is used. I have seen a couple of post-merge
failures that may be related which I will look into, but nothing that would
suggest that the design is entirely problematic. When you retrieve a scorer
supplier or a bulk scorer you provide a LeafReaderContext. The overall
expectation is that you get a different instance each time, regardless of
whether you have already seen the segment or not. If that is the case there
is no state shared between threads, because ScorerSuppliers should not get
shared across threads? It is not the expectation that the support for
intra-segment concurrency requires bulk scorers and scorer suppliers to
become thread-safe.
With that in mind, I checked DefaultScorerSupplier and I do not see why it
would not work, as long as each call to retrieve a scorer supplier returns
a new instance of if that points to a new instance of Scorer, which holds
state created as part of that same scorerSupplier call. The problem that we
have is that we duplicate ahead of time work for partitions of the same
segment (tracked in https://github.com/apache/lucene/issues/13745), because
if we need to pull doc_values, we will do so for the same segment multiple
times. I would assume that if something is off with DefaultScorerSupplier,
tests would show that clearly, as it is widely used.
I also checked FeatureQuery, and I see that each call to
scorerSupplier(LeafReaderContext) returns a new instance of the supplier
which points to different TermsEnum instance retrieved multiple times for
the same segment. Removing this duplication will require additional work,
and there will be bugs, or incorrect assumptions made in existing scorer
supplier instances, but those should not be too hard to fix.
Does this make sense to you? Perhaps there are additional changes to make
in the migrate guide or javadocs to clarify what I described, let me know
what you think.

Cheers
Luca

On Thu, Sep 12, 2024 at 9:42 AM Michael Froh <[email protected]> wrote:

> Oh, I forgot to mention:
>
> I think we should deprecate the DefaultScorerSupplier constructor that
> takes a Scorer. There's no way that works with intra-segment concurrency.
>
> Maybe we remove DefaultScorerSupplier altogether?
>
> On Thu, Sep 12, 2024 at 12:35 AM Michael Froh <[email protected]> wrote:
>
>> I'm a big fan of both of Luca's topics. I'd like to raise a small red
>> flag around them, though, since they seem to be connected.
>>
>> Working through the join module and helping my colleague @harshavamsi on
>> the QueryUtils side, I see two layers of unpreparatedness for the modern
>> "concurrency first" architecture. (Again, I want to make clear that I think
>> the modern architecture is the way to go and we can and should get there in
>> time for Lucene 10.)
>>
>> 1. There are several uses of SimpleCollector, where it's assumed that one
>> collector will collect all results on a single thread. With the deprecated
>> method, this forces single-threaded behavior all the time. In my opinion,
>> these represent 13+ year technical debt for cases where you couldn't
>> properly use an IndexSearcher to do concurrent searches.
>> 2. With the merge of intra-segment searches, we have another layer:
>> ScorerSuppliers that share mutable state across the Scorers that they
>> produce. For example, @harshavamsi came across a case today in the sigmoid
>> function for FeatureQuery where a TermsEnum was created in the
>> ScorerSupplier and passed into the Scorers. Each Scorer shared the same
>> TermsEnum. What changed? In the old concurrency model, one thread might
>> search a few segments, but each segment was guaranteed to only be searched
>> by one thread. Now, with intra-segment concurrency, we produce one
>> ScorerSupplier per segment, but may produce multiple Scorers across
>> different threads. If the ScorerSupplier produces some mutable object and
>> shares it across the resulting Scorers, you're going to have a bad time.
>> Fun fact: back in 2012, we had an office Halloween party and I dressed as
>> the thing that scares me the most. I printed a picture of Texas (since
>> everyone recognizes Texas) with a TV remote control mute button in the
>> middle. I sewed it to my shirt in the four corners. It was mutable state
>> held by multiple threads.
>>
>> I definitely think we should address these before the Lucene 10 release,
>> as they provide a clean break from the old world. I also think it's a
>> decent amount of work (but not unsurmountable). I'm also maybe no longer a
>> fan of the helper method that Greg added in his PR for the monitor module,
>> since it risks sweeping non-threadsafe code under the rug, if folks make
>> single-threaded tests (which is essentially what they've been doing all
>> along -- see my first point above).
>>
>> I haven't properly looked into the scope of my second point above, but
>> I've seen at least two cases in the past two days. Hopefully it's not too
>> bad, but it might be a risk. I think the first point is still pretty easy
>> to address.
>>
>> Thanks,
>> Froh
>>
>> On Thu, Aug 29, 2024 at 2:15 AM Luca Cavanna <[email protected]>
>> wrote:
>>
>>> For Lucene 10.0, I have two topics to raise:
>>>
>>> 1. Remove the deprecated IndexSearcher#search(Query, Collector) in
>>> favour of IndexSearcher#search(Query, CollectorManager)  (
>>> https://github.com/apache/lucene/issues/12892): this involves removing
>>> the leftover usages in facet, grouping, join and test-framework, plus in
>>> some tests. A list of the leftover usages is in the description of the
>>> issue. It would be great to complete this for Lucene 10, otherwise this
>>> deprecated method and usages will stick around for much longer. What do
>>> others think? Should we make this a blocker for the release? I think this
>>> is not a huge effort and it is parallelizable across different people.
>>>
>>> 2. Intra-segment concurrency (
>>> https://github.com/apache/lucene/pull/13542): current thinking is to
>>> add support for partitioning segments when searching, and searching across
>>> segment partitions concurrently. My intention is to introduce breaking
>>> changes and documentation in Lucene 10 (really only the basics),
>>> without switching the default slicing of IndexSearcher to create segment
>>> partitions. We will want to leverage segment partitions in testing. More
>>> iterations are going to be needed to remove duplicated work across
>>> partitions of the same segment, which is my next step, but currently out of
>>> scope for Lucene 10. Judging from the reviews I got so far, my PR is not
>>> far and I am working on it to address comments, polish it a bit more and
>>> merge it soon.
>>>
>>> Feedback is welcome
>>>
>>> Cheers
>>> Luca
>>>
>>> On Wed, Aug 28, 2024 at 3:05 PM Adrien Grand <[email protected]> wrote:
>>>
>>>> Thanks Mike.
>>>>
>>>> On Wed, Aug 28, 2024 at 2:16 PM Michael McCandless <
>>>> [email protected]> wrote:
>>>>
>>>>> I think maybe also https://github.com/apache/lucene/issues/13519
>>>>> should be a blocker?  It looks like 8 bit vector HNSW quantization is
>>>>> broken (unless I'm making a silly mistake with luceneutil tooling).
>>>>>
>>>>> I've also set its milestone to 10.0.0.
>>>>>
>>>>> Do we really not have a way to mark an issue a blocker for a given
>>>>> release?  That's insane.  OK well I went and created "blocker" label, and
>>>>> added that to GH 13519.  Greg, I'll also go mark your linked issue as
>>>>> "blocker".
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>>
>>>>> On Sat, Aug 24, 2024 at 2:33 PM Uwe Schindler <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I updated Policeman Jenkins to have JDK 23 RC and JDK 24 EA releases.
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>> P.S.: Unfortunately I have to update the macOS Hackintosh VM to have
>>>>>> a newer operating system version: JDK 22 and later no longer run on this
>>>>>> machine.
>>>>>> Am 23.08.2024 um 10:41 schrieb Uwe Schindler:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In 9.x there's still the backport of
>>>>>> https://github.com/apache/lucene/pull/13570 to be done. The PR
>>>>>> apperas in the changelog, but was not backported yet. Chris and I will do
>>>>>> this soon.
>>>>>>
>>>>>> 9.last release on Sept 22 fits perfectly with the JDK 23 release (and
>>>>>> we will have Panama Vector Support). I am seeting up Jenkins Job with
>>>>>> latest RC now to verify all vector stuff works with 23.
>>>>>>
>>>>>> Uwe
>>>>>> Am 08.08.2024 um 18:50 schrieb Adrien Grand:
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> As previously discussed
>>>>>> <https://lists.apache.org/thread/4bhnkkvvodxxgrpj4yqm5yrgj0ppc59r>,
>>>>>> I plan on releasing 9.last and 10.0 under the following timeline:
>>>>>> - ~September 15th: 10.0 feature freeze - main becomes 11.0
>>>>>> - ~September 22nd: 9.last release,
>>>>>> - ~October 1st: 10.0 release.
>>>>>>
>>>>>> Unless someone shortly volunteers to do a 9.x release, this 9.last
>>>>>> release will likely be 9.12.
>>>>>>
>>>>>> As these dates are coming shortly, I would like to start tracking
>>>>>> blockers. Please reply to this thread with issues that you know about 
>>>>>> that
>>>>>> should delay the 9.last or 10.0 releases.
>>>>>>
>>>>>> Chris, Uwe: I also wanted to check with you if this timeline works
>>>>>> well with regards to supporting Java 23 in 9.last and 10.0?
>>>>>>
>>>>>> --
>>>>>> Adrien
>>>>>>
>>>>>> --
>>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>>> eMail: [email protected]
>>>>>>
>>>>>> --
>>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>>> eMail: [email protected]
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>

Re: Lucene 10.0 and 9.12 blockers

Reply via email to