Re: Lucene 10.0 and 9.12 blockers

Luca Cavanna Sat, 14 Sep 2024 09:07:54 -0700

On the topic of removing search(Query, Collector), I think it is unlikely
that we will be able to complete that task before the Lucene 10 branch is
cut. It would be nice to still work on the remaining issues, but it's
probably good to have less urgency around it hence more time to think about
how to properly address the remaining issues. I think that removing a
deprecated method leaving users without a proper replacement is not a good
trade-off.


I have an outstanding PR around CollectorManager#forSequentialExecution,
based on the risk I described in my previous message.I am now concluding
that it would be safer to remove such a public method from branch_9x before
it gets released. I described the reason in the PR I just opened:
https://github.com/apache/lucene/pull/13790 . I'd like to know what folks
think about this. I myself have changed my mind a few times on the topic.

On Thu, Sep 12, 2024 at 10:09 PM Michael Froh <[email protected]> wrote:

> Aha! I realized that I was totally misreading some stacktraces while
> debugging some of the Join tests. You're absolutely right that a
> ScorerSupplier is created for each doc ID range -- not just the Scorer.
> Also, the test that I "helped" Vamsi with didn't really get fixed by moving
> construction of the TermsEnum into the get() method -- it just didn't
> happen to fail that time. *facepalm*
>
> Please ignore my unfounded panic :D
>
> Thanks,
> Froh
>
> On Thu, Sep 12, 2024 at 1:33 AM Luca Cavanna <[email protected]> wrote:
>
>> Thanks for raising this Michael.
>>
>> I see a risk as well around removing the deprecated search(Query,
>> Collector) method by using a sequential collector manager. Previously, we
>> would happily execute sequentially despite an executor being provided to
>> the searcher (because search(Query, Collector) bypasses concurrency
>> entirely). After the migration, we would execute concurrently, hence the
>> collector manager would throw an exception as soon as more than one slice
>> gets searched. The sequential collector manager can be used in tests to
>> overcome the lack of a collector manager, but that only moves the problem
>> to the user side: what are users of these collectors going to do given they
>> need a collector manager to call search, and we don't provide one? Their
>> only option would be to not set an executor to their searcher. Maybe it is
>> a possibility to clearly document that but is it acceptable to require
>> users to have a non concurrent index searcher instance around to use for
>> non concurrent queries? There's only a few cases of this fortunately.
>> There's also a couple of cases like QueryUtils and JoinUtil where we have
>> utility methods that call search internally and accept an external
>> searcher. Those searchers may have an executor set to them, and the only
>> safe way to migrate these is to add concurrent collector managers support
>> or explicitly state that the provided search shouldn't have an executor set
>> to it. Another option if we are not happy with the workaround I mentioned,
>> is to consider leaving search(Query, Collector) deprecated in Lucene 10 and
>> removing it in Lucene 11. It is a shame because we are not far off, but I
>> am not sure that this warrants delaying the release.
>> I am not entirely sure how this aligns with the risk you mentioned, which
>> cases of SimpleCollector are you referring to specifically?
>>
>> Regarding your second concern around intra-segment concurrency: while I
>> had to adapt a couple of tests to be intra-segment ready as they made wrong
>> assumptions, we are now leveraging intra-segment concurrency in all tests
>> (provided that the searcher is created using LuceneTestCase#newSearcher),
>> besides when DrillSideways is used. I have seen a couple of post-merge
>> failures that may be related which I will look into, but nothing that would
>> suggest that the design is entirely problematic. When you retrieve a scorer
>> supplier or a bulk scorer you provide a LeafReaderContext. The overall
>> expectation is that you get a different instance each time, regardless of
>> whether you have already seen the segment or not. If that is the case there
>> is no state shared between threads, because ScorerSuppliers should not get
>> shared across threads? It is not the expectation that the support for
>> intra-segment concurrency requires bulk scorers and scorer suppliers to
>> become thread-safe.
>> With that in mind, I checked DefaultScorerSupplier and I do not see why
>> it would not work, as long as each call to retrieve a scorer supplier
>> returns a new instance of if that points to a new instance of Scorer, which
>> holds state created as part of that same scorerSupplier call. The problem
>> that we have is that we duplicate ahead of time work for partitions of the
>> same segment (tracked in https://github.com/apache/lucene/issues/13745),
>> because if we need to pull doc_values, we will do so for the same segment
>> multiple times. I would assume that if something is off with
>> DefaultScorerSupplier, tests would show that clearly, as it is widely used.
>> I also checked FeatureQuery, and I see that each call to
>> scorerSupplier(LeafReaderContext) returns a new instance of the supplier
>> which points to different TermsEnum instance retrieved multiple times for
>> the same segment. Removing this duplication will require additional work,
>> and there will be bugs, or incorrect assumptions made in existing scorer
>> supplier instances, but those should not be too hard to fix.
>> Does this make sense to you? Perhaps there are additional changes to make
>> in the migrate guide or javadocs to clarify what I described, let me know
>> what you think.
>>
>> Cheers
>> Luca
>>
>> On Thu, Sep 12, 2024 at 9:42 AM Michael Froh <[email protected]> wrote:
>>
>>> Oh, I forgot to mention:
>>>
>>> I think we should deprecate the DefaultScorerSupplier constructor that
>>> takes a Scorer. There's no way that works with intra-segment concurrency.
>>>
>>> Maybe we remove DefaultScorerSupplier altogether?
>>>
>>> On Thu, Sep 12, 2024 at 12:35 AM Michael Froh <[email protected]> wrote:
>>>
>>>> I'm a big fan of both of Luca's topics. I'd like to raise a small red
>>>> flag around them, though, since they seem to be connected.
>>>>
>>>> Working through the join module and helping my colleague @harshavamsi
>>>> on the QueryUtils side, I see two layers of unpreparatedness for the modern
>>>> "concurrency first" architecture. (Again, I want to make clear that I think
>>>> the modern architecture is the way to go and we can and should get there in
>>>> time for Lucene 10.)
>>>>
>>>> 1. There are several uses of SimpleCollector, where it's assumed that
>>>> one collector will collect all results on a single thread. With the
>>>> deprecated method, this forces single-threaded behavior all the time. In my
>>>> opinion, these represent 13+ year technical debt for cases where you
>>>> couldn't properly use an IndexSearcher to do concurrent searches.
>>>> 2. With the merge of intra-segment searches, we have another layer:
>>>> ScorerSuppliers that share mutable state across the Scorers that they
>>>> produce. For example, @harshavamsi came across a case today in the sigmoid
>>>> function for FeatureQuery where a TermsEnum was created in the
>>>> ScorerSupplier and passed into the Scorers. Each Scorer shared the same
>>>> TermsEnum. What changed? In the old concurrency model, one thread might
>>>> search a few segments, but each segment was guaranteed to only be searched
>>>> by one thread. Now, with intra-segment concurrency, we produce one
>>>> ScorerSupplier per segment, but may produce multiple Scorers across
>>>> different threads. If the ScorerSupplier produces some mutable object and
>>>> shares it across the resulting Scorers, you're going to have a bad time.
>>>> Fun fact: back in 2012, we had an office Halloween party and I dressed as
>>>> the thing that scares me the most. I printed a picture of Texas (since
>>>> everyone recognizes Texas) with a TV remote control mute button in the
>>>> middle. I sewed it to my shirt in the four corners. It was mutable state
>>>> held by multiple threads.
>>>>
>>>> I definitely think we should address these before the Lucene 10
>>>> release, as they provide a clean break from the old world. I also think
>>>> it's a decent amount of work (but not unsurmountable). I'm also maybe no
>>>> longer a fan of the helper method that Greg added in his PR for the monitor
>>>> module, since it risks sweeping non-threadsafe code under the rug, if folks
>>>> make single-threaded tests (which is essentially what they've been doing
>>>> all along -- see my first point above).
>>>>
>>>> I haven't properly looked into the scope of my second point above, but
>>>> I've seen at least two cases in the past two days. Hopefully it's not too
>>>> bad, but it might be a risk. I think the first point is still pretty easy
>>>> to address.
>>>>
>>>> Thanks,
>>>> Froh
>>>>
>>>> On Thu, Aug 29, 2024 at 2:15 AM Luca Cavanna <[email protected]>
>>>> wrote:
>>>>
>>>>> For Lucene 10.0, I have two topics to raise:
>>>>>
>>>>> 1. Remove the deprecated IndexSearcher#search(Query, Collector) in
>>>>> favour of IndexSearcher#search(Query, CollectorManager)  (
>>>>> https://github.com/apache/lucene/issues/12892): this involves
>>>>> removing the leftover usages in facet, grouping, join and test-framework,
>>>>> plus in some tests. A list of the leftover usages is in the description of
>>>>> the issue. It would be great to complete this for Lucene 10, otherwise 
>>>>> this
>>>>> deprecated method and usages will stick around for much longer. What do
>>>>> others think? Should we make this a blocker for the release? I think this
>>>>> is not a huge effort and it is parallelizable across different people.
>>>>>
>>>>> 2. Intra-segment concurrency (
>>>>> https://github.com/apache/lucene/pull/13542): current thinking is to
>>>>> add support for partitioning segments when searching, and searching across
>>>>> segment partitions concurrently. My intention is to introduce breaking
>>>>> changes and documentation in Lucene 10 (really only the basics),
>>>>> without switching the default slicing of IndexSearcher to create segment
>>>>> partitions. We will want to leverage segment partitions in testing. More
>>>>> iterations are going to be needed to remove duplicated work across
>>>>> partitions of the same segment, which is my next step, but currently out 
>>>>> of
>>>>> scope for Lucene 10. Judging from the reviews I got so far, my PR is not
>>>>> far and I am working on it to address comments, polish it a bit more and
>>>>> merge it soon.
>>>>>
>>>>> Feedback is welcome
>>>>>
>>>>> Cheers
>>>>> Luca
>>>>>
>>>>> On Wed, Aug 28, 2024 at 3:05 PM Adrien Grand <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Mike.
>>>>>>
>>>>>> On Wed, Aug 28, 2024 at 2:16 PM Michael McCandless <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I think maybe also https://github.com/apache/lucene/issues/13519
>>>>>>> should be a blocker?  It looks like 8 bit vector HNSW quantization is
>>>>>>> broken (unless I'm making a silly mistake with luceneutil tooling).
>>>>>>>
>>>>>>> I've also set its milestone to 10.0.0.
>>>>>>>
>>>>>>> Do we really not have a way to mark an issue a blocker for a given
>>>>>>> release?  That's insane.  OK well I went and created "blocker" label, 
>>>>>>> and
>>>>>>> added that to GH 13519.  Greg, I'll also go mark your linked issue as
>>>>>>> "blocker".
>>>>>>>
>>>>>>> Mike McCandless
>>>>>>>
>>>>>>> http://blog.mikemccandless.com
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Aug 24, 2024 at 2:33 PM Uwe Schindler <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I updated Policeman Jenkins to have JDK 23 RC and JDK 24 EA
>>>>>>>> releases.
>>>>>>>>
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>> P.S.: Unfortunately I have to update the macOS Hackintosh VM to
>>>>>>>> have a newer operating system version: JDK 22 and later no longer run 
>>>>>>>> on
>>>>>>>> this machine.
>>>>>>>> Am 23.08.2024 um 10:41 schrieb Uwe Schindler:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In 9.x there's still the backport of
>>>>>>>> https://github.com/apache/lucene/pull/13570 to be done. The PR
>>>>>>>> apperas in the changelog, but was not backported yet. Chris and I will 
>>>>>>>> do
>>>>>>>> this soon.
>>>>>>>>
>>>>>>>> 9.last release on Sept 22 fits perfectly with the JDK 23 release
>>>>>>>> (and we will have Panama Vector Support). I am seeting up Jenkins Job 
>>>>>>>> with
>>>>>>>> latest RC now to verify all vector stuff works with 23.
>>>>>>>>
>>>>>>>> Uwe
>>>>>>>> Am 08.08.2024 um 18:50 schrieb Adrien Grand:
>>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> As previously discussed
>>>>>>>> <https://lists.apache.org/thread/4bhnkkvvodxxgrpj4yqm5yrgj0ppc59r>,
>>>>>>>> I plan on releasing 9.last and 10.0 under the following timeline:
>>>>>>>> - ~September 15th: 10.0 feature freeze - main becomes 11.0
>>>>>>>> - ~September 22nd: 9.last release,
>>>>>>>> - ~October 1st: 10.0 release.
>>>>>>>>
>>>>>>>> Unless someone shortly volunteers to do a 9.x release, this 9.last
>>>>>>>> release will likely be 9.12.
>>>>>>>>
>>>>>>>> As these dates are coming shortly, I would like to start tracking
>>>>>>>> blockers. Please reply to this thread with issues that you know about 
>>>>>>>> that
>>>>>>>> should delay the 9.last or 10.0 releases.
>>>>>>>>
>>>>>>>> Chris, Uwe: I also wanted to check with you if this timeline works
>>>>>>>> well with regards to supporting Java 23 in 9.last and 10.0?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Adrien
>>>>>>>>
>>>>>>>> --
>>>>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>>>>> eMail: [email protected]
>>>>>>>>
>>>>>>>> --
>>>>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>>>>> eMail: [email protected]
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Adrien
>>>>>>
>>>>>

Re: Lucene 10.0 and 9.12 blockers

Reply via email to