Re: Lucene 10.0 and 9.12 blockers

Luca Cavanna Mon, 30 Sep 2024 10:36:56 -0700

Hi all,
given Adrien's limited availability, Chris and I flipped a coin and decided
that I will be the release manager for Lucene 10. branch_10x and
branch_10_0 have already been created. I will follow up with the first RC
tomorrow.


On Mon, Sep 30, 2024 at 5:14 PM Adrien Grand <[email protected]> wrote:

> FYI I have little availability this week so Luca and Chris offered to help
> with the release. Thank you both!
>
> Le sam. 14 sept. 2024, 18:07, Luca Cavanna <[email protected]> a écrit :
>
>> On the topic of removing search(Query, Collector), I think it is unlikely
>> that we will be able to complete that task before the Lucene 10 branch is
>> cut. It would be nice to still work on the remaining issues, but it's
>> probably good to have less urgency around it hence more time to think about
>> how to properly address the remaining issues. I think that removing a
>> deprecated method leaving users without a proper replacement is not a good
>> trade-off.
>>
>> I have an outstanding PR around CollectorManager#forSequentialExecution,
>> based on the risk I described in my previous message.I am now concluding
>> that it would be safer to remove such a public method from branch_9x before
>> it gets released. I described the reason in the PR I just opened:
>> https://github.com/apache/lucene/pull/13790 . I'd like to know what
>> folks think about this. I myself have changed my mind a few times on the
>> topic.
>>
>> On Thu, Sep 12, 2024 at 10:09 PM Michael Froh <[email protected]> wrote:
>>
>>> Aha! I realized that I was totally misreading some stacktraces while
>>> debugging some of the Join tests. You're absolutely right that a
>>> ScorerSupplier is created for each doc ID range -- not just the Scorer.
>>> Also, the test that I "helped" Vamsi with didn't really get fixed by moving
>>> construction of the TermsEnum into the get() method -- it just didn't
>>> happen to fail that time. *facepalm*
>>>
>>> Please ignore my unfounded panic :D
>>>
>>> Thanks,
>>> Froh
>>>
>>> On Thu, Sep 12, 2024 at 1:33 AM Luca Cavanna <[email protected]> wrote:
>>>
>>>> Thanks for raising this Michael.
>>>>
>>>> I see a risk as well around removing the deprecated search(Query,
>>>> Collector) method by using a sequential collector manager. Previously, we
>>>> would happily execute sequentially despite an executor being provided to
>>>> the searcher (because search(Query, Collector) bypasses concurrency
>>>> entirely). After the migration, we would execute concurrently, hence the
>>>> collector manager would throw an exception as soon as more than one slice
>>>> gets searched. The sequential collector manager can be used in tests to
>>>> overcome the lack of a collector manager, but that only moves the problem
>>>> to the user side: what are users of these collectors going to do given they
>>>> need a collector manager to call search, and we don't provide one? Their
>>>> only option would be to not set an executor to their searcher. Maybe it is
>>>> a possibility to clearly document that but is it acceptable to require
>>>> users to have a non concurrent index searcher instance around to use for
>>>> non concurrent queries? There's only a few cases of this fortunately.
>>>> There's also a couple of cases like QueryUtils and JoinUtil where we have
>>>> utility methods that call search internally and accept an external
>>>> searcher. Those searchers may have an executor set to them, and the only
>>>> safe way to migrate these is to add concurrent collector managers support
>>>> or explicitly state that the provided search shouldn't have an executor set
>>>> to it. Another option if we are not happy with the workaround I mentioned,
>>>> is to consider leaving search(Query, Collector) deprecated in Lucene 10 and
>>>> removing it in Lucene 11. It is a shame because we are not far off, but I
>>>> am not sure that this warrants delaying the release.
>>>> I am not entirely sure how this aligns with the risk you mentioned,
>>>> which cases of SimpleCollector are you referring to specifically?
>>>>
>>>> Regarding your second concern around intra-segment concurrency: while I
>>>> had to adapt a couple of tests to be intra-segment ready as they made wrong
>>>> assumptions, we are now leveraging intra-segment concurrency in all tests
>>>> (provided that the searcher is created using LuceneTestCase#newSearcher),
>>>> besides when DrillSideways is used. I have seen a couple of post-merge
>>>> failures that may be related which I will look into, but nothing that would
>>>> suggest that the design is entirely problematic. When you retrieve a scorer
>>>> supplier or a bulk scorer you provide a LeafReaderContext. The overall
>>>> expectation is that you get a different instance each time, regardless of
>>>> whether you have already seen the segment or not. If that is the case there
>>>> is no state shared between threads, because ScorerSuppliers should not get
>>>> shared across threads? It is not the expectation that the support for
>>>> intra-segment concurrency requires bulk scorers and scorer suppliers to
>>>> become thread-safe.
>>>> With that in mind, I checked DefaultScorerSupplier and I do not see why
>>>> it would not work, as long as each call to retrieve a scorer supplier
>>>> returns a new instance of if that points to a new instance of Scorer, which
>>>> holds state created as part of that same scorerSupplier call. The problem
>>>> that we have is that we duplicate ahead of time work for partitions of the
>>>> same segment (tracked in https://github.com/apache/lucene/issues/13745),
>>>> because if we need to pull doc_values, we will do so for the same segment
>>>> multiple times. I would assume that if something is off with
>>>> DefaultScorerSupplier, tests would show that clearly, as it is widely used.
>>>> I also checked FeatureQuery, and I see that each call to
>>>> scorerSupplier(LeafReaderContext) returns a new instance of the supplier
>>>> which points to different TermsEnum instance retrieved multiple times for
>>>> the same segment. Removing this duplication will require additional work,
>>>> and there will be bugs, or incorrect assumptions made in existing scorer
>>>> supplier instances, but those should not be too hard to fix.
>>>> Does this make sense to you? Perhaps there are additional changes to
>>>> make in the migrate guide or javadocs to clarify what I described, let me
>>>> know what you think.
>>>>
>>>> Cheers
>>>> Luca
>>>>
>>>> On Thu, Sep 12, 2024 at 9:42 AM Michael Froh <[email protected]> wrote:
>>>>
>>>>> Oh, I forgot to mention:
>>>>>
>>>>> I think we should deprecate the DefaultScorerSupplier constructor that
>>>>> takes a Scorer. There's no way that works with intra-segment concurrency.
>>>>>
>>>>> Maybe we remove DefaultScorerSupplier altogether?
>>>>>
>>>>> On Thu, Sep 12, 2024 at 12:35 AM Michael Froh <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm a big fan of both of Luca's topics. I'd like to raise a small red
>>>>>> flag around them, though, since they seem to be connected.
>>>>>>
>>>>>> Working through the join module and helping my colleague @harshavamsi
>>>>>> on the QueryUtils side, I see two layers of unpreparatedness for the 
>>>>>> modern
>>>>>> "concurrency first" architecture. (Again, I want to make clear that I 
>>>>>> think
>>>>>> the modern architecture is the way to go and we can and should get there 
>>>>>> in
>>>>>> time for Lucene 10.)
>>>>>>
>>>>>> 1. There are several uses of SimpleCollector, where it's assumed that
>>>>>> one collector will collect all results on a single thread. With the
>>>>>> deprecated method, this forces single-threaded behavior all the time. In 
>>>>>> my
>>>>>> opinion, these represent 13+ year technical debt for cases where you
>>>>>> couldn't properly use an IndexSearcher to do concurrent searches.
>>>>>> 2. With the merge of intra-segment searches, we have another layer:
>>>>>> ScorerSuppliers that share mutable state across the Scorers that they
>>>>>> produce. For example, @harshavamsi came across a case today in the 
>>>>>> sigmoid
>>>>>> function for FeatureQuery where a TermsEnum was created in the
>>>>>> ScorerSupplier and passed into the Scorers. Each Scorer shared the same
>>>>>> TermsEnum. What changed? In the old concurrency model, one thread might
>>>>>> search a few segments, but each segment was guaranteed to only be 
>>>>>> searched
>>>>>> by one thread. Now, with intra-segment concurrency, we produce one
>>>>>> ScorerSupplier per segment, but may produce multiple Scorers across
>>>>>> different threads. If the ScorerSupplier produces some mutable object and
>>>>>> shares it across the resulting Scorers, you're going to have a bad time.
>>>>>> Fun fact: back in 2012, we had an office Halloween party and I dressed as
>>>>>> the thing that scares me the most. I printed a picture of Texas (since
>>>>>> everyone recognizes Texas) with a TV remote control mute button in the
>>>>>> middle. I sewed it to my shirt in the four corners. It was mutable state
>>>>>> held by multiple threads.
>>>>>>
>>>>>> I definitely think we should address these before the Lucene 10
>>>>>> release, as they provide a clean break from the old world. I also think
>>>>>> it's a decent amount of work (but not unsurmountable). I'm also maybe no
>>>>>> longer a fan of the helper method that Greg added in his PR for the 
>>>>>> monitor
>>>>>> module, since it risks sweeping non-threadsafe code under the rug, if 
>>>>>> folks
>>>>>> make single-threaded tests (which is essentially what they've been doing
>>>>>> all along -- see my first point above).
>>>>>>
>>>>>> I haven't properly looked into the scope of my second point above,
>>>>>> but I've seen at least two cases in the past two days. Hopefully it's not
>>>>>> too bad, but it might be a risk. I think the first point is still pretty
>>>>>> easy to address.
>>>>>>
>>>>>> Thanks,
>>>>>> Froh
>>>>>>
>>>>>> On Thu, Aug 29, 2024 at 2:15 AM Luca Cavanna <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> For Lucene 10.0, I have two topics to raise:
>>>>>>>
>>>>>>> 1. Remove the deprecated IndexSearcher#search(Query, Collector) in
>>>>>>> favour of IndexSearcher#search(Query, CollectorManager)  (
>>>>>>> https://github.com/apache/lucene/issues/12892): this involves
>>>>>>> removing the leftover usages in facet, grouping, join and 
>>>>>>> test-framework,
>>>>>>> plus in some tests. A list of the leftover usages is in the description 
>>>>>>> of
>>>>>>> the issue. It would be great to complete this for Lucene 10, otherwise 
>>>>>>> this
>>>>>>> deprecated method and usages will stick around for much longer. What do
>>>>>>> others think? Should we make this a blocker for the release? I think 
>>>>>>> this
>>>>>>> is not a huge effort and it is parallelizable across different people.
>>>>>>>
>>>>>>> 2. Intra-segment concurrency (
>>>>>>> https://github.com/apache/lucene/pull/13542): current thinking is
>>>>>>> to add support for partitioning segments when searching, and searching
>>>>>>> across segment partitions concurrently. My intention is to introduce
>>>>>>> breaking changes and documentation in Lucene 10 (really only the 
>>>>>>> basics),
>>>>>>> without switching the default slicing of IndexSearcher to create segment
>>>>>>> partitions. We will want to leverage segment partitions in testing. More
>>>>>>> iterations are going to be needed to remove duplicated work across
>>>>>>> partitions of the same segment, which is my next step, but currently 
>>>>>>> out of
>>>>>>> scope for Lucene 10. Judging from the reviews I got so far, my PR is not
>>>>>>> far and I am working on it to address comments, polish it a bit more and
>>>>>>> merge it soon.
>>>>>>>
>>>>>>> Feedback is welcome
>>>>>>>
>>>>>>> Cheers
>>>>>>> Luca
>>>>>>>
>>>>>>> On Wed, Aug 28, 2024 at 3:05 PM Adrien Grand <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Mike.
>>>>>>>>
>>>>>>>> On Wed, Aug 28, 2024 at 2:16 PM Michael McCandless <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think maybe also https://github.com/apache/lucene/issues/13519
>>>>>>>>> should be a blocker?  It looks like 8 bit vector HNSW quantization is
>>>>>>>>> broken (unless I'm making a silly mistake with luceneutil tooling).
>>>>>>>>>
>>>>>>>>> I've also set its milestone to 10.0.0.
>>>>>>>>>
>>>>>>>>> Do we really not have a way to mark an issue a blocker for a given
>>>>>>>>> release?  That's insane.  OK well I went and created "blocker" label, 
>>>>>>>>> and
>>>>>>>>> added that to GH 13519.  Greg, I'll also go mark your linked issue as
>>>>>>>>> "blocker".
>>>>>>>>>
>>>>>>>>> Mike McCandless
>>>>>>>>>
>>>>>>>>> http://blog.mikemccandless.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Aug 24, 2024 at 2:33 PM Uwe Schindler <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I updated Policeman Jenkins to have JDK 23 RC and JDK 24 EA
>>>>>>>>>> releases.
>>>>>>>>>>
>>>>>>>>>> Uwe
>>>>>>>>>>
>>>>>>>>>> P.S.: Unfortunately I have to update the macOS Hackintosh VM to
>>>>>>>>>> have a newer operating system version: JDK 22 and later no longer 
>>>>>>>>>> run on
>>>>>>>>>> this machine.
>>>>>>>>>> Am 23.08.2024 um 10:41 schrieb Uwe Schindler:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> In 9.x there's still the backport of
>>>>>>>>>> https://github.com/apache/lucene/pull/13570 to be done. The PR
>>>>>>>>>> apperas in the changelog, but was not backported yet. Chris and I 
>>>>>>>>>> will do
>>>>>>>>>> this soon.
>>>>>>>>>>
>>>>>>>>>> 9.last release on Sept 22 fits perfectly with the JDK 23 release
>>>>>>>>>> (and we will have Panama Vector Support). I am seeting up Jenkins 
>>>>>>>>>> Job with
>>>>>>>>>> latest RC now to verify all vector stuff works with 23.
>>>>>>>>>>
>>>>>>>>>> Uwe
>>>>>>>>>> Am 08.08.2024 um 18:50 schrieb Adrien Grand:
>>>>>>>>>>
>>>>>>>>>> Hello everyone,
>>>>>>>>>>
>>>>>>>>>> As previously discussed
>>>>>>>>>> <https://lists.apache.org/thread/4bhnkkvvodxxgrpj4yqm5yrgj0ppc59r>,
>>>>>>>>>> I plan on releasing 9.last and 10.0 under the following timeline:
>>>>>>>>>> - ~September 15th: 10.0 feature freeze - main becomes 11.0
>>>>>>>>>> - ~September 22nd: 9.last release,
>>>>>>>>>> - ~October 1st: 10.0 release.
>>>>>>>>>>
>>>>>>>>>> Unless someone shortly volunteers to do a 9.x release, this
>>>>>>>>>> 9.last release will likely be 9.12.
>>>>>>>>>>
>>>>>>>>>> As these dates are coming shortly, I would like to start tracking
>>>>>>>>>> blockers. Please reply to this thread with issues that you know 
>>>>>>>>>> about that
>>>>>>>>>> should delay the 9.last or 10.0 releases.
>>>>>>>>>>
>>>>>>>>>> Chris, Uwe: I also wanted to check with you if this timeline
>>>>>>>>>> works well with regards to supporting Java 23 in 9.last and 10.0?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Adrien
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>>>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>>>>>>> eMail: [email protected]
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>>>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>>>>>>> eMail: [email protected]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Adrien
>>>>>>>>
>>>>>>>

Re: Lucene 10.0 and 9.12 blockers

Reply via email to