Oh, I forgot to mention:

I think we should deprecate the DefaultScorerSupplier constructor that
takes a Scorer. There's no way that works with intra-segment concurrency.

Maybe we remove DefaultScorerSupplier altogether?

On Thu, Sep 12, 2024 at 12:35 AM Michael Froh <msf...@gmail.com> wrote:

> I'm a big fan of both of Luca's topics. I'd like to raise a small red flag
> around them, though, since they seem to be connected.
>
> Working through the join module and helping my colleague @harshavamsi on
> the QueryUtils side, I see two layers of unpreparatedness for the modern
> "concurrency first" architecture. (Again, I want to make clear that I think
> the modern architecture is the way to go and we can and should get there in
> time for Lucene 10.)
>
> 1. There are several uses of SimpleCollector, where it's assumed that one
> collector will collect all results on a single thread. With the deprecated
> method, this forces single-threaded behavior all the time. In my opinion,
> these represent 13+ year technical debt for cases where you couldn't
> properly use an IndexSearcher to do concurrent searches.
> 2. With the merge of intra-segment searches, we have another layer:
> ScorerSuppliers that share mutable state across the Scorers that they
> produce. For example, @harshavamsi came across a case today in the sigmoid
> function for FeatureQuery where a TermsEnum was created in the
> ScorerSupplier and passed into the Scorers. Each Scorer shared the same
> TermsEnum. What changed? In the old concurrency model, one thread might
> search a few segments, but each segment was guaranteed to only be searched
> by one thread. Now, with intra-segment concurrency, we produce one
> ScorerSupplier per segment, but may produce multiple Scorers across
> different threads. If the ScorerSupplier produces some mutable object and
> shares it across the resulting Scorers, you're going to have a bad time.
> Fun fact: back in 2012, we had an office Halloween party and I dressed as
> the thing that scares me the most. I printed a picture of Texas (since
> everyone recognizes Texas) with a TV remote control mute button in the
> middle. I sewed it to my shirt in the four corners. It was mutable state
> held by multiple threads.
>
> I definitely think we should address these before the Lucene 10 release,
> as they provide a clean break from the old world. I also think it's a
> decent amount of work (but not unsurmountable). I'm also maybe no longer a
> fan of the helper method that Greg added in his PR for the monitor module,
> since it risks sweeping non-threadsafe code under the rug, if folks make
> single-threaded tests (which is essentially what they've been doing all
> along -- see my first point above).
>
> I haven't properly looked into the scope of my second point above, but
> I've seen at least two cases in the past two days. Hopefully it's not too
> bad, but it might be a risk. I think the first point is still pretty easy
> to address.
>
> Thanks,
> Froh
>
> On Thu, Aug 29, 2024 at 2:15 AM Luca Cavanna <l...@elastic.co.invalid>
> wrote:
>
>> For Lucene 10.0, I have two topics to raise:
>>
>> 1. Remove the deprecated IndexSearcher#search(Query, Collector) in favour
>> of IndexSearcher#search(Query, CollectorManager)  (
>> https://github.com/apache/lucene/issues/12892): this involves removing
>> the leftover usages in facet, grouping, join and test-framework, plus in
>> some tests. A list of the leftover usages is in the description of the
>> issue. It would be great to complete this for Lucene 10, otherwise this
>> deprecated method and usages will stick around for much longer. What do
>> others think? Should we make this a blocker for the release? I think this
>> is not a huge effort and it is parallelizable across different people.
>>
>> 2. Intra-segment concurrency (https://github.com/apache/lucene/pull/13542):
>> current thinking is to add support for partitioning segments when
>> searching, and searching across segment partitions concurrently. My
>> intention is to introduce breaking changes and documentation in Lucene 10
>> (really only the basics), without switching the default slicing of
>> IndexSearcher to create segment partitions. We will want to leverage
>> segment partitions in testing. More iterations are going to be needed to
>> remove duplicated work across partitions of the same segment, which is my
>> next step, but currently out of scope for Lucene 10. Judging from the
>> reviews I got so far, my PR is not far and I am working on it to address
>> comments, polish it a bit more and merge it soon.
>>
>> Feedback is welcome
>>
>> Cheers
>> Luca
>>
>> On Wed, Aug 28, 2024 at 3:05 PM Adrien Grand <jpou...@gmail.com> wrote:
>>
>>> Thanks Mike.
>>>
>>> On Wed, Aug 28, 2024 at 2:16 PM Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>> I think maybe also https://github.com/apache/lucene/issues/13519
>>>> should be a blocker?  It looks like 8 bit vector HNSW quantization is
>>>> broken (unless I'm making a silly mistake with luceneutil tooling).
>>>>
>>>> I've also set its milestone to 10.0.0.
>>>>
>>>> Do we really not have a way to mark an issue a blocker for a given
>>>> release?  That's insane.  OK well I went and created "blocker" label, and
>>>> added that to GH 13519.  Greg, I'll also go mark your linked issue as
>>>> "blocker".
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Sat, Aug 24, 2024 at 2:33 PM Uwe Schindler <u...@thetaphi.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I updated Policeman Jenkins to have JDK 23 RC and JDK 24 EA releases.
>>>>>
>>>>> Uwe
>>>>>
>>>>> P.S.: Unfortunately I have to update the macOS Hackintosh VM to have a
>>>>> newer operating system version: JDK 22 and later no longer run on this
>>>>> machine.
>>>>> Am 23.08.2024 um 10:41 schrieb Uwe Schindler:
>>>>>
>>>>> Hi,
>>>>>
>>>>> In 9.x there's still the backport of
>>>>> https://github.com/apache/lucene/pull/13570 to be done. The PR
>>>>> apperas in the changelog, but was not backported yet. Chris and I will do
>>>>> this soon.
>>>>>
>>>>> 9.last release on Sept 22 fits perfectly with the JDK 23 release (and
>>>>> we will have Panama Vector Support). I am seeting up Jenkins Job with
>>>>> latest RC now to verify all vector stuff works with 23.
>>>>>
>>>>> Uwe
>>>>> Am 08.08.2024 um 18:50 schrieb Adrien Grand:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> As previously discussed
>>>>> <https://lists.apache.org/thread/4bhnkkvvodxxgrpj4yqm5yrgj0ppc59r>, I
>>>>> plan on releasing 9.last and 10.0 under the following timeline:
>>>>> - ~September 15th: 10.0 feature freeze - main becomes 11.0
>>>>> - ~September 22nd: 9.last release,
>>>>> - ~October 1st: 10.0 release.
>>>>>
>>>>> Unless someone shortly volunteers to do a 9.x release, this 9.last
>>>>> release will likely be 9.12.
>>>>>
>>>>> As these dates are coming shortly, I would like to start tracking
>>>>> blockers. Please reply to this thread with issues that you know about that
>>>>> should delay the 9.last or 10.0 releases.
>>>>>
>>>>> Chris, Uwe: I also wanted to check with you if this timeline works
>>>>> well with regards to supporting Java 23 in 9.last and 10.0?
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>> --
>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>> eMail: u...@thetaphi.de
>>>>>
>>>>> --
>>>>> Uwe SchindlerAchterdiek 19, D-28357 Bremen 
>>>>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>https://www.thetaphi.de
>>>>> eMail: u...@thetaphi.de
>>>>>
>>>>>
>>>
>>> --
>>> Adrien
>>>
>>

Reply via email to