I'm a big fan of both of Luca's topics. I'd like to raise a small red flag
around them, though, since they seem to be connected.

Working through the join module and helping my colleague @harshavamsi on
the QueryUtils side, I see two layers of unpreparedness for the modern
"concurrency first" architecture. (Again, I want to make clear that I think
the modern architecture is the way to go and we can and should get there in
time for Lucene 10.)

1. There are several uses of SimpleCollector, where it's assumed that one
collector will collect all results on a single thread. With the deprecated
method, this forces single-threaded behavior all the time. In my opinion,
these represent 13+ years of technical debt for cases where you couldn't
properly use an IndexSearcher to do concurrent searches.
2. With the merge of intra-segment searches, we have another layer:
ScorerSuppliers that share mutable state across the Scorers that they
produce. For example, @harshavamsi came across a case today in the sigmoid
function for FeatureQuery where a TermsEnum was created in the
ScorerSupplier and passed into the Scorers. Each Scorer shared the same
TermsEnum. What changed? In the old concurrency model, one thread might
search a few segments, but each segment was guaranteed to only be searched
by one thread. Now, with intra-segment concurrency, we produce one
ScorerSupplier per segment, but may produce multiple Scorers across
different threads. If the ScorerSupplier produces some mutable object and
shares it across the resulting Scorers, you're going to have a bad time.

Fun fact: back in 2012, we had an office Halloween party and I dressed as
the thing that scares me the most. I printed a picture of Texas (since
everyone recognizes Texas) with a TV remote control mute button in the
middle. I sewed it to my shirt in the four corners. It was mutable state
held by multiple threads.
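
To make the second point concrete, here is a minimal sketch in plain Java
with no Lucene dependency. Cursor, ScorerSketch and SupplierSketch are
hypothetical stand-ins for a stateful iterator (such as a TermsEnum), a
Scorer and a ScorerSupplier; they are not the real Lucene classes. The two
"scorers" are interleaved on a single thread so the effect is deterministic;
with real threads the same sharing would show up as a race.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a stateful iterator such as a TermsEnum.
final class Cursor {
    private final List<Integer> values;
    private int pos = -1; // mutable position: this is the state that must not be shared

    Cursor(List<Integer> values) {
        this.values = values;
    }

    // Advances the internal position and returns the value found there.
    int next() {
        return values.get(++pos);
    }
}

// Hypothetical stand-in for a Scorer: it reads from whatever cursor it was given.
final class ScorerSketch {
    private final Cursor cursor;

    ScorerSketch(Cursor cursor) {
        this.cursor = cursor;
    }

    int nextValue() {
        return cursor.next();
    }
}

// Hypothetical stand-in for a ScorerSupplier that may be asked for several scorers.
final class SupplierSketch {
    private final List<Integer> values = List.of(10, 20, 30, 40);

    // Problematic: one cursor is created up front and handed to every scorer.
    private final Cursor shared = new Cursor(values);

    ScorerSketch getShared() {
        return new ScorerSketch(shared);
    }

    // Safer: every scorer gets its own cursor over the same immutable data.
    ScorerSketch getIsolated() {
        return new ScorerSketch(new Cursor(values));
    }
}

final class SharedStateDemo {
    // Simulates two scorers taking turns, as two threads on one segment might.
    static int[] interleave(ScorerSketch a, ScorerSketch b) {
        int a1 = a.nextValue();
        int b1 = b.nextValue();
        int a2 = a.nextValue();
        return new int[] {a1, b1, a2};
    }

    public static void main(String[] args) {
        SupplierSketch supplier = new SupplierSketch();
        // Shared cursor: scorer a skips 20 because scorer b consumed it.
        System.out.println(Arrays.toString(
            interleave(supplier.getShared(), supplier.getShared()))); // [10, 20, 30]
        // Isolated cursors: each scorer sees the full sequence from the start.
        System.out.println(Arrays.toString(
            interleave(supplier.getIsolated(), supplier.getIsolated()))); // [10, 10, 20]
    }
}
```

In this sketch the fix is to move creation of the mutable object into the
call that builds each scorer. Whether that is the right fix for a given
query in Lucene depends on how expensive the object is to create.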

I definitely think we should address these before the Lucene 10 release, as
they provide a clean break from the old world. I also think it's a decent
amount of work (but not insurmountable). I'm also maybe no longer a fan of
the helper method that Greg added in his PR for the monitor module, since
it risks sweeping non-threadsafe code under the rug, if folks make
single-threaded tests (which is essentially what they've been doing all
along -- see my first point above).

I haven't properly looked into the scope of my second point above, but I've
seen at least two cases in the past two days. Hopefully it's not too bad,
but it might be a risk. I think the first point is still pretty easy to
address.
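
For the first point, the general shape of the change is "one collector per
slice, then a reduce step". The sketch below shows that shape in plain Java.
ManagerSketch is a hypothetical, simplified stand-in modelled on the idea
behind CollectorManager (a newCollector step and a reduce step); it is not
the Lucene interface, and the int arrays stand in for the documents of each
slice.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified stand-in for the collector-manager idea.
interface ManagerSketch<C, T> {
    C newCollector();                  // called once per slice
    T reduce(Collection<C> collectors); // merges the per-slice results
}

// A collector that is only ever used by the one task that owns it.
final class CountingCollector {
    int count;

    void collect(int doc) {
        count++;
    }
}

final class CountingManager implements ManagerSketch<CountingCollector, Integer> {
    @Override
    public CountingCollector newCollector() {
        return new CountingCollector();
    }

    @Override
    public Integer reduce(Collection<CountingCollector> collectors) {
        int total = 0;
        for (CountingCollector c : collectors) {
            total += c.count;
        }
        return total;
    }
}

final class SliceSearchSketch {
    // Runs one task per slice, each with its own collector, then reduces.
    static int countAll(List<int[]> slices, int threads) {
        CountingManager manager = new CountingManager();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<CountingCollector>> futures = new ArrayList<>();
            for (int[] slice : slices) {
                CountingCollector collector = manager.newCollector();
                futures.add(pool.submit(() -> {
                    for (int doc : slice) {
                        collector.collect(doc);
                    }
                    return collector;
                }));
            }
            List<CountingCollector> finished = new ArrayList<>();
            for (Future<CountingCollector> f : futures) {
                finished.add(f.get()); // get() also makes the task's writes visible here
            }
            return manager.reduce(finished);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> slices = List.of(new int[] {1, 2, 3}, new int[] {4, 5});
        System.out.println(countAll(slices, 2)); // 5
    }
}
```

A collector written on the assumption that it sees every hit on one thread
needs its state split along these lines: per-slice state in the collector,
merging in the reduce step.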

Thanks,
Froh

On Thu, Aug 29, 2024 at 2:15 AM Luca Cavanna <l...@elastic.co.invalid>
wrote:

> For Lucene 10.0, I have two topics to raise:
>
> 1. Remove the deprecated IndexSearcher#search(Query, Collector) in favour
> of IndexSearcher#search(Query, CollectorManager)  (
> https://github.com/apache/lucene/issues/12892): this involves removing
> the leftover usages in facet, grouping, join and test-framework, plus in
> some tests. A list of the leftover usages is in the description of the
> issue. It would be great to complete this for Lucene 10, otherwise this
> deprecated method and usages will stick around for much longer. What do
> others think? Should we make this a blocker for the release? I think this
> is not a huge effort and it is parallelizable across different people.
>
> 2. Intra-segment concurrency (https://github.com/apache/lucene/pull/13542):
> current thinking is to add support for partitioning segments when
> searching, and searching across segment partitions concurrently. My
> intention is to introduce breaking changes and documentation in Lucene 10
> (really only the basics), without switching the default slicing of
> IndexSearcher to create segment partitions. We will want to leverage
> segment partitions in testing. More iterations are going to be needed to
> remove duplicated work across partitions of the same segment, which is my
> next step, but currently out of scope for Lucene 10. Judging from the
> reviews I got so far, my PR is not far and I am working on it to address
> comments, polish it a bit more and merge it soon.
>
> Feedback is welcome
>
> Cheers
> Luca
>
> On Wed, Aug 28, 2024 at 3:05 PM Adrien Grand <jpou...@gmail.com> wrote:
>
>> Thanks Mike.
>>
>> On Wed, Aug 28, 2024 at 2:16 PM Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> I think maybe also https://github.com/apache/lucene/issues/13519 should
>>> be a blocker?  It looks like 8 bit vector HNSW quantization is broken
>>> (unless I'm making a silly mistake with luceneutil tooling).
>>>
>>> I've also set its milestone to 10.0.0.
>>>
>>> Do we really not have a way to mark an issue a blocker for a given
>>> release?  That's insane.  OK well I went and created "blocker" label, and
>>> added that to GH 13519.  Greg, I'll also go mark your linked issue as
>>> "blocker".
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Sat, Aug 24, 2024 at 2:33 PM Uwe Schindler <u...@thetaphi.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> I updated Policeman Jenkins to have JDK 23 RC and JDK 24 EA releases.
>>>>
>>>> Uwe
>>>>
>>>> P.S.: Unfortunately I have to update the macOS Hackintosh VM to have a
>>>> newer operating system version: JDK 22 and later no longer run on this
>>>> machine.
>>>> On 23.08.2024 at 10:41, Uwe Schindler wrote:
>>>>
>>>> Hi,
>>>>
>>>> In 9.x there's still the backport of
>>>> https://github.com/apache/lucene/pull/13570 to be done. The PR appears
>>>> in the changelog, but has not been backported yet. Chris and I will do
>>>> this soon.
>>>>
>>>> 9.last release on Sept 22 fits perfectly with the JDK 23 release (and
>>>> we will have Panama Vector Support). I am setting up a Jenkins job with
>>>> the latest RC now to verify all vector stuff works with 23.
>>>>
>>>> Uwe
>>>> On 08.08.2024 at 18:50, Adrien Grand wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> As previously discussed
>>>> <https://lists.apache.org/thread/4bhnkkvvodxxgrpj4yqm5yrgj0ppc59r>, I
>>>> plan on releasing 9.last and 10.0 under the following timeline:
>>>> - ~September 15th: 10.0 feature freeze - main becomes 11.0
>>>> - ~September 22nd: 9.last release,
>>>> - ~October 1st: 10.0 release.
>>>>
>>>> Unless someone shortly volunteers to do a 9.x release, this 9.last
>>>> release will likely be 9.12.
>>>>
>>>> As these dates are coming shortly, I would like to start tracking
>>>> blockers. Please reply to this thread with issues that you know about that
>>>> should delay the 9.last or 10.0 releases.
>>>>
>>>> Chris, Uwe: I also wanted to check with you if this timeline works well
>>>> with regards to supporting Java 23 in 9.last and 10.0?
>>>>
>>>> --
>>>> Adrien
>>>>
>>>> --
>>>> Uwe Schindler
>>>> Achterdiek 19, D-28357 Bremen
>>>> https://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>>
>>>>
>>
>> --
>> Adrien
>>
>
