Oh Thanks,

did not notice. So all looks fine. Because I was checking and found only 9.x files.

We still have some 9.x codec versions in the checkout, but thats the same for older ones. The only important thing is that the top-level codec is on 10.x, we don't need to care about postings, docvalues,...

Uwe

Am 01.10.2024 um 09:53 schrieb Chris Hegarty:
Uwe,

On 1 Oct 2024, at 08:08, Uwe Schindler <u...@thetaphi.de> wrote:

Hi,
are we ready for a release. Because at moment I don't think we have fully 
cleaned up the codecs and this could lead to problems when going to 11.x 
because of version numbers in the index files!?
     • Rename the most recent codec to 10x
     • Remove older 9.x stuff and move to backwards, only keep latest.
Is this still planned or ongoing?
Yes. I believe everything that is needed is in Adrien’s PR  - 
https://github.com/apache/lucene/pull/13815. I actually thought that it was 
already merged.

I’ll do some final testing, etc, on this PR and get it merged.

-Chris.

Uwe
Am 30.09.2024 um 19:35 schrieb Luca Cavanna:
Hi all,
given Adrien's limited availability, Chris and I flipped a coin and decided 
that I will be the release manager for Lucene 10. branch_10x and branch_10_0 
have already been created. I will follow up with the first RC tomorrow.

On Mon, Sep 30, 2024 at 5:14 PM Adrien Grand <jpou...@gmail.com> wrote:
FYI I have little availability this week so Luca and Chris offered to help with 
the release. Thank you both!

Le sam. 14 sept. 2024, 18:07, Luca Cavanna <java...@apache.org> a écrit :
On the topic of removing search(Query, Collector), I think it is unlikely that 
we will be able to complete that task before the Lucene 10 branch is cut. It 
would be nice to still work on the remaining issues, but it's probably good to 
have less urgency around it hence more time to think about how to properly 
address the remaining issues. I think that removing a deprecated method leaving 
users without a proper replacement is not a good trade-off.

I have an outstanding PR around CollectorManager#forSequentialExecution, based 
on the risk I described in my previous message.I am now concluding that it 
would be safer to remove such a public method from branch_9x before it gets 
released. I described the reason in the PR I just opened: 
https://github.com/apache/lucene/pull/13790 . I'd like to know what folks think 
about this. I myself have changed my mind a few times on the topic.

On Thu, Sep 12, 2024 at 10:09 PM Michael Froh <msf...@gmail.com> wrote:
Aha! I realized that I was totally misreading some stacktraces while debugging some of 
the Join tests. You're absolutely right that a ScorerSupplier is created for each doc ID 
range -- not just the Scorer. Also, the test that I "helped" Vamsi with didn't 
really get fixed by moving construction of the TermsEnum into the get() method -- it just 
didn't happen to fail that time. *facepalm*

Please ignore my unfounded panic :D

Thanks,
Froh

On Thu, Sep 12, 2024 at 1:33 AM Luca Cavanna <java...@apache.org> wrote:
Thanks for raising this Michael.

I see a risk as well around removing the deprecated search(Query, Collector) 
method by using a sequential collector manager. Previously, we would happily 
execute sequentially despite an executor being provided to the searcher 
(because search(Query, Collector) bypasses concurrency entirely). After the 
migration, we would execute concurrently, hence the collector manager would 
throw an exception as soon as more than one slice gets searched. The sequential 
collector manager can be used in tests to overcome the lack of a collector 
manager, but that only moves the problem to the user side: what are users of 
these collectors going to do given they need a collector manager to call 
search, and we don't provide one? Their only option would be to not set an 
executor to their searcher. Maybe it is a possibility to clearly document that 
but is it acceptable to require users to have a non concurrent index searcher 
instance around to use for non concurrent queries? There's only a few cases of 
this fortunately. There's also a couple of cases like QueryUtils and JoinUtil 
where we have utility methods that call search internally and accept an 
external searcher. Those searchers may have an executor set to them, and the 
only safe way to migrate these is to add concurrent collector managers support 
or explicitly state that the provided search shouldn't have an executor set to 
it. Another option if we are not happy with the workaround I mentioned, is to 
consider leaving search(Query, Collector) deprecated in Lucene 10 and removing 
it in Lucene 11. It is a shame because we are not far off, but I am not sure 
that this warrants delaying the release.
I am not entirely sure how this aligns with the risk you mentioned, which cases 
of SimpleCollector are you referring to specifically?

Regarding your second concern around intra-segment concurrency: while I had to 
adapt a couple of tests to be intra-segment ready as they made wrong 
assumptions, we are now leveraging intra-segment concurrency in all tests 
(provided that the searcher is created using LuceneTestCase#newSearcher), 
besides when DrillSideways is used. I have seen a couple of post-merge failures 
that may be related which I will look into, but nothing that would suggest that 
the design is entirely problematic. When you retrieve a scorer supplier or a 
bulk scorer you provide a LeafReaderContext. The overall expectation is that 
you get a different instance each time, regardless of whether you have already 
seen the segment or not. If that is the case there is no state shared between 
threads, because ScorerSuppliers should not get shared across threads? It is 
not the expectation that the support for intra-segment concurrency requires 
bulk scorers and scorer suppliers to become thread-safe.
With that in mind, I checked DefaultScorerSupplier and I do not see why it 
would not work, as long as each call to retrieve a scorer supplier returns a 
new instance of if that points to a new instance of Scorer, which holds state 
created as part of that same scorerSupplier call. The problem that we have is 
that we duplicate ahead of time work for partitions of the same segment 
(tracked in https://github.com/apache/lucene/issues/13745), because if we need 
to pull doc_values, we will do so for the same segment multiple times. I would 
assume that if something is off with DefaultScorerSupplier, tests would show 
that clearly, as it is widely used.
I also checked FeatureQuery, and I see that each call to 
scorerSupplier(LeafReaderContext) returns a new instance of the supplier which 
points to different TermsEnum instance retrieved multiple times for the same 
segment. Removing this duplication will require additional work, and there will 
be bugs, or incorrect assumptions made in existing scorer supplier instances, 
but those should not be too hard to fix.
Does this make sense to you? Perhaps there are additional changes to make in 
the migrate guide or javadocs to clarify what I described, let me know what you 
think.

Cheers
Luca

On Thu, Sep 12, 2024 at 9:42 AM Michael Froh <msf...@gmail.com> wrote:
Oh, I forgot to mention:

I think we should deprecate the DefaultScorerSupplier constructor that takes a 
Scorer. There's no way that works with intra-segment concurrency.

Maybe we remove DefaultScorerSupplier altogether?

On Thu, Sep 12, 2024 at 12:35 AM Michael Froh <msf...@gmail.com> wrote:
I'm a big fan of both of Luca's topics. I'd like to raise a small red flag 
around them, though, since they seem to be connected.

Working through the join module and helping my colleague @harshavamsi on the QueryUtils 
side, I see two layers of unpreparatedness for the modern "concurrency first" 
architecture. (Again, I want to make clear that I think the modern architecture is the 
way to go and we can and should get there in time for Lucene 10.)

1. There are several uses of SimpleCollector, where it's assumed that one 
collector will collect all results on a single thread. With the deprecated 
method, this forces single-threaded behavior all the time. In my opinion, these 
represent 13+ year technical debt for cases where you couldn't properly use an 
IndexSearcher to do concurrent searches.
2. With the merge of intra-segment searches, we have another layer: 
ScorerSuppliers that share mutable state across the Scorers that they produce. 
For example, @harshavamsi came across a case today in the sigmoid function for 
FeatureQuery where a TermsEnum was created in the ScorerSupplier and passed 
into the Scorers. Each Scorer shared the same TermsEnum. What changed? In the 
old concurrency model, one thread might search a few segments, but each segment 
was guaranteed to only be searched by one thread. Now, with intra-segment 
concurrency, we produce one ScorerSupplier per segment, but may produce 
multiple Scorers across different threads. If the ScorerSupplier produces some 
mutable object and shares it across the resulting Scorers, you're going to have 
a bad time. Fun fact: back in 2012, we had an office Halloween party and I 
dressed as the thing that scares me the most. I printed a picture of Texas 
(since everyone recognizes Texas) with a TV remote control mute button in the 
middle. I sewed it to my shirt in the four corners. It was mutable state held 
by multiple threads.

I definitely think we should address these before the Lucene 10 release, as 
they provide a clean break from the old world. I also think it's a decent 
amount of work (but not unsurmountable). I'm also maybe no longer a fan of the 
helper method that Greg added in his PR for the monitor module, since it risks 
sweeping non-threadsafe code under the rug, if folks make single-threaded tests 
(which is essentially what they've been doing all along -- see my first point 
above).

I haven't properly looked into the scope of my second point above, but I've 
seen at least two cases in the past two days. Hopefully it's not too bad, but 
it might be a risk. I think the first point is still pretty easy to address.

Thanks,
Froh

On Thu, Aug 29, 2024 at 2:15 AM Luca Cavanna <l...@elastic.co.invalid> wrote:
For Lucene 10.0, I have two topics to raise:

1. Remove the deprecated IndexSearcher#search(Query, Collector) in favour of 
IndexSearcher#search(Query, CollectorManager)  
(https://github.com/apache/lucene/issues/12892): this involves removing the 
leftover usages in facet, grouping, join and test-framework, plus in some 
tests. A list of the leftover usages is in the description of the issue. It 
would be great to complete this for Lucene 10, otherwise this deprecated method 
and usages will stick around for much longer. What do others think? Should we 
make this a blocker for the release? I think this is not a huge effort and it 
is parallelizable across different people.

2. Intra-segment concurrency (https://github.com/apache/lucene/pull/13542): 
current thinking is to add support for partitioning segments when searching, 
and searching across segment partitions concurrently. My intention is to 
introduce breaking changes and documentation in Lucene 10 (really only the 
basics), without switching the default slicing of IndexSearcher to create 
segment partitions. We will want to leverage segment partitions in testing. 
More iterations are going to be needed to remove duplicated work across 
partitions of the same segment, which is my next step, but currently out of 
scope for Lucene 10. Judging from the reviews I got so far, my PR is not far 
and I am working on it to address comments, polish it a bit more and merge it 
soon.

Feedback is welcome

Cheers
Luca

On Wed, Aug 28, 2024 at 3:05 PM Adrien Grand <jpou...@gmail.com> wrote:
Thanks Mike.

On Wed, Aug 28, 2024 at 2:16 PM Michael McCandless <luc...@mikemccandless.com> 
wrote:
I think maybe also https://github.com/apache/lucene/issues/13519 should be a 
blocker?  It looks like 8 bit vector HNSW quantization is broken (unless I'm 
making a silly mistake with luceneutil tooling).

I've also set its milestone to 10.0.0.

Do we really not have a way to mark an issue a blocker for a given release?  That's insane.  OK 
well I went and created "blocker" label, and added that to GH 13519.  Greg, I'll also go 
mark your linked issue as "blocker".

Mike McCandless

http://blog.mikemccandless.com


On Sat, Aug 24, 2024 at 2:33 PM Uwe Schindler <u...@thetaphi.de> wrote:
Hi,
I updated Policeman Jenkins to have JDK 23 RC and JDK 24 EA releases.
Uwe
P.S.: Unfortunately I have to update the macOS Hackintosh VM to have a newer 
operating system version: JDK 22 and later no longer run on this machine.
Am 23.08.2024 um 10:41 schrieb Uwe Schindler:
Hi,
In 9.x there's still the backport of 
https://github.com/apache/lucene/pull/13570 to be done. The PR apperas in the 
changelog, but was not backported yet. Chris and I will do this soon.
9.last release on Sept 22 fits perfectly with the JDK 23 release (and we will 
have Panama Vector Support). I am seeting up Jenkins Job with latest RC now to 
verify all vector stuff works with 23.
Uwe
Am 08.08.2024 um 18:50 schrieb Adrien Grand:
Hello everyone,

As previously discussed, I plan on releasing 9.last and 10.0 under the 
following timeline:
- ~September 15th: 10.0 feature freeze - main becomes 11.0
- ~September 22nd: 9.last release,
- ~October 1st: 10.0 release.

Unless someone shortly volunteers to do a 9.x release, this 9.last release will 
likely be 9.12.

As these dates are coming shortly, I would like to start tracking blockers. 
Please reply to this thread with issues that you know about that should delay 
the 9.last or 10.0 releases.

Chris, Uwe: I also wanted to check with you if this timeline works well with 
regards to supporting Java 23 in 9.last and 10.0?

--
Adrien
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de


--
Adrien
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

Reply via email to