Hi,

The problem with "just updating ICU" at arbitrary times is this: if a new ICU release ships a newer Unicode version, we can't simply update, because indexes created with older Lucene versions may suddenly produce different tokens. When querying such an index, a query term may no longer match anything because the analysis has changed. This means we won't update ICU this way on the 10.x or 9.x branches. Lucene main is fine, since users are advised to reindex their data on a major upgrade anyway.

Therefore, in the stable Lucene branches we won't do major upgrades of ICU, especially not when the Unicode version changes. Bugfix and minor versions are fine.
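The policy in the two paragraphs above can be expressed as a small decision function. This is a minimal Python sketch for illustration only, not part of Lucene's build or release tooling; the `IcuRelease` type and `upgrade_allowed` function are hypothetical, and the version numbers in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IcuRelease:
    """Hypothetical description of an ICU release and the Unicode version it implements."""
    major: int
    minor: int
    unicode_version: str


def upgrade_allowed(on_main: bool, current: IcuRelease, candidate: IcuRelease) -> bool:
    """Illustrative sketch of the upgrade policy described in this thread.

    - On main, any ICU upgrade is acceptable, because users are advised to
      reindex when moving to a new major Lucene version anyway.
    - On stable branches (e.g. 9.x, 10.x), only bugfix/minor ICU updates are
      acceptable, and never one that changes the Unicode version, since the
      tokens produced by analysis could change for existing indexes.
    """
    if on_main:
        return True
    same_major = candidate.major == current.major
    same_unicode = candidate.unicode_version == current.unicode_version
    return same_major and same_unicode
```

For example, `upgrade_allowed(False, IcuRelease(74, 1, "15.1"), IcuRelease(74, 2, "15.1"))` is `True`, while moving a stable branch to a release with a different major version or a different Unicode version returns `False`.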

Uwe

On 30.10.2025 at 06:45, Dawid Weiss wrote:
I don't think there are any guidelines on this other than changes in
Unicode implemented by ICU. So if something changes in an incompatible
way and we can't implement a compatible workaround, we'd hold off the
upgrade until the next major Lucene version.

I'm CC-ing Robert; he's much more knowledgeable about this than I am.

As for automation: this is already pretty simple, and we have Dependabot
running on GitHub, so it shouldn't be a problem. It's the consequences of
upgrading that are harder to assess.

Dawid

On Thu, Oct 30, 2025 at 4:50 AM Anh Dũng Bùi <[email protected]> wrote:

Thanks Dawid!

What you said makes sense (reducing build time and catching inconsistencies).

I have some follow-up questions. When is the ICU library usually upgraded?
(I previously thought it happened with major releases, but the last upgrade
seems to have landed in a minor release, between 10.1 and 10.2.) And are
there any drawbacks to upgrading the ICU library whenever a new version is
released? It may add some workload, but maybe it can be automated?

Regards,
Anh Dung Bui

On Thu, Oct 23, 2025 at 3:40 PM Dawid Weiss <[email protected]> wrote:

If you're on the main branch, the code to regenerate ICU is in
lucene.regenerate.icu.gradle:

https://github.com/apache/lucene/blob/main/build-tools/build-infra/src/main/groovy/lucene.regenerate.icu.gradle#L4

You should bump the icu4j version here, and the build will then use the
aligned ICU4C version too:

https://github.com/apache/lucene/blob/main/gradle/libs.versions.toml#L24

Once you do that, run:

./gradlew -p lucene/analysis/icu regenerate

and it should regenerate, clean up, and create checksums for all affected
files.

- Why aren't we generating them on the fly based on the available ICU
version at runtime? Would that enable users to upgrade ICU versions on
their own without breaking Lucene?

There are several reasons for not generating them on the fly. Mainly,
we're trying to save on build times (some of the generated resources are
very costly to produce or require external tools and infrastructure), but
we also want to ensure consistency and catch any changes if they happen
somewhere in the middleware toolchain. (In theory, if you run the
regenerate command above without touching ICU versions, you should get
identical checksums for all the resulting files, regardless of the
platform used, etc.)
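The reproducibility property described above (regenerating without touching the ICU version should yield identical checksums) can also be spot-checked outside the build. This is a minimal Python sketch, assuming you snapshot a directory of generated files before and after running the regenerate task; `snapshot` and `changed_files` are hypothetical helpers, and they neither read nor reproduce the checksum files that Lucene's regenerate tasks write themselves.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or whose content differs between two snapshots."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

An empty result from `changed_files(before, after)` means every file came out byte-identical; any listed path points at a change somewhere in the inputs or the toolchain.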

Dawid

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [email protected]

