Agree with Uwe. I think the major piece the OP needs to know is "gradlew regenerate".
Don't try to upgrade this dependency by hand.

https://github.com/apache/lucene/blob/main/help/regeneration.txt

On Thu, Oct 30, 2025 at 9:05 AM Uwe Schindler <[email protected]> wrote:
>
> Hi,
>
> The problem with "just updating ICU" randomly is: If a new version with
> newer Unicode version comes out, we can't just update as indexes created
> with older Lucene versions may suddenly produce other tokens. When
> querying such an index it might happen that there's no term match anymore
> because analysis changes. This means, we won't update ICU for the 10.x
> or 9.x branches. Lucene main is fine as users get the recommendation to
> reindex their data anyways.
>
> Therefore in the stable Lucene branches we won't do major upgrades to
> ICU, especially not when the Unicode version changes. Bugfixes and minor
> versions are fine.
>
> Uwe
>
> On 30.10.2025 at 06:45, Dawid Weiss wrote:
> > I don't think there are any guidelines on this other than changes in
> > Unicode implemented by ICU. So if something changes in an incompatible
> > way and we can't implement a workaround that would be compatible, we'd
> > wait with the upgrade to follow a major Lucene version only.
> >
> > I CC Robert, he's much more knowledgeable in this than me.
> >
> > As for automation - this is already pretty simple and we have dependabot
> > running on github, so it shouldn't be a problem. It is the consequences of
> > upgrading that are more difficult to assess.
> >
> > Dawid
> >
> > On Thu, Oct 30, 2025 at 4:50 AM Anh Dũng Bùi <[email protected]> wrote:
> >
> >> Thanks Dawid!
> >>
> >> What you said makes sense (reducing build time and catching
> >> inconsistency).
> >>
> >> I have some follow-up questions: when is the ICU library usually upgraded
> >> (I previously thought of a major release, but the last upgrade seems to be
> >> in the middle (10.1 to 10.2)), and are there any drawbacks to upgrading the
> >> ICU library whenever it's released? Maybe it adds some workload, but maybe
> >> it can be automated?
> >>
> >> Regards,
> >> Anh Dung Bui
> >>
> >> On Thu, Oct 23, 2025 at 3:40 PM Dawid Weiss <[email protected]> wrote:
> >>
> >>> If you're on the main branch, the code to regenerate ICU is in
> >>> lucene.regenerate.icu.gradle:
> >>>
> >>> https://github.com/apache/lucene/blob/main/build-tools/build-infra/src/main/groovy/lucene.regenerate.icu.gradle#L4
> >>>
> >>> you should bump the version of icu4j and this makes the build use the
> >>> aligned icu-c version too -
> >>>
> >>> https://github.com/apache/lucene/blob/main/gradle/libs.versions.toml#L24
> >>>
> >>> Once you do that, run:
> >>>
> >>> ./gradlew -p lucene/analysis/icu regenerate
> >>>
> >>> and it should regenerate, clean-up and create checksums for all affected
> >>> files.
> >>>
> >>>> - Why aren't we generating them on the fly based on the available ICU
> >>>> version at runtime? Would that enable users to upgrade ICU versions on
> >>>> their own without breaking Lucene?
> >>>>
> >>> The reasons for not generating them on the fly are multiple - mainly
> >>> we're trying to save on build times (some of the generated resources
> >>> are very costly or require external tools and infrastructure) but also
> >>> ensure consistency and catch any changes if they happen in the
> >>> middleware toolchain somewhere (in theory, if you run the regenerate
> >>> command above without touching ICU versions, you should get identical
> >>> checksums of all the resulting files, regardless of the platform used,
> >>> etc.)
> >>>
> >>> Dawid
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: [email protected]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
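
To make Uwe's point about analysis changes concrete, here is a toy Python sketch. It is not Lucene or ICU code. The `analyze_old` and `analyze_new` functions are hypothetical stand-ins for an analyzer before and after an upgrade, and the NFC vs. NFKC difference is only an assumed example of a normalization change, not a claim about what any particular ICU release changes. It indexes a document with the "old" analysis and then queries the same index with both.

```python
import unicodedata


def analyze_old(text):
    # Hypothetical "old" analysis: NFC normalization, lowercase, whitespace split.
    return unicodedata.normalize("NFC", text).lower().split()


def analyze_new(text):
    # Hypothetical "new" analysis: same steps, but with NFKC (compatibility) folding.
    return unicodedata.normalize("NFKC", text).lower().split()


def build_index(docs, analyze):
    # Toy inverted index: term -> set of document ids.
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyze):
    # A document matches if it contains every analyzed query term.
    terms = analyze(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# U+FB01 is the "fi" ligature; NFC keeps it, NFKC expands it to "fi".
docs = {1: "open the \ufb01le"}
index = build_index(docs, analyze_old)

same_analysis = search(index, "\ufb01le", analyze_old)
changed_analysis = search(index, "\ufb01le", analyze_new)
print(same_analysis, changed_analysis)
```

In this sketch the query analyzed the old way still finds document 1, while the same query analyzed the new way finds nothing, because the term stored in the index no longer equals the term produced at query time. Rebuilding the index with `analyze_new` would make the two sides agree again, which matches the reasoning that main is fine because users are told to reindex.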
