Agree with Uwe. I think the major piece the OP needs to know is
"gradlew regenerate"

Don't try to upgrade this dependency by hand.

https://github.com/apache/lucene/blob/main/help/regeneration.txt
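
To illustrate the consistency check Dawid describes further down in the
thread (re-running regenerate without touching the ICU version should give
identical checksums), here is a rough Python sketch. It is not how the
Lucene build implements this; checksum_tree and changed_files are
hypothetical helper names, and snapshotting a directory before and after a
run is an assumption made purely for illustration.

```python
import hashlib
from pathlib import Path


def checksum_tree(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return the sorted paths that were added, removed, or changed in content."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

The idea would be to call checksum_tree on the directory holding the
generated files before and after running regenerate; if nothing in the
toolchain changed, changed_files should return an empty list.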

On Thu, Oct 30, 2025 at 9:05 AM Uwe Schindler <[email protected]> wrote:
>
> Hi,
>
> The problem with "just updating ICU" randomly is this: if a new version with
> a newer Unicode version comes out, we can't just update, because indexes created
> with older Lucene versions may suddenly produce different tokens. When
> querying such an index, there may be no term match anymore
> because the analysis has changed. This means we won't update ICU for the 10.x
> or 9.x branches. Lucene main is fine, as users get the recommendation to
> reindex their data anyway.
>
> Therefore, in the stable Lucene branches we won't do major upgrades to
> ICU, especially not when the Unicode version changes. Bugfixes and minor
> versions are fine.
>
> Uwe
>
> On 30.10.2025 at 06:45, Dawid Weiss wrote:
> > I don't think there are any guidelines on this other than changes in
> > Unicode implemented by ICU. So if something
> > changes in an incompatible way and we can't implement a workaround that
> > would be compatible, we'd hold the upgrade back
> > until the next major Lucene version.
> >
> > I'm CC'ing Robert; he's much more knowledgeable about this than I am.
> >
> > As for automation: this is already pretty simple, and we have Dependabot
> > running on GitHub, so it shouldn't be a problem. It is the consequences of
> > upgrading that are more difficult to assess.
> >
> > Dawid
> >
> > On Thu, Oct 30, 2025 at 4:50 AM Anh Dũng Bùi <[email protected]> wrote:
> >
> >> Thanks Dawid!
> >>
> >> What you said makes sense (reducing build time and catching inconsistencies).
> >>
> >> I have some follow-up questions: when is the ICU library usually upgraded
> >> (I previously thought at a major release, but the last upgrade seems to
> >> have been mid-series, from 10.1 to 10.2), and are there any drawbacks to
> >> upgrading the ICU library whenever a new version is released? Maybe it adds
> >> some workload, but maybe it can be automated?
> >>
> >> Regards,
> >> Anh Dung Bui
> >>
> >> On Thu, Oct 23, 2025 at 3:40 PM Dawid Weiss <[email protected]> wrote:
> >>
> >>> If you're on the main branch, the code to regenerate ICU is in
> >>> lucene.regenerate.icu.gradle:
> >>>
> >>> https://github.com/apache/lucene/blob/main/build-tools/build-infra/src/main/groovy/lucene.regenerate.icu.gradle#L4
> >>>
> >>> You should bump the version of icu4j, which makes the build use the
> >>> aligned icu-c version too:
> >>>
> >>> https://github.com/apache/lucene/blob/main/gradle/libs.versions.toml#L24
> >>>
> >>> Once you do that, run:
> >>>
> >>> ./gradlew -p lucene/analysis/icu regenerate
> >>>
> >>> and it should regenerate, clean up, and create checksums for all affected
> >>> files.
> >>>
> >>>> - Why aren't we generating them on the fly based on the available ICU
> >>>> version at runtime? Would that enable users to upgrade ICU versions on
> >>>> their own without breaking Lucene?
> >>>>
> >>> There are multiple reasons for not generating them on the fly. Mainly, we're
> >>> trying to save on build times (some of the generated resources are very
> >>> costly or require external tools and infrastructure), but we also want to
> >>> ensure consistency and catch any changes if they happen somewhere in the
> >>> middleware toolchain. In theory, if you run the regenerate command above
> >>> without touching ICU versions, you should get identical checksums for all
> >>> the resulting files, regardless of the platform used, etc.
> >>>
> >>> Dawid
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: [email protected]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>