Re: [PATCH 1/4] Update to Unicode 17.0.0

Jonathan Wakely Fri, 08 May 2026 02:21:38 -0700

On Mon, 15 Sept 2025 at 22:00, Jakub Jelinek <[email protected]> wrote:
>
> Hi!
>
> The following patches update GCC from Unicode 16.0.0 to 17.0.0.
> I'd like to commit it as one patch, but for mailing list restriction
> reasons I've split it into 4 patches.
> This part contains all the hand edited stuff and some of the updated
> or regenerated stuff, the next patch contains some updated and regenerated
> stuff and the last 2 will contain xz -9e compressed halves of the uname2c.h
> regenerated changes (which are just huge).
>
> I've followed what the README says and updated also one script from
> glibc, but that needed another Unicode file - HangulSyllableType.txt -
> around as well, so I'm adding it.
> I've added one new test to named-universal-char-escape-1.c for
> a randomly chosen character from the new CJK block.
> Note, Unicode 17.0.0 authors forgot to adjust the 4-8 table, I've filed
> bug reports about that but the UnicodeData.txt changes for the range ends
> and the new range seem to match e.g. what is in the glyph tables, so
> the patch follows UnicodeData.txt and not the 4-8 table here.
>
> Another thing was that makeuname2c.cc didn't correctly handle the case when
> the size of the generated string table modulo 77 was 76 or 77, in which
> case it forgot to emit a semicolon after the string literal and so failed
> to compile.
>
> And as can be seen in the emoji-data.txt diff, some properties like
> Extended_Pictographic have been removed from certain characters, e.g.
> from the Mahjong cards characters except U+1F004, and one libstdc++
> test was testing that property exactly on U+1F000.  Dunno why that was
> changed, but U+1F004 is the only colored one among tons of black and white
> ones.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2025-09-15  Jakub Jelinek  <[email protected]>
>
> contrib/
>         * unicode/README: Add HangulSyllableType.txt file to the
>         list as newest utf8_gen.py from glibc now needs it.  Adjust
>         git commit hash and change unicode 16 version to 17.
>         * unicode/from_glibc/utf8_gen.py: Updated from glibc.
>         * unicode/DerivedCoreProperties.txt: Updated from Unicode 17.0.0.
>         * unicode/emoji-data.txt: Likewise.
>         * unicode/PropList.txt: Likewise.
>         * unicode/GraphemeBreakProperty.txt: Likewise.
>         * unicode/DerivedNormalizationProps.txt: Likewise.
>         * unicode/NameAliases.txt: Likewise.
>         * unicode/UnicodeData.txt: Likewise.
>         * unicode/EastAsianWidth.txt: Likewise.
>         * unicode/DerivedGeneralCategory.txt: Likewise.
>         * unicode/HangulSyllableType.txt: New file.
> gcc/
>         * c-c++-common/cpp/named-universal-char-escape-1.c: Add test for
>         \N{CJK UNIFIED IDEOGRAPH-3340E}.
> libcpp/
>         * makeucnid.cc (write_copyright): Adjust copyright year.
>         * makeuname2c.cc (generated_ranges): Adjust end points for a couple
>         of ranges based on UnicodeData.txt Last changes and add a whole new
>         CJK UNIFIED IDEOGRAPH- entry.  None of these changes are in the 4-8
>         table, but clearly it has just been forgotten.
>         (write_copyright): Adjust copyright year.
>         (write_dict): Fix up condition when to print semicolon.
>         * generated_cpp_wcwidth.h: Regenerate.
>         * ucnid.h: Regenerate.
>         * uname2c.h: Regenerate.
> libstdc++-v3/
>         * include/bits/unicode-data.h: Regenerate.
>         * testsuite/ext/unicode/properties.cc: Test __is_extended_pictographic
>         on U+1F004 rather than U+1F000.


The libstdc++ part is OK, thanks.
