Re: [PATCH 1/4] Update to Unicode 17.0.0

Jonathan Wakely Fri, 08 May 2026 02:23:01 -0700

Sorry, this was an old gmail draft that I meant to delete and hit send instead!


On Fri, 8 May 2026 at 10:21, Jonathan Wakely <[email protected]> wrote:
>
> On Mon, 15 Sept 2025 at 22:00, Jakub Jelinek <[email protected]> wrote:
> >
> > Hi!
> >
> > The following patches update GCC from Unicode 16.0.0 to 17.0.0.
> > I'd like to commit it as one patch, but for mailing list restriction
> > reasons I've split it into 4 patches.
> > This part contains all the hand-edited stuff and some of the updated
> > or regenerated stuff, next patch contains some updated and regenerated
> > stuff and last 2 will contain xz -9e compressed halves of the uname2c.h
> > regenerated changes (which are just huge).
> >
> > I've followed what the README says and updated also one script from
> > glibc, but that needed another Unicode file - HangulSyllableType.txt -
> > around as well, so I'm adding it.
> > I've added one new test to named-universal-char-escape-1.c for
> > a randomly chosen character from the new CJK block.
> > Note, Unicode 17.0.0 authors forgot to adjust the 4-8 table, I've filed
> > bug reports about that but the UnicodeData.txt changes for the range ends
> > and the new range seem to match e.g. what is in the glyph tables, so
> > the patch follows UnicodeData.txt and not the 4-8 table here.
> >
> > Another thing was that makeuname2c.cc didn't handle correctly when
> > the size of the generated string table modulo 77 was 76 or 77, in which
> > case it forgot to emit a semicolon after the string literal and so failed
> > to compile.
> >
> > And as can be seen in the emoji-data.txt diff, some properties like
> > Extended_Pictographic have been removed from certain characters, e.g.
> > from the Mahjong cards characters except U+1F004, and one libstdc++
> > test was testing that property exactly on U+1F000.  Dunno why that was
> > changed, but U+1F004 is the only colored one among tons of black and white
> > ones.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2025-09-15  Jakub Jelinek  <[email protected]>
> >
> > contrib/
> >         * unicode/README: Add HangulSyllableType.txt file to the
> >         list as newest utf8_gen.py from glibc now needs it.  Adjust
> >         git commit hash and change unicode 16 version to 17.
> >         * unicode/from_glibc/utf8_gen.py: Updated from glibc.
> >         * unicode/DerivedCoreProperties.txt: Updated from Unicode 17.0.0.
> >         * unicode/emoji-data.txt: Likewise.
> >         * unicode/PropList.txt: Likewise.
> >         * unicode/GraphemeBreakProperty.txt: Likewise.
> >         * unicode/DerivedNormalizationProps.txt: Likewise.
> >         * unicode/NameAliases.txt: Likewise.
> >         * unicode/UnicodeData.txt: Likewise.
> >         * unicode/EastAsianWidth.txt: Likewise.
> >         * unicode/DerivedGeneralCategory.txt: Likewise.
> >         * unicode/HangulSyllableType.txt: New file.
> > gcc/
> >         * c-c++-common/cpp/named-universal-char-escape-1.c: Add test for
> >         \N{CJK UNIFIED IDEOGRAPH-3340E}.
> > libcpp/
> >         * makeucnid.cc (write_copyright): Adjust copyright year.
> >         * makeuname2c.cc (generated_ranges): Adjust end points for a couple
> >         of ranges based on UnicodeData.txt Last changes and add a whole new
> >         CJK UNIFIED IDEOGRAPH- entry.  None of these changes are in the 4-8
> >         table, but clearly it has just been forgotten.
> >         (write_copyright): Adjust copyright year.
> >         (write_dict): Fix up condition when to print semicolon.
> >         * generated_cpp_wcwidth.h: Regenerate.
> >         * ucnid.h: Regenerate.
> >         * uname2c.h: Regenerate.
> > libstdc++-v3/
> >         * include/bits/unicode-data.h: Regenerate.
> >         * testsuite/ext/unicode/properties.cc: Test
> >         __is_extended_pictographic on U+1F004 rather than U+1F000.
>
> The libstdc++ part is OK, thanks.
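
For readers following the makeuname2c.cc semicolon fix described in the quoted message, here is a minimal sketch of the general technique: emit a long string table as adjacent string literals of at most 77 characters each, and terminate the declaration unconditionally so the result does not depend on the table size modulo the line width. This is not the actual write_dict code from GCC. The function name emit_string_table, its parameters, and the output layout are hypothetical, and the sketch assumes the table contains no characters that need escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT the real write_dict from makeuname2c.cc.
// Emits `table` as a C array initialized from adjacent string literals,
// each holding at most `width` characters.  Assumes `table` contains no
// characters that would need escaping inside a string literal.
std::string
emit_string_table (const std::string &name, const std::string &table,
                   std::size_t width = 77)
{
  assert (width > 0);
  std::string out = "static const char " + name + "[] =";

  // An empty table still needs one (empty) literal to be valid C.
  if (table.empty ())
    out += "\n\"\"";

  // One quoted chunk per output line; the last chunk may be shorter.
  for (std::size_t i = 0; i < table.size (); i += width)
    {
      out += "\n\"";
      out += table.substr (i, width);
      out += "\"";
    }

  // The terminator is emitted outside the loop, so it is never skipped,
  // whatever table.size () % width happens to be.
  out += ";\n";
  return out;
}
```

Keeping the semicolon outside the chunking loop means there is no size-dependent branch that could skip it, which is the class of bug the patch fixes.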
