On Mon, 15 Sept 2025 at 22:00, Jakub Jelinek <[email protected]> wrote:
>
> Hi!
>
> The following patches update GCC from Unicode 16.0.0 to 17.0.0.
> I'd like to commit it as one patch, but for mailing list restriction
> reasons I've split it into 4 patches.
> This part contains all the hand editted stuff and some of the updated
> or regenerated stuff, next patch contains some updated and regenerated
> stuff and last 2 will contain xz -9e compressed halves of the uname2c.h
> regenerated changes (which are just huge).
>
> I've followed what the README says and updated also one script from
> glibc, but that needed another Unicode file - HangulSyllableType.txt -
> around as well, so I'm adding it.
> I've added one new test to named-universal-char-escape-1.c for
> randomly chosen character from new CJK block.
> Note, Unicode 17.0.0 authors forgot to adjust the 4-8 table, I've filed
> bugreports about that but the UnicodeData.txt changes for the range ends
> and the new range seems to match e.g. what is in the glyph tables, so
> the patch follows UnicodeData.txt and not 4-8 table here.
>
> Another thing was that makeuname2c.cc didn't handle correctly when
> the size of the generated string table modulo 77 was 76 or 77, in which
> case it forgot to emit a semicolon after the string literal and so failed
> to compile.
>
> And as can be seen in the emoji-data.txt diff, some properties like
> Extended_Pictographic have been removed from certain characters, e.g.
> from the Mahjong cards characters except U+1F004, and one libstdc++
> test was testing that property exactly on U+1F000. Dunno why that was
> changed, but U+1F004 is the only colored one among tons of black and white
> ones.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2025-09-15  Jakub Jelinek  <[email protected]>
>
> contrib/
>         * unicode/README: Add HangulSyllableType.txt file to the
>         list as newest utf8_gen.py from glibc now needs it.  Adjust
>         git commit hash and change unicode 16 version to 17.
>         * unicode/from_glibc/utf8_gen.py: Updated from glibc.
>         * unicode/DerivedCoreProperties.txt: Updated from Unicode 17.0.0.
>         * unicode/emoji-data.txt: Likewise.
>         * unicode/PropList.txt: Likewise.
>         * unicode/GraphemeBreakProperty.txt: Likewise.
>         * unicode/DerivedNormalizationProps.txt: Likewise.
>         * unicode/NameAliases.txt: Likewise.
>         * unicode/UnicodeData.txt: Likewise.
>         * unicode/EastAsianWidth.txt: Likewise.
>         * unicode/DerivedGeneralCategory.txt: Likewise.
>         * unicode/HangulSyllableType.txt: New file.
> gcc/
>         * c-c++-common/cpp/named-universal-char-escape-1.c: Add test for
>         \N{CJK UNIFIED IDEOGRAPH-3340E}.
> libcpp/
>         * makeucnid.cc (write_copyright): Adjust copyright year.
>         * makeuname2c.cc (generated_ranges): Adjust end points for a couple
>         of ranges based on UnicodeData.txt Last changes and add a whole new
>         CJK UNIFIED IDEOGRAPH- entry.  None of these changes are in the 4-8
>         table, but clearly it has just been forgotten.
>         (write_copyright): Adjust copyright year.
>         (write_dict): Fix up condition when to print semicolon.
>         * generated_cpp_wcwidth.h: Regenerate.
>         * ucnid.h: Regenerate.
>         * uname2c.h: Regenerate.
> libstdc++-v3/
>         * include/bits/unicode-data.h: Regenerate.
>         * testsuite/ext/unicode/properties.cc: Test __is_extended_pictographic
>         on U+1F004 rather than U+1F000.

The libstdc++ part is OK, thanks.
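
For anyone following the write_dict remark in the quoted mail, here is a minimal sketch of that kind of off-by-one, under stated assumptions. It is not the code from makeuname2c.cc: emit_table, its parameters and the exact form of the faulty comparison are hypothetical, chosen only so that the failure shows up for a final chunk of 76 or 77 characters (reading "modulo 77 was 76 or 77" as a remainder of 76 or 0, which is my interpretation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT the real write_dict from makeuname2c.cc.
// Emits `dict` as adjacent C string literals of at most `width`
// characters per line and appends ';' after the last one.
// Assumes `dict` is non-empty and holds no '"' or '\\' characters.
std::string emit_table(const std::string &dict, bool fixed,
                       std::size_t width = 77)
{
  std::string out;
  for (std::size_t i = 0; i < dict.size(); i += width)
    {
      // Assumed faulty test: misses a final chunk that is exactly
      // width - 1 or width characters long.
      // Corrected test: true exactly on the last loop iteration.
      bool last = fixed ? i + width >= dict.size()
                        : i + width - 1 > dict.size();
      out += '"' + dict.substr(i, width) + '"';
      out += last ? ";\n" : "\n";
    }
  return out;
}
```

With the assumed faulty test, a 76 or 77 character table (or any table whose last chunk has that length) comes out with no terminating semicolon, which is the symptom described above. The corrected comparison fires on the final iteration for every non-empty size.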
