Sorry, this was an old gmail draft that I meant to delete and hit send instead!
On Fri, 8 May 2026 at 10:21, Jonathan Wakely <[email protected]> wrote:
>
> On Mon, 15 Sept 2025 at 22:00, Jakub Jelinek <[email protected]> wrote:
> >
> > Hi!
> >
> > The following patches update GCC from Unicode 16.0.0 to 17.0.0.
> > I'd like to commit it as one patch, but for mailing list restriction
> > reasons I've split it into 4 patches.
> > This part contains all the hand edited stuff and some of the updated
> > or regenerated stuff, next patch contains some updated and regenerated
> > stuff and last 2 will contain xz -9e compressed halves of the uname2c.h
> > regenerated changes (which are just huge).
> >
> > I've followed what the README says and updated also one script from
> > glibc, but that needed another Unicode file - HangulSyllableType.txt -
> > around as well, so I'm adding it.
> > I've added one new test to named-universal-char-escape-1.c for
> > randomly chosen character from new CJK block.
> > Note, Unicode 17.0.0 authors forgot to adjust the 4-8 table, I've filed
> > bug reports about that but the UnicodeData.txt changes for the range ends
> > and the new range seems to match e.g. what is in the glyph tables, so
> > the patch follows UnicodeData.txt and not 4-8 table here.
> >
> > Another thing was that makeuname2c.cc didn't handle correctly when
> > the size of the generated string table modulo 77 was 76 or 77, in which
> > case it forgot to emit a semicolon after the string literal and so failed
> > to compile.
> >
> > And as can be seen in the emoji-data.txt diff, some properties like
> > Extended_Pictographic have been removed from certain characters, e.g.
> > from the Mahjong cards characters except U+1F004, and one libstdc++
> > test was testing that property exactly on U+1F000. Dunno why that was
> > changed, but U+1F004 is the only colored one among tons of black and white
> > ones.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2025-09-15  Jakub Jelinek  <[email protected]>
> >
> > contrib/
> > 	* unicode/README: Add HangulSyllableType.txt file to the
> > 	list as newest utf8_gen.py from glibc now needs it.  Adjust
> > 	git commit hash and change unicode 16 version to 17.
> > 	* unicode/from_glibc/utf8_gen.py: Updated from glibc.
> > 	* unicode/DerivedCoreProperties.txt: Updated from Unicode 17.0.0.
> > 	* unicode/emoji-data.txt: Likewise.
> > 	* unicode/PropList.txt: Likewise.
> > 	* unicode/GraphemeBreakProperty.txt: Likewise.
> > 	* unicode/DerivedNormalizationProps.txt: Likewise.
> > 	* unicode/NameAliases.txt: Likewise.
> > 	* unicode/UnicodeData.txt: Likewise.
> > 	* unicode/EastAsianWidth.txt: Likewise.
> > 	* unicode/DerivedGeneralCategory.txt: Likewise.
> > 	* unicode/HangulSyllableType.txt: New file.
> > gcc/
> > 	* c-c++-common/cpp/named-universal-char-escape-1.c: Add test for
> > 	\N{CJK UNIFIED IDEOGRAPH-3340E}.
> > libcpp/
> > 	* makeucnid.cc (write_copyright): Adjust copyright year.
> > 	* makeuname2c.cc (generated_ranges): Adjust end points for a couple
> > 	of ranges based on UnicodeData.txt Last changes and add a whole new
> > 	CJK UNIFIED IDEOGRAPH- entry.  None of these changes are in the 4-8
> > 	table, but clearly it has just been forgotten.
> > 	(write_copyright): Adjust copyright year.
> > 	(write_dict): Fix up condition when to print semicolon.
> > 	* generated_cpp_wcwidth.h: Regenerate.
> > 	* ucnid.h: Regenerate.
> > 	* uname2c.h: Regenerate.
> > libstdc++-v3/
> > 	* include/bits/unicode-data.h: Regenerate.
> > 	* testsuite/ext/unicode/properties.cc: Test
> > 	__is_extended_pictographic on U+1F004 rather than U+1F000.
>
> The libstdc++ part is OK, thanks.
