Hello,

This series of patches updates the libunistring-related modules (except uniname[1]) from Unicode 6.0.0 to Unicode 7.0.0.

Patches 1 to 4 adjust the internal data structures to accommodate the upcoming changes and add more test cases. Patches 5 to 8 then update the modules incrementally, one Unicode version at a time (6.1.0, 6.2.0, 6.3.0, 7.0.0).

To make reviewing easier, the patches do not include the generated files. You can generate them with this script:
http://du-a.org/~ueno/gen-uni-tables.sh
  $ cd gnulib/lib
  $ git am 000*.patch
  $ sh gen-uni-tables.sh 7.0.0 $HOME/Downloads/UCD
  # This will download the necessary files from unicode.org, compile and
  # run gen-uni-tables.c, and copy the test data from the UCD.

I've cut a snapshot of libunistring with all of the above patches applied:
ftp://alpha.gnu.org/gnu/libunistring/libunistring-0.9.5a.tar.xz

It would be nice if you could test it with real-world text-processing applications. I have one small use case (a character map[2]), but it is clearly too limited.

Regards,

Footnotes:
[1] http://lists.gnu.org/archive/html/bug-libunistring/2014-06/msg00001.html
[2] https://git.gnome.org/browse/gnome-characters

Daiki Ueno (8):
  gen-uni-tables: Check out-of-range values added to 3-level tables
  unictype/joininggroup-of: Switch to 3-level table
  uniwbrk: Ignore Extended/Format at the beginning of the line
  uniwbrk/u32-wordbreaks-tests: Test using WordBreakTest.txt from UCD
  Update to Unicode 6.1.0
  Update to Unicode 6.2.0
  Update to Unicode 6.3.0
  Update to Unicode 7.0.0

 lib/gen-uni-tables.c                      | 375 +++++++++++++++++++++++++-----
 lib/unictype.in.h                         |  37 ++-
 lib/unictype/bidi_byname.gperf            |  12 +
 lib/unictype/joininggroup_byname.gperf    |  59 +++++
 lib/unictype/joininggroup_name.h          |  29 +++
 lib/unictype/joininggroup_of.c            |  29 ++-
 lib/unigbrk.in.h                          |   3 +-
 lib/unigbrk/uc-is-grapheme-break.c        |   9 +-
 lib/unilbrk/lbrktables.h                  |   3 +
 lib/uniwbrk.in.h                          |   6 +-
 lib/uniwbrk/u-wordbreaks.h                |  79 ++++---
 lib/uniwbrk/wbrktable.c                   |  52 +++--
 lib/uniwbrk/wbrktable.h                   |   2 +-
 modules/uniwbrk/u32-wordbreaks-tests      |   9 +-
 tests/unigbrk/test-uc-gbrk-prop.c         |   1 +
 tests/unigbrk/test-uc-is-grapheme-break.c |   1 +
 tests/uniwbrk/test-uc-wordbreaks.c        | 178 ++++++++++++++
 tests/uniwbrk/test-uc-wordbreaks.sh       |   3 +
 18 files changed, 768 insertions(+), 119 deletions(-)
 create mode 100644 tests/uniwbrk/test-uc-wordbreaks.c
 create mode 100755 tests/uniwbrk/test-uc-wordbreaks.sh

-- 
2.1.1
