Re: [bug-libunistring] UAX #29 changes

Daiki Ueno Mon, 13 Nov 2017 09:04:52 -0800

Daiki Ueno <[email protected]> writes:

> I have rebased the patch against the latest git master and pushed into
> 'ueno/unicode-9.0.0' branch in the gnulib repository:
> http://git.savannah.gnu.org/cgit/gnulib.git/log/?h=ueno/unicode-9.0.0


Attached is the corresponding documentation change for libunistring.
Bruno, did you have time to look at the Gnulib changes?  I would like to
merge the branch soon before I completely forget about it ;-)

Regards,
-- 
Daiki Ueno

From 3968938b1d21d87e2f6c03e9fe5453bf413d7d7c Mon Sep 17 00:00:00 2001
From: Daiki Ueno <[email protected]>
Date: Mon, 13 Nov 2017 17:48:27 +0100
Subject: [PATCH] unigbrk: Update from Gnulib

---
 ChangeLog                | 10 ++++++++++
 autogen.sh               |  1 +
 doc/unigbrk.texi         | 30 +++++++++++++++++++++++++++++-
 lib/unigbrk/.gitignore   |  2 ++
 tests/unigbrk/.gitignore |  2 ++
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index f2e4563..c600849 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2017-11-13  Daiki Ueno  <[email protected]>
+
+	* autogen.sh (GNULIB_MODULES): Pull unigbrk/uc-grapheme-breaks.
+	* doc/unigbrk.texi (Grapheme cluster breaks in a string): Mention
+	the limitations of *_grapheme_next and *_grapheme_prev functions
+	and suggest *_grapheme_breaks instead.
+	(Grapheme cluster break property): Document newly added
+	properties; mention the limitations of uc_is_grapheme_break and
+	suggest using uc_grapheme_breaks instead.
+
 2017-10-21  Bruno Haible  <[email protected]>
 
 	Upgrade to newer libtool.
diff --git a/autogen.sh b/autogen.sh
index e836db8..269ab0d 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -329,6 +329,7 @@ if test $skip_gnulib = false; then
       unigbrk/uc-gbrk-prop
       unigbrk/uc-is-grapheme-break
       unigbrk/ulc-grapheme-breaks
+      unigbrk/uc-grapheme-breaks
       uniwbrk/base
       uniwbrk/u8-wordbreaks
       uniwbrk/u16-wordbreaks
diff --git a/doc/unigbrk.texi b/doc/unigbrk.texi
index 196bd9f..d7847cc 100644
--- a/doc/unigbrk.texi
+++ b/doc/unigbrk.texi
@@ -44,6 +44,11 @@ clusters in a string.
 Returns the start of the next grapheme cluster following @var{s},
 or @var{end} if no grapheme cluster break is encountered before it.
 Returns NULL if and only if @code{@var{s} == @var{end}}.
+
+Note that these functions do not handle the case when a character
+outside of the range between @var{s} and @var{end} is needed to
+determine the boundary.  Use @func{_grapheme_breaks} functions for such
+cases.
 @end deftypefun
 
 @deftypefun void u8_grapheme_prev (const uint8_t *@var{s}, const uint8_t *@var{start})
@@ -52,6 +57,11 @@ Returns NULL if and only if @code{@var{s} == @var{end}}.
 Returns the start of the grapheme cluster preceding @var{s}, or
 @var{start} if no grapheme cluster break is encountered before it.
 Returns NULL if and only if @code{@var{s} == @var{start}}.
+
+Note that these functions do not handle the case when a character
+outside of the range between @var{start} and @var{s} is needed to
+determine the boundary.  Use @func{_grapheme_breaks} functions for such
+cases.
 @end deftypefun
 
 The following functions determine all of the grapheme cluster
@@ -61,8 +71,9 @@ boundaries in a string.
 @deftypefunx void u16_grapheme_breaks (const uint16_t *@var{s}, size_t @var{n}, char *@var{p})
 @deftypefunx void u32_grapheme_breaks (const uint32_t *@var{s}, size_t @var{n}, char *@var{p})
 @deftypefunx void ulc_grapheme_breaks (const char *@var{s}, size_t @var{n}, char *@var{p})
+@deftypefunx void uc_grapheme_breaks (const ucs_t *@var{s}, size_t @var{n}, char *@var{p})
 Determines the grapheme cluster break points in @var{s}, an array of
-@var{n} units, and stores the result at @code{@var{p}[0..@var{n}-1]}.
+@var{n} units, and stores the result at @code{@var{p}[0..@var{nx}-1]}.
 @table @asis
 @item @code{@var{p}[i] = 1}
 means that there is a grapheme cluster boundary between
@@ -73,6 +84,13 @@ same grapheme cluster.
 @end table
 @code{@var{p}[0]} is always set to 1, because there is always a
 grapheme cluster break at start of text.
+
+In addition to the above variants for UTF-8, UTF-16, and UTF-32 strings,
+@code{<unigbrk.h>} provides another variant: @func{uc_grapheme_breaks}.
+
+This is similar to @func{u32_grapheme_breaks}, but it accepts any
+characters which may not be represented in UTF-32, such as control
+characters.
 @end deftypefun
 
 @node Grapheme cluster break property
@@ -99,6 +117,12 @@ property.  More values may be added in the future.
 @deftypevrx Constant int GBP_T
 @deftypevrx Constant int GBP_LV
 @deftypevrx Constant int GBP_LVT
+@deftypevrx Constant int GBP_RI
+@deftypevrx Constant int GBP_ZWJ
+@deftypevrx Constant int GBP_EB
+@deftypevrx Constant int GBP_EM
+@deftypevrx Constant int GBP_GAZ
+@deftypevrx Constant int GBP_EBG
 @end deftypevr
 
 The following function looks up the grapheme cluster break property of a
@@ -123,4 +147,8 @@ of text, respectively.
 This implements the extended (not legacy) grapheme cluster rules
 described in the Unicode standard, because the standard says that they
 are preferred.
+
+Note that this function does not handle the case when three or more
+consecutive characters are needed to determine the boundary.  Use
+@func{uc_grapheme_breaks} for such cases.
 @end deftypefun
diff --git a/lib/unigbrk/.gitignore b/lib/unigbrk/.gitignore
index a7507c9..a9ae5e6 100644
--- a/lib/unigbrk/.gitignore
+++ b/lib/unigbrk/.gitignore
@@ -1,5 +1,6 @@
 # Files brought in by gnulib-tool:
 /gbrkprop.h
+/u-grapheme-breaks.h
 /u16-grapheme-breaks.c
 /u16-grapheme-next.c
 /u16-grapheme-prev.c
@@ -10,6 +11,7 @@
 /u8-grapheme-next.c
 /u8-grapheme-prev.c
 /uc-gbrk-prop.c
+/uc-grapheme-breaks.c
 /uc-is-grapheme-break.c
 /ulc-grapheme-breaks.c
 
diff --git a/tests/unigbrk/.gitignore b/tests/unigbrk/.gitignore
index 9e1dc4c..a8f7f51 100644
--- a/tests/unigbrk/.gitignore
+++ b/tests/unigbrk/.gitignore
@@ -11,6 +11,8 @@
 /test-u8-grapheme-prev.c
 /test-uc-gbrk-prop.c
 /test-uc-gbrk-prop.h
+/test-uc-grapheme-breaks.c
+/test-uc-grapheme-breaks.sh
 /test-uc-is-grapheme-break.c
 /test-uc-is-grapheme-break.sh
 /test-ulc-grapheme-breaks.c
-- 
2.13.6
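
To illustrate the limitation the documentation change describes, here is a
minimal, self-contained sketch in C. It is not libunistring's implementation
and does not call the library: the names toy_pair_is_break and
toy_grapheme_breaks are hypothetical, and the rule set is reduced to one rule,
namely that regional indicator symbols (U+1F1E6..U+1F1FF) group in pairs. The
output array follows the documented convention that p[i] = 1 marks a boundary
before s[i]. A test that sees only two adjacent characters has to give one
answer for every pair of regional indicators, while a pass over the whole
array can count how many precede the current position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Regional indicator symbols occupy U+1F1E6..U+1F1FF. */
int
is_regional_indicator (uint32_t c)
{
  return c >= 0x1F1E6 && c <= 0x1F1FF;
}

/* Hypothetical pairwise test: it sees only the two neighbours A and B.
   Toy rule: never break between two regional indicators, break
   everywhere else.  It cannot know whether A already closed a pair.  */
int
toy_pair_is_break (uint32_t a, uint32_t b)
{
  return !(is_regional_indicator (a) && is_regional_indicator (b));
}

/* Hypothetical whole-sequence variant: stores 1 in p[i] if there is a
   boundary before s[i], 0 otherwise.  It tracks the length of the run
   of regional indicators that ends just before s[i], so it can break
   after every second one.  */
void
toy_grapheme_breaks (const uint32_t *s, size_t n, char *p)
{
  size_t ri_run = 0;
  size_t i;

  assert (n == 0 || (s != NULL && p != NULL));

  for (i = 0; i < n; i++)
    {
      if (i > 0 && is_regional_indicator (s[i]) && ri_run % 2 == 1)
        p[i] = 0;   /* second half of a pair: no boundary */
      else
        p[i] = 1;   /* start of text, or start of a new cluster */

      ri_run = is_regional_indicator (s[i]) ? ri_run + 1 : 0;
    }
}
```

For four consecutive regional indicators (two flags), toy_grapheme_breaks
marks a boundary before the third one, whereas toy_pair_is_break applied to
the second and third reports no boundary, because the answer depends on
characters before the pair it was given.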
