On 04/20/2010 12:47 AM, Eric Blake wrote:
> On 04/19/2010 06:14 AM, Paolo Bonzini wrote:
>> + /* A valid UTF-8 character is
>> +
>> + ([0x00-0x7f]
>> + |[0xc2-0xdf][0x80-0xbf]
>> + |[0xe0-0xef][0x80-0xbf][0x80-0xbf]
>> + |[0xf0-0xf7][0x80-0xbf][0x80-0xbf][0x80-0xbf])
>
> Yes, but in POSIX XBD 9.3.4,
> http://www.opengroup.org/onlinepubs/9699919799/toc.htm, ANYCHAR does
> not match NUL. Do you need to adjust this patch to exclude 0x00?
Yes (following the syntax bits).
Does this seem okay?
Paolo
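A minimal standalone sketch of what the patch arranges: the five byte classes from the comment, with newline and NUL cleared from the single-byte class according to the syntax flags, used to recognize one character. The names `byteset`, `DOT_NEWLINE`, `DOT_NOT_NULL`, `build_classes` and `match_anychar` are hypothetical stand-ins for dfa.c's `charclass`, `RE_DOT_NEWLINE` and `RE_DOT_NOT_NULL`; the sketch matches directly instead of emitting the DFA token sequence the real code builds. Like the comment's grammar, it is looser than RFC 3629 (for example, it accepts the overlong sequence E0 80 80).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags standing in for RE_DOT_NEWLINE and RE_DOT_NOT_NULL.  */
enum { DOT_NEWLINE = 1, DOT_NOT_NULL = 2 };

/* One bit per byte value; a stand-in for dfa.c's charclass.  */
typedef struct { unsigned char bits[32]; } byteset;

static void set_range (byteset *s, unsigned int lo, unsigned int hi)
{
  for (unsigned int b = lo; b <= hi; b++)
    s->bits[b / 8] |= (unsigned char) (1u << (b % 8));
}

static void clr_bit (byteset *s, unsigned int b)
{
  assert (b < 256);
  s->bits[b / 8] &= (unsigned char) ~(1u << (b % 8));
}

static bool tst_bit (const byteset *s, unsigned int b)
{
  assert (b < 256);
  return (s->bits[b / 8] >> (b % 8)) & 1;
}

/* Same order as utf8_classes in the patch:
   0: 80-bf (continuation), 1: 00-7f, 2: c2-df, 3: e0-ef, 4: f0-f7.  */
static void build_classes (byteset cls[5], int flags, unsigned char eol)
{
  memset (cls, 0, 5 * sizeof cls[0]);
  set_range (&cls[0], 0x80, 0xbf);
  set_range (&cls[1], 0x00, 0x7f);
  set_range (&cls[2], 0xc2, 0xdf);
  set_range (&cls[3], 0xe0, 0xef);
  set_range (&cls[4], 0xf0, 0xf7);
  /* As in the patch, only the single-byte class (index 1) is adjusted.  */
  if (!(flags & DOT_NEWLINE))
    clr_bit (&cls[1], eol);
  if (flags & DOT_NOT_NULL)
    clr_bit (&cls[1], '\0');
}

/* Return the length (1-4) of the character "." matches at S, or 0.  */
static size_t match_anychar (const byteset cls[5],
                             const unsigned char *s, size_t len)
{
  if (len == 0)
    return 0;
  if (tst_bit (&cls[1], s[0]))
    return 1;
  size_t need;
  if (tst_bit (&cls[2], s[0]))
    need = 2;
  else if (tst_bit (&cls[3], s[0]))
    need = 3;
  else if (tst_bit (&cls[4], s[0]))
    need = 4;
  else
    return 0;                   /* 80-c1, f8-ff, or an excluded byte */
  if (len < need)
    return 0;                   /* truncated sequence */
  for (size_t i = 1; i < need; i++)
    if (!tst_bit (&cls[0], s[i]))
      return 0;
  return need;
}
```

With `build_classes (cls, DOT_NOT_NULL, '\n')`, a lone NUL or newline byte matches nothing, while C3 A9 (U+00E9) matches as a two-byte character. Excluding the bytes only from the 00-7f class is enough, because NUL and newline can never appear inside a multibyte UTF-8 sequence.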
diff --git a/gnulib b/gnulib
index 5fbd6e3..bfffe40 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 5fbd6e3e571c6e59270fa486bd7c83dfe04c87cf
+Subproject commit bfffe408f8b375fd0989266bd8c01580be26d1a8
diff --git a/src/dfa.c b/src/dfa.c
index 61322d1..d9c5ba2 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1487,7 +1487,18 @@ add_utf8_anychar (void)
   /* Define the five character classes that are needed below. */
   if (dfa->utf8_anychar_classes[0] == 0)
     for (i = 0; i < n; i++)
-      dfa->utf8_anychar_classes[i] = CSET + charclass_index(utf8_classes[i]);
+      {
+        charclass c;
+        memcpy (c, utf8_classes[i], sizeof c);
+        if (i == 1)
+          {
+            if (!(syntax_bits & RE_DOT_NEWLINE))
+              clrbit (eolbyte, c);
+            if (syntax_bits & RE_DOT_NOT_NULL)
+              clrbit ('\0', c);
+          }
+        dfa->utf8_anychar_classes[i] = CSET + charclass_index(c);
+      }
   /* A valid UTF-8 character is