On 05/01/2014 11:29 AM, Aharon Robbins wrote:
custom.h is for system customization to override things that Autoconf can't figure out or gets wrong

OK, it's easy to have something else include mbsupport.h instead. config.h, say. The attached patch does that. It doesn't really matter what includes it, so long as it's done before dfa.c and dfa.h start using the multibyte functions.

Requiring gnulib in that header makes it less attractive to other projects that might want to use dfa as a black box. Are there such? I don't know. (I thought I'd heard something about gettext using dfa but I am unsure if that is true.)

gettext uses gnulib, so that's not an issue.

Does the GL_PURE stuff have to be on every declaration? Or can it just be on the body?

It should be on the declaration for external functions, so that the function's caller knows to optimize it.

What does it even mean

It means the function has no effects except the return value and that the return value depends only on the parameters and/or global variables.

Does whatever optimization it enables *really* make a big difference, or is it just a micro-optimization?

We put it in because GCC nowadays complains if we leave it out, if we configure with --enable-gcc-warnings. The optimization seems to be a win in general and (more important) an aid for humans reading the code, so we typically just add the pure attribute and move on.

Yes, I know. I am unsure if your patch, which totally eliminates the ability to compile gawk on systems without multibyte support

It's not supposed to do that. It's supposed to work on those hosts, by supplying substitutes for wchar_t, wctype_t, etc. Hmm, are you worried about hosts that don't even have wchar.h and wctype.h? If so, that can be worked around reasonably easily; please see attached patch.

I just looked at the patch again. It really doesn't do the trick; there are lots of places where MBS_SUPPORT is checked in the gawk code and pulling mbsupport.h out of awk.h is likely to break things

No, it should still work. With the revised patch, config.h includes mbsupport.h, so MBS_SUPPORT will be defined appropriately for gawk code and gawk's other MBS_SUPPORT usages will continue to work as before.

I'll CC: this to Bug#17157 and Bug#17072 as it's following up to the last messages in both those threads, too.
From f6112aca41ea8bd2028ea5b00a3a75db14a32eef Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Thu, 1 May 2014 23:09:00 -0700
Subject: [PATCH] awk: simplify dfa.c by having it not include mbsupport.h
 directly

This syncs dfa.c better with 'grep'.
* Makefile.am (STDBOOL_H, WCHAR_H, WCTYPE_H): New macros.
($(gawk_OBJCETS)): Depend on them.
(stdbool.h, wchar.h, wctype.h): New rules.
(CLEANFILES): Add the new files to this list.
* awk.h, regex_internal.h, dfa.c: Don't include mbsupport.h.
* configure.ac: Arrange for config.h to include it instead.
(STDBOOL_H, WCHAR_H, WCTYPE_H): New configuration items.
* custom.h (_GL_ATTRIBUTE_PURE): Move here from dfa.c, to lessen the
number of differences between grep's dfa.c and ours.
* dfa.c: Include wchar.h and wctype.h unconditionally, as
this simplifies the use of dfa.c in grep, and it does no harm
in gawk.
(setlocale) [!LC_ALL]:
(gawk_mb_cur_max, MB_CUR_MAX, mbrtowc) [LIBC_IS_BORKED]:
Move to mbsupport.h (needed for consistency in all uses),
and fix mbrtowc to return size_t.
(struct dfa, dfambcache, mbs_to_wchar)
(is_valid_unibyte_character, setbit_wc, using_utf8, FETCH_WC)
(addtok_wc, add_utf8_anychar, atom, state_index, epsclosure)
(dfaanalyze, dfastate, prepare_wc_buf, dfaoptimize, dfafree, dfamust):
* dfasearch.c (EGexecute):
* grep.c (main):
* searchutils.c (mbtoupper):
Assume MBS_SUPPORT.
* dfa.h: Include stdbool.h unconditionally, so that this file is
closer to what's in grep.
* mbsupport.h [!MBS_SUPPORT]: Include wchar.h, wctype.h
before overriding their definitions.
(WEOF, towupper, towlower, btowc, iswalnum, iswalpha, iswupper)
(iswlower, mbrtowc, wcrtomb, wctype, iswctype, wcscoll):
(btowc): Parenthesize properly.
(mbrtowc, wcrtomb): New macros.
(wctype, iswctype, wcscoll): Define to gawk_wctype etc. to avoid
collisions with standard library.
---
 ChangeLog           | 40 +++++++++++++++++++++++
 Makefile.am         | 11 +++++++
 awk.h               |  2 --
 configure.ac        |  9 ++++++
 custom.h            |  7 ++++
 dfa.c               | 93 +++++++----------------------------------------------
 dfa.h               |  4 ---
 mbsupport.h         | 57 +++++++++++++++++++++++++++++---
 missing_d/ChangeLog |  4 +++
 missing_d/wcmisc.c  | 10 ------
 regex.h             | 18 ++++++++---
 regex_internal.h    |  2 --
 12 files changed, 149 insertions(+), 108 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c1b294b..8ebfaed 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,43 @@
+2014-05-01  Paul Eggert  <[email protected]>
+
+       awk: simplify dfa.c by having it not include mbsupport.h directly
+       This syncs dfa.c better with 'grep'.
+       * Makefile.am (STDBOOL_H, WCHAR_H, WCTYPE_H): New macros.
+       ($(gawk_OBJCETS)): Depend on them.
+       (stdbool.h, wchar.h, wctype.h): New rules.
+       (CLEANFILES): Add the new files to this list.
+       * awk.h, regex_internal.h, dfa.c: Don't include mbsupport.h.
+       * configure.ac: Arrange for config.h to include it instead.
+       (STDBOOL_H, WCHAR_H, WCTYPE_H): New configuration items.
+       * custom.h (_GL_ATTRIBUTE_PURE): Move here from dfa.c, to lessen the
+       number of differences between grep's dfa.c and ours.
+       * dfa.c: Include wchar.h and wctype.h unconditionally, as
+       this simplifies the use of dfa.c in grep, and it does no harm
+       in gawk.
+       (setlocale) [!LC_ALL]:
+       (gawk_mb_cur_max, MB_CUR_MAX, mbrtowc) [LIBC_IS_BORKED]:
+       Move to mbsupport.h (needed for consistency in all uses),
+       and fix mbrtowc to return size_t.
+       (struct dfa, dfambcache, mbs_to_wchar)
+       (is_valid_unibyte_character, setbit_wc, using_utf8, FETCH_WC)
+       (addtok_wc, add_utf8_anychar, atom, state_index, epsclosure)
+       (dfaanalyze, dfastate, prepare_wc_buf, dfaoptimize, dfafree, dfamust):
+       * dfasearch.c (EGexecute):
+       * grep.c (main):
+       * searchutils.c (mbtoupper):
+       Assume MBS_SUPPORT.
+       * dfa.h: Include stdbool.h unconditionally, so that this file is
+       closer to what's in grep.
+       * mbsupport.h [!MBS_SUPPORT]: Include wchar.h, wctype.h
+       before overriding their definitions.
+       (WEOF, towupper, towlower, btowc, iswalnum, iswalpha, iswupper)
+       (iswlower, mbrtowc, wcrtomb, wctype, iswctype, wcscoll):
+       #undef before #defining.
+       (btowc): Parenthesize properly.
+       (mbrtowc, wcrtomb): New macros.
+       (wctype, iswctype, wcscoll): Define to gawk_wctype etc. to avoid
+       collisions with standard library.
+
 2014-04-25         Andrew J. Schorr     <[email protected]>
 
        * io.c (two_way_open): In forked child, reset SIGPIPE to SIG_DFL.
diff --git a/Makefile.am b/Makefile.am
index 6e5715d..f1a725a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -196,6 +196,17 @@ command.c: command.y
        $(YACC) -p zz $<
        sed 's/parse error/syntax error/g' < y.tab.c | awk -f 
$(srcdir)/bisonfix.awk command > $*.c && rm y.tab.c
 
+# Arrange for some standard headers on platforms that lack them.
+STDBOOL_H = @STDBOOL_H@
+WCHAR_H = @WCHAR_H@
+WCTYPE_H = @WCTYPE_H@
+$(gawk_OBJECTS): $(STDBOOL_H) $(WCHAR_H) $(WCTYPE_H)
+stdbool.h:
+       echo '#include "missing_d/gawkbool.h"' >$@
+wchar.h wctype.h:
+       echo '' >$@
+CLEANFILES += stdbool.h wchar.h wctype.h
+
 # This is for my development & testing.
 efence: gawk
        $(CC) $(LDFLAGS) -o gawk $$(ls *.o | grep -v '_p.o$$') $(LIBS) -lefence
diff --git a/awk.h b/awk.h
index aefdd07..cdba7a8 100644
--- a/awk.h
+++ b/awk.h
@@ -95,8 +95,6 @@ extern int errno;
 #include "missing_d/gawkbool.h"
 #endif
 
-#include "mbsupport.h" /* defines MBS_SUPPORT */
-
 #if MBS_SUPPORT
 /* We can handle multibyte strings.  */
 #include <wchar.h>
diff --git a/configure.ac b/configure.ac
index e7e2d5f..2447c32 100644
--- a/configure.ac
+++ b/configure.ac
@@ -153,6 +153,14 @@ else
        AC_CHECK_HEADERS(strings.h)
 fi
 
+STDBOOL_H= WCHAR_H= WCTYPE_H=
+test "$ac_cv_header_stdbool_h" != yes && STDBOOL_H=stdbool.h
+test "$ac_cv_header_wchar_h" != yes && WCHAR_H=wchar.h
+test "$ac_cv_header_wctype_h" != yes && WCTYPE_H=wctype.h
+AC_SUBST([STDBOOL_H])
+AC_SUBST([WCHAR_H])
+AC_SUBST([WCTYPE_H])
+
 dnl Check cross compiling
 AM_CONDITIONAL([TEST_CROSS_COMPILE], [test "x$build_alias" != "x$host_alias"])
 
@@ -390,6 +398,7 @@ AC_C_STRINGIZE
 
 AC_CONFIG_HEADERS([config.h:configh.in])
 AH_BOTTOM([#include "custom.h"])
+AH_BOTTOM([#include "mbsupport.h"])
 
 dnl Crude but small hack to make plug-ins work on Mac OS X
 dnl We should really use the libtool value for shrext_cmds, but that
diff --git a/custom.h b/custom.h
index 36b4aa0..5b19dd4 100644
--- a/custom.h
+++ b/custom.h
@@ -76,3 +76,10 @@
 extern int setenv(const char *name, const char *value, int rewrite);
 extern int unsetenv(const char *name);
 #endif
+
+/* The __pure__ attribute was added in gcc 2.96.  */
+#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96)
+# define _GL_ATTRIBUTE_PURE __attribute__ ((__pure__))
+#else
+# define _GL_ATTRIBUTE_PURE /* empty */
+#endif
diff --git a/dfa.c b/dfa.c
index d306d5c..9c41fd1 100644
--- a/dfa.c
+++ b/dfa.c
@@ -22,6 +22,8 @@
 
 #include <config.h>
 
+#include "dfa.h"
+
 #include <assert.h>
 #include <ctype.h>
 #include <stdio.h>
@@ -38,11 +40,6 @@
 #include <locale.h>
 #endif
 
-/* Gawk doesn't use Gnulib, so don't assume that setlocale is present.  */
-#ifndef LC_ALL
-# define setlocale(category, locale) NULL
-#endif
-
 #define STREQ(a, b) (strcmp (a, b) == 0)
 
 /* ISASCIIDIGIT differs from isdigit, as follows:
@@ -59,26 +56,11 @@
 #include "gettext.h"
 #define _(str) gettext (str)
 
-#include "mbsupport.h" /* Define MBS_SUPPORT to 1 or 0, as appropriate.  */
-#if MBS_SUPPORT
-/* We can handle multibyte strings.  */
-# include <wchar.h>
-# include <wctype.h>
-#endif
-
-#ifdef GAWK
-/* The __pure__ attribute was added in gcc 2.96.  */
-#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96)
-# define _GL_ATTRIBUTE_PURE __attribute__ ((__pure__))
-#else
-# define _GL_ATTRIBUTE_PURE /* empty */
-#endif
-#endif /* GAWK */
+#include <wchar.h>
+#include <wctype.h>
 
 #include "xalloc.h"
 
-#include "dfa.h"
-
 #ifdef GAWK
 static int
 is_blank (int c)
@@ -87,14 +69,6 @@ is_blank (int c)
 }
 #endif /* GAWK */
 
-#ifdef LIBC_IS_BORKED
-extern int gawk_mb_cur_max;
-#undef MB_CUR_MAX
-#define MB_CUR_MAX gawk_mb_cur_max
-#undef mbrtowc
-#define mbrtowc(a, b, c, d) (-1)
-#endif
-
 /* HPUX defines these as macros in sys/param.h.  */
 #ifdef setbit
 # undef setbit
@@ -402,13 +376,11 @@ struct dfa
    */
   int *multibyte_prop;
 
-#if MBS_SUPPORT
   /* A table indexed by byte values that contains the corresponding wide
      character (if any) for that byte.  WEOF means the byte is the
      leading byte of a multibyte character.  Invalid and null bytes are
      mapped to themselves.  */
   wint_t mbrtowc_cache[NOTCHAR];
-#endif
 
   /* Array of the bracket expression in the DFA.  */
   struct mb_char_classes *mbcsets;
@@ -488,7 +460,6 @@ static void regexp (void);
 static void
 dfambcache (struct dfa *d)
 {
-#if MBS_SUPPORT
   int i;
   for (i = CHAR_MIN; i <= CHAR_MAX; ++i)
     {
@@ -505,10 +476,8 @@ dfambcache (struct dfa *d)
         }
       d->mbrtowc_cache[uc] = wi;
     }
-#endif
 }
 
-#if MBS_SUPPORT
 /* Store into *PWC the result of converting the leading bytes of the
    multibyte buffer S of length N bytes, using the mbrtowc_cache in *D
    and updating the conversion state in *D.  On conversion error,
@@ -543,7 +512,6 @@ mbs_to_wchar (wchar_t *pwc, char const *s, size_t n, struct 
dfa *d)
   *pwc = wc;
   return 1;
 }
-#endif
 
 #ifdef DEBUG
 
@@ -737,7 +705,7 @@ static charclass newline;
 #ifdef __GLIBC__
 # define is_valid_unibyte_character(c) 1
 #else
-# define is_valid_unibyte_character(c) (! (MBS_SUPPORT && btowc (c) == WEOF))
+# define is_valid_unibyte_character(c) (btowc (c) != WEOF)
 #endif
 
 /* Return non-zero if C is a "word-constituent" byte; zero otherwise.  */
@@ -798,17 +766,12 @@ dfasyntax (reg_syntax_t bits, int fold, unsigned char eol)
 static bool
 setbit_wc (wint_t wc, charclass c)
 {
-#if MBS_SUPPORT
   int b = wctob (wc);
   if (b == EOF)
     return false;
 
   setbit (b, c);
   return true;
-#else
-  abort ();
-   /*NOTREACHED*/ return false;
-#endif
 }
 
 /* Set a bit for B and its case variants in the charclass C.
@@ -904,7 +867,6 @@ static wchar_t wctok;           /* Wide character 
representation of the current
                                    multibyte character.  */
 
 
-#if MBS_SUPPORT
 /* Note that characters become unsigned here.  */
 # define FETCH_WC(c, wc, eoferr)               \
   do {                                         \
@@ -927,23 +889,6 @@ static wchar_t wctok;           /* Wide character 
representation of the current
       }                                                \
   } while (0)
 
-#else
-/* Note that characters become unsigned here.  */
-# define FETCH_WC(c, unused, eoferr)  \
-  do {                               \
-    if (! lexleft)                   \
-      {                                      \
-        if ((eoferr) != 0)           \
-          dfaerror (eoferr);         \
-        else                         \
-          return lasttok = END;              \
-      }                                      \
-    (c) = to_uchar (*lexptr++);       \
-    --lexleft;                       \
-  } while (0)
-
-#endif /* MBS_SUPPORT */
-
 #ifndef MIN
 # define MIN(a,b) ((a) < (b) ? (a) : (b))
 #endif
@@ -1728,7 +1673,6 @@ addtok (token t)
     }
 }
 
-#if MBS_SUPPORT
 /* We treat a multibyte character as a single atom, so that DFA
    can treat a multibyte character as a single expression.
 
@@ -1760,17 +1704,10 @@ addtok_wc (wint_t wc)
       addtok (CAT);
     }
 }
-#else
-static void
-addtok_wc (wint_t wc)
-{
-}
-#endif
 
 static void
 add_utf8_anychar (void)
 {
-#if MBS_SUPPORT
   static const charclass utf8_classes[5] = {
     {0, 0, 0, 0, ~0, ~0, 0, 0},                /* 80-bf: non-leading bytes */
     {~0, ~0, ~0, ~0, 0, 0, 0, 0},       /* 00-7f: 1-byte sequence */
@@ -1815,7 +1752,6 @@ add_utf8_anychar (void)
       addtok (CAT);
       addtok (OR);
     }
-#endif
 }
 
 /* The grammar understood by the parser is as follows.
@@ -1856,7 +1792,7 @@ add_utf8_anychar (void)
 static void
 atom (void)
 {
-  if (MBS_SUPPORT && tok == WCHAR)
+  if (tok == WCHAR)
     {
       addtok_wc (wctok);
 
@@ -1873,7 +1809,7 @@ atom (void)
 
       tok = lex ();
     }
-  else if (MBS_SUPPORT && tok == ANYCHAR && using_utf8 ())
+  else if (tok == ANYCHAR && using_utf8 ())
     {
       /* For UTF-8 expand the period to a series of CSETs that define a valid
          UTF-8 character.  This avoids using the slow multibyte path.  I'm
@@ -1887,9 +1823,7 @@ atom (void)
     }
   else if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
            || tok == BEGLINE || tok == ENDLINE || tok == BEGWORD
-#if MBS_SUPPORT
            || tok == ANYCHAR || tok == MBCSET
-#endif /* MBS_SUPPORT */
            || tok == ENDWORD || tok == LIMWORD || tok == NOTLIMWORD)
     {
       addtok (tok);
@@ -2224,10 +2158,8 @@ epsclosure (position_set * s, struct dfa const *d)
   for (i = 0; i < s->nelem; ++i)
     if (d->tokens[s->elems[i].index] >= NOTCHAR
         && d->tokens[s->elems[i].index] != BACKREF
-#if MBS_SUPPORT
         && d->tokens[s->elems[i].index] != ANYCHAR
         && d->tokens[s->elems[i].index] != MBCSET
-#endif
         && d->tokens[s->elems[i].index] < CSET)
       {
         old = s->elems[i];
@@ -2541,9 +2473,7 @@ dfaanalyze (struct dfa *d, int searchflag)
      it with its epsilon closure.  */
   for (i = 0; i < d->tindex; ++i)
     if (d->tokens[i] < NOTCHAR || d->tokens[i] == BACKREF
-#if MBS_SUPPORT
         || d->tokens[i] == ANYCHAR || d->tokens[i] == MBCSET
-#endif
         || d->tokens[i] >= CSET)
       {
 #ifdef DEBUG
@@ -2643,9 +2573,8 @@ dfastate (state_num s, struct dfa *d, state_num trans[])
         setbit (d->tokens[pos.index], matches);
       else if (d->tokens[pos.index] >= CSET)
         copyset (d->charclasses[d->tokens[pos.index] - CSET], matches);
-      else if (MBS_SUPPORT
-               && (d->tokens[pos.index] == ANYCHAR
-                   || d->tokens[pos.index] == MBCSET))
+      else if (d->tokens[pos.index] == ANYCHAR
+               || d->tokens[pos.index] == MBCSET)
         /* MB_CUR_MAX > 1  */
         {
           /* ANYCHAR and MBCSET must match with a single character, so we
@@ -2820,7 +2749,7 @@ dfastate (state_num s, struct dfa *d, state_num trans[])
       /* If we are building a searching matcher, throw in the positions
          of state 0 as well.  */
       if (d->searchflag
-          && (!MBS_SUPPORT || (!d->multibyte || !next_isnt_1st_byte)))
+          && (MB_CUR_MAX == 1 || !next_isnt_1st_byte))
         for (j = 0; j < d->states[0].elems.nelem; ++j)
           insert (d->states[0].elems.elems[j], &follows);
 
@@ -3541,7 +3470,7 @@ dfaoptimize (struct dfa *d)
 {
   size_t i;
 
-  if (!MBS_SUPPORT || !using_utf8 ())
+  if (!using_utf8 ())
     return;
 
   for (i = 0; i < d->tindex; ++i)
diff --git a/dfa.h b/dfa.h
index 1514236..60aff11 100644
--- a/dfa.h
+++ b/dfa.h
@@ -19,11 +19,7 @@
 /* Written June, 1988 by Mike Haertel */
 
 #include <regex.h>
-#ifdef HAVE_STDBOOL_H
 #include <stdbool.h>
-#else
-#include "missing_d/gawkbool.h"
-#endif /* HAVE_STDBOOL_H */
 #include <stddef.h>
 
 /* Element of a list of strings, at least one of which is known to
diff --git a/mbsupport.h b/mbsupport.h
index 9a62486..198a0f3 100644
--- a/mbsupport.h
+++ b/mbsupport.h
@@ -66,6 +66,15 @@
 #endif
 
 #if ! MBS_SUPPORT
+
+/* Include wchar.h and wctype.h so their definitions can be overridden.  */
+
+# include <wchar.h>
+# include <wctype.h>
+
+/* Override the definitions of wchar.h and wctype.h to provide a
+   unibyte substitute that is good enough for Gawk.  */
+
 # undef MB_CUR_MAX
 # define MB_CUR_MAX 1
 
@@ -78,15 +87,24 @@
 #define wctype_t       int
 #define wint_t         int
 #define mbstate_t      int
+#undef WEOF
 #define WEOF           EOF
+#undef towupper
 #define towupper       toupper
+#undef towlower
 #define towlower       tolower
 #ifndef __DJGPP__
-#define btowc(x)       ((int)x)
+#undef btowc
+#define btowc(x)       ((int) (x))
 #endif
+#undef iswalnum
 #define iswalnum       isalnum
+#undef iswalpha
 #define iswalpha       isalpha
+#undef iswupper
 #define iswupper       isupper
+#undef iswlower
+#define iswlower       islower
 #if defined(ZOS_USS)
 #undef towupper
 #undef towlower
@@ -94,12 +112,43 @@
 #undef iswalnum
 #undef iswalpha
 #undef iswupper
-#undef wctype
-#undef iswctype
-#undef wcscoll
 #endif
 
+#undef mbrtowc
+#define mbrtowc(pwc, s, n, ps) ((size_t) -1)
+#undef wcrtomb
+#define wcrtomb(s, wc, ps) ((size_t) -1)
+
+#undef wctype
+#define wctype gawk_wctype
 extern wctype_t wctype(const char *name);
+#undef iswctype
+#define iswctype gawk_iswctype
 extern int iswctype(wint_t wc, wctype_t desc);
+#undef wcscoll
+#define wcscoll gawk_wcscoll
 extern int wcscoll(const wchar_t *ws1, const wchar_t *ws2);
 #endif
+
+#ifdef LIBC_IS_BORKED
+# include <wchar.h>
+extern int gawk_mb_cur_max;
+# undef MB_CUR_MAX
+# undef mbrtowc
+# define MB_CUR_MAX gawk_mb_cur_max
+# define mbrtowc(a, b, c, d) ((size_t) -1)
+#endif
+
+#include <locale.h>
+#ifndef LC_ALL
+# define setlocale(category, locale) NULL
+#endif
+
+#include <assert.h>
+#ifndef static_assert
+# define static_assert(cond, diagnostic) \
+    extern int (*foo (void)) [!!sizeof (struct { int foo: (cond) ? 8 : -1; })]
+#endif
+
+/* Make sure RE_DUP_MAX gets the correct value.  */
+#define _REGEX_INCLUDE_LIMITS_H
diff --git a/missing_d/ChangeLog b/missing_d/ChangeLog
index 70fbde6..4686c74 100644
--- a/missing_d/ChangeLog
+++ b/missing_d/ChangeLog
@@ -1,3 +1,7 @@
+2014-05-01  Paul Eggert  <[email protected]>
+
+       * wcmisc.c: Remove now-unnecessary ifdefs.
+
 2014-04-08         Arnold D. Robbins     <[email protected]>
 
        * 4.1.1: Release tar ball made.
diff --git a/missing_d/wcmisc.c b/missing_d/wcmisc.c
index d2b7aa0..89e24c9 100644
--- a/missing_d/wcmisc.c
+++ b/missing_d/wcmisc.c
@@ -16,7 +16,6 @@
    Foundation, Inc.,
    51 Franklin Street - Fifth Floor, Boston, MA  02110-1301, USA */
 
-#if !defined(HAVE_WCTYPE) || !defined(HAVE_ISWCTYPE)
 static const char *classes[] = {
        "<dummy>",
        "alnum",
@@ -33,16 +32,12 @@ static const char *classes[] = {
        "xdigit",
        NULL
 };
-#endif
 
-#ifndef HAVE_ISWCTYPE
 static int is_blank (int c)
 {
    return (c == ' ' || c == '\t');
 }
-#endif
 
-#ifndef HAVE_WCTYPE
 wctype_t wctype(const char *name)
 {
        int i;
@@ -53,9 +48,7 @@ wctype_t wctype(const char *name)
 
        return 0;
 }
-#endif
 
-#ifndef HAVE_ISWCTYPE
 int iswctype(wint_t wc, wctype_t desc)
 {
        int j = sizeof(classes) / sizeof(classes[0]);
@@ -79,9 +72,7 @@ int iswctype(wint_t wc, wctype_t desc)
        default:        return 0;
        }
 }
-#endif
 
-#ifndef HAVE_WCSCOLL
 int wcscoll(const wchar_t *ws1, const wchar_t *ws2)
 {
        size_t i;
@@ -95,6 +86,5 @@ int wcscoll(const wchar_t *ws1, const wchar_t *ws2)
 
        return (ws1[i] - ws2[i]);
 }
-#endif
 
 /*wcmisc.c*/
diff --git a/regex.h b/regex.h
index 5660296..400b407 100644
--- a/regex.h
+++ b/regex.h
@@ -264,14 +264,24 @@ extern reg_syntax_t re_syntax_options;
    | RE_NO_BK_PARENS        | RE_NO_BK_REFS                            \
    | RE_NO_BK_VBAR         | RE_UNMATCHED_RIGHT_PAREN_ORD)
 /* [[[end syntaxes]]] */
-
-/* Maximum number of duplicates an interval can allow.  Some systems
-   (erroneously) define this in other header files, but we want our
+
+/* Maximum number of duplicates an interval can allow.  POSIX-conforming
+   systems might define this in <limits.h>, but we want our
    value, so remove any previous define.  */
+# ifdef _REGEX_INCLUDE_LIMITS_H
+#  include <limits.h>
+# endif
 # ifdef RE_DUP_MAX
 #  undef RE_DUP_MAX
 # endif
-/* If sizeof(int) == 2, then ((1 << 15) - 1) overflows.  */
+
+/* RE_DUP_MAX is 2**15 - 1 because an earlier implementation stored
+   the counter as a 2-byte signed integer.  This is no longer true, so
+   RE_DUP_MAX could be increased to (INT_MAX / 10 - 1), or to
+   ((SIZE_MAX - 9) / 10) if _REGEX_LARGE_OFFSETS is defined.
+   However, there would be a huge performance problem if someone
+   actually used a pattern like a\{214748363\}, so RE_DUP_MAX retains
+   its historical value.  */
 # define RE_DUP_MAX (0x7fff)
 #endif
 
diff --git a/regex_internal.h b/regex_internal.h
index c8981a0..758cf47 100644
--- a/regex_internal.h
+++ b/regex_internal.h
@@ -26,8 +26,6 @@
 #include <stdlib.h>
 #include <string.h>
 
-#include "mbsupport.h" /* gawk */
-
 #if defined HAVE_LANGINFO_H || defined HAVE_LANGINFO_CODESET || defined _LIBC
 # include <langinfo.h>
 #endif
-- 
1.9.0

Reply via email to