On 10/5/19 7:41 AM, Akim Demaille wrote:

Why have they chosen this?  I guess the point is speed.

Yes, typically these machines are not byte-addressable and it's faster to make every integer the same size. This is sort of the "BCPL variant" of C, which the C standard allows.
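
To make the "every integer the same size" case concrete, here is a minimal sketch in C. The helper names storage_bits and all_ints_same_size are hypothetical and appear nowhere in Bison; the sketch relies only on CHAR_BIT from <limits.h> and on sizeof.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from Bison): storage width in bits of an
   object that occupies SIZE_IN_CHARS chars, e.g. sizeof (short).  */
static size_t
storage_bits (size_t size_in_chars)
{
  assert (0 < size_in_chars);
  return size_in_chars * CHAR_BIT;
}

/* Hypothetical helper: return 1 if char, short and int all occupy one
   addressable unit, as on the word-addressed machines discussed here,
   and 0 otherwise.  sizeof (char) is 1 by definition.  */
static int
all_ints_same_size (void)
{
  return sizeof (short) == 1 && sizeof (int) == 1;
}
```

On a platform where all_ints_same_size () returns 1, CHAR_BIT must be at least 16, because int has to hold values up to 32767.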

Maybe you want to extend the notes you added to NEWS.  Including
the bits about limits.h and stdint.h.  Maybe that should also make
its way into the doc, yet I wouldn't know where to write that.

OK, I did that in the revised patch (first attachment).

+typedef int yytype_uint8;
  #endif

Wow!  Why do we fall back to int?  Is this the part where unsigned int == unsigned
short == unsigned char on a small number of architectures?

Yes, it's for those odd platforms. Though I now see that the above was too extreme: it should have been 'short', not 'int'. Fixed in the first attached patch.

The medicine seems worse than the disease to me.

Well, let's use stronger medicine then :-). I did it in a different way in the first attached patch, so that yytype_uint8 should be 'unsigned char' except for the odd but valid platforms where unsigned char and/or unsigned short do not promote to int. On compilers not compatible with GCC, the revised patch includes <limits.h> and (if available) <stdint.h>, which infringes on the user namespace, but yacc.c already infringes elsewhere for other reasons, so that should be OK.
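
The promotion test can be sketched in isolation as follows. This is a simplified illustration, not the skeleton's actual code: the names demo_uint8, DEMO_UINT8_IS_UCHAR and demo_diff are hypothetical, and only the last-resort branch of the selection (the UCHAR_MAX <= INT_MAX check with a 'short' fallback) is shown.

```c
#include <assert.h>
#include <limits.h>

/* If every unsigned char value fits in int, unsigned char promotes to
   int and is safe to use.  Otherwise it would promote to unsigned int,
   so fall back on short, which always promotes to int.  */
#if UCHAR_MAX <= INT_MAX
typedef unsigned char demo_uint8;   /* hypothetical name */
# define DEMO_UINT8_IS_UCHAR 1
#else
typedef short demo_uint8;           /* hypothetical name */
# define DEMO_UINT8_IS_UCHAR 0
#endif

/* Subtract two table entries.  Both operands promote to int, so the
   difference can be negative rather than wrapping around as it would
   in unsigned arithmetic.  */
static int
demo_diff (demo_uint8 a, demo_uint8 b)
{
  return a - b;
}
```

With either typedef, demo_diff (1, 2) is computed in int; with a type that promoted to unsigned int, the same subtraction would wrap around to a large positive value first.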

Eventually, all this should move up into c.m4, and be applied to
glr.c too.

Done in the second attached patch. (I haven't installed either patch yet.)

And see what to do about C++ parsers.

I was hoping our C++ expert could look into that....
>From 7f222b11bbb3c57220b47cb0a59072e90cca2ae5 Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sat, 5 Oct 2019 13:06:40 -0700
Subject: [PATCH 1/2] Use “least” types for integers in Yacc tables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This changes the Yacc skeleton to use “least” integer types to
keep tables smaller on some platforms, which should lessen cache
pressure.  Since Bison uses the Yacc skeleton, it follows suit.
* data/skeletons/yacc.c: Include limits.h and stdint.h if this
seems to be needed.
(yytype_uint8, yytype_int8, yytype_uint16, yytype_int16):
If available, use GCC predefined macros __INT_MAX__ etc. to select
a “least” type, as this avoids namespace hassles.  Otherwise, if
available fall back on selecting a “least” type via the C99 macros
INT_MAX, INT_LEAST8_MAX, etc.  Otherwise, fall further back on one of
the builtin C99 types signed char, short, and int.  Make sure that
any selected type promotes to int.  Ignore any macros YYTYPE_INT16,
YYTYPE_INT8, YYTYPE_UINT16, YYTYPE_UINT8 defined by the user.
(ptrdiff_t, PTRDIFF_MAX): Simplify in the light of the above.
(yytype_uint8, yytype_uint16): Do not assume that unsigned char
and unsigned short promote to int, as this isn’t true on some
platforms (e.g., TI TMS320C55x).
* src/parse-gram.y (YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16)
(YYTYPE_UINT8): Remove, as these are no longer effective.
---
 NEWS                  |  6 ++++-
 data/skeletons/yacc.c | 61 ++++++++++++++++++++++++++++++-------------
 doc/bison.texi        | 10 +++----
 src/parse-gram.y      |  5 ----
 4 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/NEWS b/NEWS
index a64e3492..91d95775 100644
--- a/NEWS
+++ b/NEWS
@@ -48,7 +48,11 @@ GNU Bison NEWS
 
   Bison templates now prefer signed to unsigned integer types when
   either will do, as the signed types are less error-prone and allow
-  for better checking with 'gcc -fsanitize=undefined'.
+  for better checking with 'gcc -fsanitize=undefined'.  Also, the
+  types chosen are now portable to unusual machines where char, short and
+  int are all the same width.  On non-GNU platforms this may entail
+  including <limits.h> and (if available) <stdint.h> to define integer types
+  and constants.
 
 * Noteworthy changes in release 3.4.2 (2019-09-12) [stable]
 
diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index fdf4ba6d..a0f7ffff 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -114,7 +114,7 @@ m4_ifset([b4_parse_param], [b4_args(b4_parse_param), ])])
 # ---------------------
 # Return a narrow int type able to handle numbers ranging from
 # MIN to MAX (included).  Overwrite the version from c.m4,
-# so that the user can override the shorter types.
+# so that the code can use C99 types if available.
 m4_define([b4_int_type],
 [m4_if(b4_ints_in($@,   [-127],   [127]), [1], [yytype_int8],
        b4_ints_in($@,      [0],   [255]), [1], [yytype_uint8],
@@ -388,26 +388,54 @@ m4_if(b4_api_prefix, [yy], [],
 # undef short
 #endif
 
-#ifdef YYTYPE_UINT8
-typedef YYTYPE_UINT8 yytype_uint8;
-#else
+/* On compilers that do not define __PTRDIFF_MAX__ etc., include
+   <limits.h> and (if available) <stdint.h> so that the code can
+   choose integer types of a good width.  */
+
+#ifndef __PTRDIFF_MAX__
+# include <limits.h> /* INFRINGES ON USER NAME SPACE */
+# if defined __STDC_VERSION__ && 199901 <= __STDC_VERSION__
+#  include <stdint.h> /* INFRINGES ON USER NAME SPACE */
+# endif
+#endif
+
+/* Narrow types that promote to a signed type and that can represent a
+   signed or unsigned integer of at least N bits.  In tables they can
+   save space and decrease cache pressure.  Promoting to a signed type
+   helps avoid bugs in integer arithmetic.  */
+
+#if defined __UINT_LEAST8_MAX__ && __UINT_LEAST8_MAX__ <= __INT_MAX__
+typedef __UINT_LEAST8_TYPE__ yytype_uint8;
+#elif defined UINT_LEAST8_MAX && UINT_LEAST8_MAX <= INT_MAX
+typedef uint_least8_t yytype_uint8;
+#elif UCHAR_MAX <= INT_MAX
 typedef unsigned char yytype_uint8;
+#else
+typedef short yytype_uint8;
 #endif
 
-#ifdef YYTYPE_INT8
-typedef YYTYPE_INT8 yytype_int8;
+#if defined __INT_LEAST8_MAX__ && __INT_LEAST8_MAX__ <= __INT_MAX__
+typedef __INT_LEAST8_TYPE__ yytype_int8;
+#elif defined INT_LEAST8_MAX && INT_LEAST8_MAX <= INT_MAX
+typedef int_least8_t yytype_int8;
 #else
 typedef signed char yytype_int8;
 #endif
 
-#ifdef YYTYPE_UINT16
-typedef YYTYPE_UINT16 yytype_uint16;
-#else
+#if defined __UINT_LEAST16_MAX__ && __UINT_LEAST16_MAX__ <= __INT_MAX__
+typedef __UINT_LEAST16_TYPE__ yytype_uint16;
+#elif defined UINT_LEAST16_MAX && UINT_LEAST16_MAX <= INT_MAX
+typedef uint_least16_t yytype_uint16;
+#elif USHRT_MAX <= INT_MAX
 typedef unsigned short yytype_uint16;
+#else
+typedef int yytype_uint16;
 #endif
 
-#ifdef YYTYPE_INT16
-typedef YYTYPE_INT16 yytype_int16;
+#if defined __INT_LEAST16_MAX__ && __INT_LEAST16_MAX__ <= __INT_MAX__
+typedef __INT_LEAST16_TYPE__ yytype_int16;
+#elif defined INT_LEAST16_MAX && INT_LEAST16_MAX <= INT_MAX
+typedef int_least16_t yytype_int16;
 #else
 typedef short yytype_int16;
 #endif
@@ -416,17 +444,14 @@ typedef short yytype_int16;
 # if defined __PTRDIFF_TYPE__ && defined __PTRDIFF_MAX__
 #  define YYPTRDIFF_T __PTRDIFF_TYPE__
 #  define YYPTRDIFF_MAXIMUM __PTRDIFF_MAX__
-# elif defined ptrdiff_t && defined PTRDIFF_MAX
-#  define YYPTRDIFF_T ptrdiff_t
-#  define YYPTRDIFF_MAXIMUM PTRDIFF_MAX
-# elif defined __STDC_VERSION__ && 199901 <= __STDC_VERSION__
-#  include <stddef.h> /* INFRINGES ON USER NAME SPACE */
+# elif defined PTRDIFF_MAX
+#  ifndef ptrdiff_t
+#   include <stddef.h> /* INFRINGES ON USER NAME SPACE */
+#  endif
 #  define YYPTRDIFF_T ptrdiff_t
-#  include <stdint.h> /* INFRINGES ON USER NAME SPACE */
 #  define YYPTRDIFF_MAXIMUM PTRDIFF_MAX
 # else
 #  define YYPTRDIFF_T int
-#  include <limits.h> /* INFRINGES ON USER NAME SPACE */
 #  define YYPTRDIFF_MAXIMUM INT_MAX
 # endif
 #endif
diff --git a/doc/bison.texi b/doc/bison.texi
index 6f4ca9e2..c78bd31e 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -1447,13 +1447,13 @@ anything other than their usual meanings.
 
 In some cases the Bison parser implementation file includes system
 headers, and in those cases your code should respect the identifiers
-reserved by those headers.  On some non-GNU hosts, @code{<alloca.h>},
-@code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are
-included as needed to declare memory allocators and related types.
+reserved by those headers.  On some non-GNU hosts, @code{<limits.h>},
+@code{<stddef.h>}, @code{<stdint.h>} (if available), and @code{<stdlib.h>}
+are included to declare memory allocators and integer types and constants.
 @code{<libintl.h>} is included if message translation is in use
 (@pxref{Internationalization}).  Other system headers may be included
-if you define @code{YYDEBUG} to a nonzero value (@pxref{Tracing,
-,Tracing Your Parser}).
+if you define @code{YYDEBUG} (@pxref{Tracing, ,Tracing Your Parser}) or
+@code{YYSTACK_USE_ALLOCA} (@pxref{Table of Symbols}) to a nonzero value.
 
 @node Stages
 @section Stages in Using Bison
diff --git a/src/parse-gram.y b/src/parse-gram.y
index b4132841..043fa581 100644
--- a/src/parse-gram.y
+++ b/src/parse-gram.y
@@ -112,11 +112,6 @@
   /* A string that describes a char (e.g., 'a' -> "'a'").  */
   static char const *char_name (char);
 
-  #define YYTYPE_INT16 int_fast16_t
-  #define YYTYPE_INT8 int_fast8_t
-  #define YYTYPE_UINT16 uint_fast16_t
-  #define YYTYPE_UINT8 uint_fast8_t
-
   /* Add style to semantic values in traces.  */
   static void tron (FILE *yyo);
   static void troff (FILE *yyo);
-- 
2.21.0

>From 9199bcd4edbe4127a08df58d8e24394fd039e10a Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sat, 5 Oct 2019 18:56:52 -0700
Subject: [PATCH 2/2] Move the integer-type selection into c.m4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

That way, glr.c can use it too.
* data/skeletons/c.m4 (b4_int_type):
Do not special-case ‘char’; it’s not worth the trouble,
as clang complains about char subscripts.
(b4_c99_int_type, b4_c99_int_type_define): New macros,
taken from yacc.c.
* data/skeletons/glr.c: Use b4_c99_int_type_define.
* data/skeletons/yacc.c (b4_int_type): Remove, since there’s
no longer any need to redefine it.
Use b4_c99_int_type_define rather than its body.
---
 data/skeletons/c.m4   | 77 +++++++++++++++++++++++++++++++++++++++++--
 data/skeletons/glr.c  |  2 ++
 data/skeletons/yacc.c | 71 +--------------------------------------
 3 files changed, 78 insertions(+), 72 deletions(-)

diff --git a/data/skeletons/c.m4 b/data/skeletons/c.m4
index 25b96a8f..851bc652 100644
--- a/data/skeletons/c.m4
+++ b/data/skeletons/c.m4
@@ -171,8 +171,7 @@ b4_parse_param_for([Decl], [Formal], [  YYUSE (Formal);
 # to MAX (included) in portable C code.  Assume MIN and MAX fall in
 # 'int' range.
 m4_define([b4_int_type],
-[m4_if(b4_ints_in($@,      [0],   [127]), [1], [char],
-       b4_ints_in($@,   [-127],   [127]), [1], [signed char],
+[m4_if(b4_ints_in($@,   [-127],   [127]), [1], [signed char],
        b4_ints_in($@,      [0],   [255]), [1], [unsigned char],
 
        b4_ints_in($@, [-32767], [32767]), [1], [short],
@@ -180,6 +179,80 @@ m4_define([b4_int_type],
 
                                                [int])])
 
+# b4_c99_int_type(MIN, MAX)
+# -------------------------
+# Like b4_int_type, but for C99.
+# b4_c99_int_type_define replaces b4_int_type with this.
+m4_define([b4_c99_int_type],
+[m4_if(b4_ints_in($@,   [-127],   [127]), [1], [yytype_int8],
+       b4_ints_in($@,      [0],   [255]), [1], [yytype_uint8],
+
+       b4_ints_in($@, [-32767], [32767]), [1], [yytype_int16],
+       b4_ints_in($@,      [0], [65535]), [1], [yytype_uint16],
+
+                                               [int])])
+
+# b4_c99_int_type_define
+# ----------------------
+# Define private types suitable for holding small integers in C99 or later.
+m4_define([b4_c99_int_type_define],
+[m4_copy_force([b4_c99_int_type], [b4_int_type])dnl
+[/* On compilers that do not define __PTRDIFF_MAX__ etc., make sure
+   <limits.h> and (if available) <stdint.h> are included
+   so that the code can choose integer types of a good width.  */
+
+#ifndef __PTRDIFF_MAX__
+# ifndef INT_MAX
+#  include <limits.h> /* INFRINGES ON USER NAME SPACE */
+# endif
+# ifndef PTRDIFF_MAX
+#  if defined __STDC_VERSION__ && 199901 <= __STDC_VERSION__
+#   include <stdint.h> /* INFRINGES ON USER NAME SPACE */
+#  endif
+# endif
+#endif
+
+/* Narrow types that promote to a signed type and that can represent a
+   signed or unsigned integer of at least N bits.  In tables they can
+   save space and decrease cache pressure.  Promoting to a signed type
+   helps avoid bugs in integer arithmetic.  */
+
+#if defined __UINT_LEAST8_MAX__ && __UINT_LEAST8_MAX__ <= __INT_MAX__
+typedef __UINT_LEAST8_TYPE__ yytype_uint8;
+#elif defined UINT_LEAST8_MAX && UINT_LEAST8_MAX <= INT_MAX
+typedef uint_least8_t yytype_uint8;
+#elif UCHAR_MAX <= INT_MAX
+typedef unsigned char yytype_uint8;
+#else
+typedef short yytype_uint8;
+#endif
+
+#if defined __INT_LEAST8_MAX__ && __INT_LEAST8_MAX__ <= __INT_MAX__
+typedef __INT_LEAST8_TYPE__ yytype_int8;
+#elif defined INT_LEAST8_MAX && INT_LEAST8_MAX <= INT_MAX
+typedef int_least8_t yytype_int8;
+#else
+typedef signed char yytype_int8;
+#endif
+
+#if defined __UINT_LEAST16_MAX__ && __UINT_LEAST16_MAX__ <= __INT_MAX__
+typedef __UINT_LEAST16_TYPE__ yytype_uint16;
+#elif defined UINT_LEAST16_MAX && UINT_LEAST16_MAX <= INT_MAX
+typedef uint_least16_t yytype_uint16;
+#elif USHRT_MAX <= INT_MAX
+typedef unsigned short yytype_uint16;
+#else
+typedef int yytype_uint16;
+#endif
+
+#if defined __INT_LEAST16_MAX__ && __INT_LEAST16_MAX__ <= __INT_MAX__
+typedef __INT_LEAST16_TYPE__ yytype_int16;
+#elif defined INT_LEAST16_MAX && INT_LEAST16_MAX <= INT_MAX
+typedef int_least16_t yytype_int16;
+#else
+typedef short yytype_int16;
+#endif]])
+
 
 # b4_int_type_for(NAME)
 # ---------------------
diff --git a/data/skeletons/glr.c b/data/skeletons/glr.c
index c3c1a9a4..af7a78c3 100644
--- a/data/skeletons/glr.c
+++ b/data/skeletons/glr.c
@@ -265,6 +265,8 @@ static YYLTYPE yyloc_default][]b4_yyloc_default;])[
 #include <stdlib.h>
 #include <string.h>
 
+]b4_c99_int_type_define[
+
 #ifndef YY_
 # if defined YYENABLE_NLS && YYENABLE_NLS
 #  if ENABLE_NLS
diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index a0f7ffff..b1060511 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -106,25 +106,6 @@ m4_ifset([b4_parse_param], [b4_args(b4_parse_param), ])])
 
 
 
-## ------------ ##
-## Data Types.  ##
-## ------------ ##
-
-# b4_int_type(MIN, MAX)
-# ---------------------
-# Return a narrow int type able to handle numbers ranging from
-# MIN to MAX (included).  Overwrite the version from c.m4,
-# so that the code can use C99 types if available.
-m4_define([b4_int_type],
-[m4_if(b4_ints_in($@,   [-127],   [127]), [1], [yytype_int8],
-       b4_ints_in($@,      [0],   [255]), [1], [yytype_uint8],
-
-       b4_ints_in($@, [-32767], [32767]), [1], [yytype_int16],
-       b4_ints_in($@,      [0], [65535]), [1], [yytype_uint16],
-
-                                               [int])])
-
-
 ## ----------------- ##
 ## Semantic Values.  ##
 ## ----------------- ##
@@ -388,57 +369,7 @@ m4_if(b4_api_prefix, [yy], [],
 # undef short
 #endif
 
-/* On compilers that do not define __PTRDIFF_MAX__ etc., include
-   <limits.h> and (if available) <stdint.h> so that the code can
-   choose integer types of a good width.  */
-
-#ifndef __PTRDIFF_MAX__
-# include <limits.h> /* INFRINGES ON USER NAME SPACE */
-# if defined __STDC_VERSION__ && 199901 <= __STDC_VERSION__
-#  include <stdint.h> /* INFRINGES ON USER NAME SPACE */
-# endif
-#endif
-
-/* Narrow types that promote to a signed type and that can represent a
-   signed or unsigned integer of at least N bits.  In tables they can
-   save space and decrease cache pressure.  Promoting to a signed type
-   helps avoid bugs in integer arithmetic.  */
-
-#if defined __UINT_LEAST8_MAX__ && __UINT_LEAST8_MAX__ <= __INT_MAX__
-typedef __UINT_LEAST8_TYPE__ yytype_uint8;
-#elif defined UINT_LEAST8_MAX && UINT_LEAST8_MAX <= INT_MAX
-typedef uint_least8_t yytype_uint8;
-#elif UCHAR_MAX <= INT_MAX
-typedef unsigned char yytype_uint8;
-#else
-typedef short yytype_uint8;
-#endif
-
-#if defined __INT_LEAST8_MAX__ && __INT_LEAST8_MAX__ <= __INT_MAX__
-typedef __INT_LEAST8_TYPE__ yytype_int8;
-#elif defined INT_LEAST8_MAX && INT_LEAST8_MAX <= INT_MAX
-typedef int_least8_t yytype_int8;
-#else
-typedef signed char yytype_int8;
-#endif
-
-#if defined __UINT_LEAST16_MAX__ && __UINT_LEAST16_MAX__ <= __INT_MAX__
-typedef __UINT_LEAST16_TYPE__ yytype_uint16;
-#elif defined UINT_LEAST16_MAX && UINT_LEAST16_MAX <= INT_MAX
-typedef uint_least16_t yytype_uint16;
-#elif USHRT_MAX <= INT_MAX
-typedef unsigned short yytype_uint16;
-#else
-typedef int yytype_uint16;
-#endif
-
-#if defined __INT_LEAST16_MAX__ && __INT_LEAST16_MAX__ <= __INT_MAX__
-typedef __INT_LEAST16_TYPE__ yytype_int16;
-#elif defined INT_LEAST16_MAX && INT_LEAST16_MAX <= INT_MAX
-typedef int_least16_t yytype_int16;
-#else
-typedef short yytype_int16;
-#endif
+]b4_c99_int_type_define[
 
 #ifndef YYPTRDIFF_T
 # if defined __PTRDIFF_TYPE__ && defined __PTRDIFF_MAX__
-- 
2.21.0
