-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lots of little semantic bugs in our handling of numeric arguments.  POSIX
requires rejecting non-numeric characters; while Solaris treated divert(`
1a') as divert(1).  Previously, GNU m4 silently accepted divert(` 1') but
rejected divert(1a).  So as a compromise, I've made m4 warn anytime the
numeric arg wasn't a true number but was previously accepted, including
adding a check for overflow.

Our handling of radix 1 was haphazard - eval claimed to parse it (although
that was buggy) and would not generate it.  I fixed the parse bug, copied
Solaris' behavior on generating radix 1, and followed Solaris and POSIX in
that the width parameter to eval does not include the negative sign.

I noticed several places where our semantics are wrong, such as
ifdef(none,yes,no,oops) expanding to `' rather than `no'.  This was due to
not properly ignoring extra arguments.

There is also another POSIX incompatibility that I did not fix, but just
documented for now.  POSIX requires, and Solaris agreed, that
translit(abcd,a-d,e-h) should result in ebch, not efgh.  In other words,
the range operator of GNU m4 is an incompatible extension.  However, you
can also achieve range transliteration with patsubst.  Maybe what we
should do on the 1.4.x branch is mark the range operation of translit as
deprecated, issuing a warning and stating that in a future release it will
have POSIX semantics, but producing the same expansion; then fix translit
to obey POSIX on CVS head.  But I'd like some feedback before I attempt that.

2006-07-13  Eric Blake  <[EMAIL PROTECTED]>

        * src/builtin.c (numeric_arg): Treat empty string as 0, with a
        warning.  Detect quoted leading space and overflow as warnings.
        (m4_eval): Treat empty radix as 10, and allow output in radix 1.
        Treat width as minimum number of digits, as required by POSIX.
        (m4_ifdef, m4_divert, m4_m4exit, m4_translit): Ignore extra
        arguments.
        (m4_substr): Likewise.  Silently treat empty start as 0.
        (m4_undivert): Treat ` 1a' as file, not diversion 1.
        * src/eval.c (eval_lex): Parse radix 1 numbers.
        * doc/m4.texinfo (Invoking m4): Fix wording; there is more than
        one type of warning.
        (Manual): Document behavior of numeric parsing of empty string.
        (Divert, Incr): Document error handling.
        (Eval): Document radices better.
        (Incompatibilities): Document translit incompatibility.
        * NEWS: Document these changes.

- --
Life is short - so eat dessert first!

Eric Blake             [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEtkyg84KuGfSFAYARAj5qAKCtVXadlJbdrvabn6TN1nQUXLOWUwCeIGR1
H119qgWfCh0ft+ltmufAbJg=
=NnF2
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.34
diff -u -p -r1.1.1.1.2.34 NEWS
--- NEWS        12 Jul 2006 13:12:40 -0000      1.1.1.1.2.34
+++ NEWS        13 Jul 2006 13:37:23 -0000
@@ -36,6 +36,14 @@ Version 1.4.5 - ?? 2006, by ???  (CVS ve
   under FDL 1.2, rather than a stricter verbatim-only license.
 * Raise the -L (--nesting-limit) command line option limit from 250 to
   1024.
+* The decr, incr, divert, m4exit, and substr macros treat an empty number
+  as 0, issue a warning, and expand as normal; rather than issuing an error
+  and expanding to the empty string.
+* The eval macro now treats an empty radix argument as 10, handles radix 1,
+  and treats the width argument as number of digits excluding the sign,
+  for compatibility with other m4 implementations.
+* The ifdef, divert, m4exit, substr, and translit macros now correctly
+  ignore extra arguments.
 
 Version 1.4.4b - 17 June 2006, by Eric Blake  (CVS version 1.4.4a)
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.38
diff -u -p -r1.1.1.1.2.38 m4.texinfo
--- doc/m4.texinfo      12 Jul 2006 13:27:26 -0000      1.1.1.1.2.38
+++ doc/m4.texinfo      13 Jul 2006 13:37:24 -0000
@@ -418,7 +418,8 @@ also specified.
 @item -Q
 @itemx --quiet
 @itemx --silent
-Suppress warnings about missing or superfluous arguments in macro calls.
+Suppress warnings, such as missing or superfluous arguments in macro
+calls, or treating the empty string as zero.
 
 @item -W @var{REGEXP}
 @itemx [EMAIL PROTECTED]
@@ -553,7 +554,7 @@ Finally, there are several options for a
 scripts.
 
 @table @code
[EMAIL PROTECTED] [EMAIL PROTECTED]
[EMAIL PROTECTED] [EMAIL PROTECTED]
 @itemx [EMAIL PROTECTED]
 Set the debug-level according to the flags @var{FLAGS}.  The debug-level
 controls the format and amount of information presented by the debugging
@@ -644,6 +645,7 @@ Example of input line
 @error{}and an error message
 @end example
 
+The sequence @samp{^D} in an example indicates the end of the input file.
 The majority of these examples are self-contained, and you can run them
 with similar results by invoking @kbd{m4 -d}.  In fact, the testsuite
 that is bundled in the GNU M4 package consists of the examples in this
@@ -658,8 +660,15 @@ arguments, e.g.,
 regexp(@var{string}, @var{regexp}, opt @var{replacement})
 @end example
 
-All macro arguments in @code{m4} are strings, but some are given special
-interpretation, e.g., as numbers, file names, regular expressions, etc.
+All macro arguments in @code{m4} are strings, but some are given
+special interpretation, e.g., as numbers, file names, regular
+expressions, etc.  The documentation for each macro will state how the
+parameters are interpreted, and what happens if the argument cannot be
+parsed according to the desired interpretation.  Unless specified
+otherwise, a parameter specified to be a number is parsed as a decimal,
+even if the argument has leading zeros; and parsing the empty string as
+a number results in 0 rather than an error, although a warning will be
+issued.
 
 The @samp{opt} before the third argument shows that this argument is
 optional---if it is left out, it is taken to be the empty string.  An
@@ -1638,6 +1647,9 @@ define(`foo', `')
 @result{}
 ifdef(`foo', ``foo' is defined', ``foo' is not defined')
 @result{}foo is defined
+ifdef(`no_such_macro', `yes', `no', `extra argument')
[EMAIL PROTECTED]:4: m4: Warning: excess arguments to builtin `ifdef' ignored
[EMAIL PROTECTED]
 @end example
 
 The macro @code{ifdef} is recognized only with parameters.
@@ -2649,7 +2661,8 @@ divert(opt @var{number})
 
 @noindent
 where @var{number} is the diversion to be used.  If @var{number} is left
-out, it is assumed to be zero.
+out or empty, it is assumed to be zero.  If @var{number} cannot be
+parsed, the diversion is unchanged.
 
 The expansion of @code{divert} is void.
 
@@ -3007,7 +3020,8 @@ substr(@var{string}, @var{from}, opt @va
 which expands to the substring of @var{string}, which starts at index
 @var{from}, and extends for @var{length} characters, or to the end of
 @var{string}, if @var{length} is omitted.  The starting index of a string
-is always 0.
+is always 0.  The expansion is empty if there is an error parsing
[EMAIL PROTECTED] or @var{length}.
 
 @example
 substr(`gnus, gnats, and armadillos', `6')
@@ -3233,13 +3247,20 @@ decr(@var{number})
 
 @noindent
 which expand to the numerical value of @var{number}, incremented,
-or decremented, respectively, by one.
+or decremented, respectively, by one.  Except for the empty string, the
+expansion is empty if @var{number} could not be parsed.
 
 @example
 incr(`4')
 @result{}5
 decr(`7')
 @result{}6
+incr()
[EMAIL PROTECTED]:3: m4: empty string treated as 0 in builtin `incr'
[EMAIL PROTECTED]
+decr()
[EMAIL PROTECTED]:4: m4: empty string treated as 0 in builtin `decr'
[EMAIL PROTECTED]
 @end example
 
 The builtin macros @code{incr} and @code{decr} are recognized only when
@@ -3260,7 +3281,8 @@ eval(@var{expression}, opt @var{radix}, 
 @end example
 
 @noindent
-which expands to the value of @var{expression}.
+which expands to the value of @var{expression}.  The expansion is empty
+if an error is encountered while parsing the arguments.
 
 Expressions can contain the following operators, listed in order of
 decreasing precedence.
@@ -3305,12 +3327,15 @@ implementations.  This behavior is likel
 version to match @acronym{POSIX}, so use parentheses to force the
 desired precedence.
 
-Numbers without special prefix are given decimal.  A simple @samp{0}
+Within @var{expression}, (but not @var{radix} or @var{width}),
+numbers without a special prefix are decimal.  A simple @samp{0}
 prefix introduces an octal number.  @samp{0x} introduces a hexadecimal
 number.  @samp{0b} introduces a binary number.  @samp{0r} introduces a
 number expressed in any radix between 1 and 36: the prefix should be
 immediately followed by the decimal expression of the radix, a colon,
-then the digits making the number.  For any radix, the digits are
+then the digits making the number.  For radix 1, leading zeros are
+ignored and all remaining digits must be @samp{1}; for all other
+radices, the digits are
 @samp{0}, @samp{1}, @samp{2}, @dots{}.  Beyond @samp{9}, the digits are
 @samp{a}, @samp{b} @dots{} up to @samp{z}.  Lower and upper case letters
 can be used interchangeably in numbers prefixes and as number digits.
@@ -3326,6 +3351,8 @@ eval(`-3 * 5')
 @result{}-15
 eval(index(`Hello world', `llo') >= 0)
 @result{}1
+eval(`0r1:0111 + 0b100 + 0r3:12')
[EMAIL PROTECTED]
 define(`square', `eval(`('$1`)**2')')
 @result{}
 square(`9')
@@ -3335,7 +3362,7 @@ square(square(`5')`+1')
 define(`foo', `666')
 @result{}
 eval(`foo/6')
[EMAIL PROTECTED]:7: m4: bad expression in eval: foo/6
[EMAIL PROTECTED]:8: m4: bad expression in eval: foo/6
 @result{}
 eval(foo/6)
 @result{}111
@@ -3365,10 +3392,15 @@ eval(-4 >> 33)
 @end example
 
 If @var{radix} is specified, it specifies the radix to be used in the
-expansion.  The default radix is 10.  The result of @code{eval} is
-always taken to be signed.  The @var{width} argument specifies a minimum
-output width.  The result is zero-padded to extend the expansion to the
-requested width.
+expansion.  The default radix is 10; this is also the case if
[EMAIL PROTECTED] is the empty string.  It is an error if the radix is outside
+the range of 1 through 36, inclusive.  The result of @code{eval} is
+always taken to be signed.  No radix prefix is output, and for radices
+greater than 10, the digits are lower case.  The @var{width} argument
+specifies the minimum output width, excluding any negative sign.  The
+result is zero-padded to extend the expansion to the requested width.
+It is an error if the width is negative.  On error, the expansion of
[EMAIL PROTECTED] is empty.
 
 @example
 eval(`666', `10')
@@ -3380,11 +3412,15 @@ eval(`666', `6')
 eval(`666', `6', `10')
 @result{}0000003030
 eval(`-666', `6', `10')
[EMAIL PROTECTED]
[EMAIL PROTECTED]
+eval(`10', `', `0')
[EMAIL PROTECTED]
+`0r1:'eval(`10', `1', `11')
[EMAIL PROTECTED]:01111111111
+eval(`10', `16')
[EMAIL PROTECTED]
 @end example
 
-Take note that @var{radix} cannot be larger than 36.
-
 The builtin macro @code{eval} is recognized only when given arguments.
 
 @node Shell commands
@@ -4209,6 +4245,11 @@ to ensure proper precedence.  As extensi
 @code{m4} treats the shift operators @samp{<<} and @samp{>>} as
 well-defined on signed integers (even though they are not in C), and
 adds the exponentiation operator @samp{**}.
+
[EMAIL PROTECTED]
[EMAIL PROTECTED] requires @code{translit} (@pxref{Translit}) to treat
+each character of the second and third arguments literally, but GNU
[EMAIL PROTECTED] treats @samp{-} as a range operator.
 @end itemize
 
 @node Other Incompatibilities
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.20
diff -u -p -r1.1.1.1.2.20 builtin.c
--- src/builtin.c       11 Jul 2006 12:17:11 -0000      1.1.1.1.2.20
+++ src/builtin.c       13 Jul 2006 13:37:24 -0000
@@ -250,16 +250,16 @@ builtin_init (void)
   for (bp = &builtin_tab[0]; bp->name != NULL; bp++)
     if (!no_gnu_extensions || !bp->gnu_extension)
       {
-        if (prefix_all_builtins)
-          {
-            string = (char *) xmalloc (strlen (bp->name) + 4);
-            strcpy (string, "m4_");
-            strcat (string, bp->name);
-            define_builtin (string, bp, SYMBOL_INSERT);
-            free (string);
-          }
-        else
-          define_builtin (bp->name, bp, SYMBOL_INSERT);
+       if (prefix_all_builtins)
+         {
+           string = (char *) xmalloc (strlen (bp->name) + 4);
+           strcpy (string, "m4_");
+           strcat (string, bp->name);
+           define_builtin (string, bp, SYMBOL_INSERT);
+           free (string);
+         }
+       else
+         define_builtin (bp->name, bp, SYMBOL_INSERT);
       }
 
   for (pp = &predefined_tab[0]; pp->func != NULL; pp++)
@@ -315,12 +315,32 @@ numeric_arg (token_data *macro, const ch
 {
   char *endp;
 
-  if (*arg == 0 || (*valuep = strtol (arg, &endp, 10), *endp != 0))
+  if (*arg == '\0')
     {
+      *valuep = 0;
       M4ERROR ((warning_status, 0,
-               "non-numeric argument to builtin `%s'",
+               "empty string treated as 0 in builtin `%s'",
                TOKEN_DATA_TEXT (macro)));
-      return FALSE;
+    }
+  else
+    {
+      errno = 0;
+      *valuep = strtol (arg, &endp, 10);
+      if (*endp != '\0')
+       {
+         M4ERROR ((warning_status, 0,
+                   "non-numeric argument to builtin `%s'",
+                   TOKEN_DATA_TEXT (macro)));
+         return FALSE;
+       }
+      if (isspace (to_uchar (*arg)))
+       M4ERROR ((warning_status, 0,
+                 "leading whitespace ignored in builtin `%s'",
+                 TOKEN_DATA_TEXT (macro)));
+      else if (errno == ERANGE)
+       M4ERROR ((warning_status, 0,
+                 "numeric overflow detected in builtin `%s'",
+                 TOKEN_DATA_TEXT (macro)));
     }
   return TRUE;
 }
@@ -508,7 +528,7 @@ m4_ifdef (struct obstack *obs, int argc,
 
   if (s != NULL && SYMBOL_TYPE (s) != TOKEN_VOID)
     result = ARG (2);
-  else if (argc == 4)
+  else if (argc >= 4)
     result = ARG (3);
   else
     result = NULL;
@@ -750,10 +770,10 @@ m4_defn (struct obstack *obs, int argc, 
     case TOKEN_FUNC:
       b = SYMBOL_FUNC (s);
       if (b == m4_placeholder)
-        M4ERROR ((warning_status, 0, "\
+       M4ERROR ((warning_status, 0, "\
 builtin `%s' requested by frozen file is not supported", ARG (1)));
       else
-        push_macro (b);
+       push_macro (b);
       break;
 
     case TOKEN_VOID:
@@ -873,7 +893,7 @@ m4_sysval (struct obstack *obs, int argc
 static void
 m4_eval (struct obstack *obs, int argc, token_data **argv)
 {
-  eval_t value;
+  eval_t value = 0;
   int radix = 10;
   int min = 1;
   const char *s;
@@ -881,35 +901,53 @@ m4_eval (struct obstack *obs, int argc, 
   if (bad_argc (argv[0], argc, 2, 4))
     return;
 
-  if (argc >= 3 && !numeric_arg (argv[0], ARG (2), &radix))
+  if (*ARG (2) && !numeric_arg (argv[0], ARG (2), &radix))
     return;
 
-  if (radix <= 1 || radix > (int) strlen (digits))
+  if (radix < 1 || radix > (int) strlen (digits))
     {
       M4ERROR ((warning_status, 0,
                "radix in builtin `%s' out of range (radix = %d)",
-                ARG (0), radix));
+               ARG (0), radix));
       return;
     }
 
   if (argc >= 4 && !numeric_arg (argv[0], ARG (3), &min))
     return;
-  if  (min <= 0)
+  if (min < 0)
     {
       M4ERROR ((warning_status, 0,
                "negative width to builtin `%s'", ARG (0)));
       return;
     }
 
-  if (evaluate (ARG (1), &value))
+  if (!*ARG (1))
+    M4ERROR ((warning_status, 0,
+             "empty string treated as 0 in builtin `%s'", ARG (0)));
+  else if (evaluate (ARG (1), &value))
     return;
 
+  if (radix == 1)
+    {
+      if (value < 0)
+       {
+         obstack_1grow (obs, '-');
+         value = -value;
+       }
+      /* This assumes 2's-complement for correctly handling INT_MIN.  */
+      while (min-- - value > 0)
+       obstack_1grow (obs, '0');
+      while (value-- != 0)
+       obstack_1grow (obs, '1');
+      obstack_1grow (obs, '\0');
+      return;
+    }
+
   s = ntoa (value, radix);
 
   if (*s == '-')
     {
       obstack_1grow (obs, '-');
-      min--;
       s++;
     }
   for (min -= strlen (s); --min >= 0;)
@@ -962,7 +1000,7 @@ m4_divert (struct obstack *obs, int argc
   if (bad_argc (argv[0], argc, 1, 2))
     return;
 
-  if (argc == 2 && !numeric_arg (argv[0], ARG (1), &i))
+  if (argc >= 2 && !numeric_arg (argv[0], ARG (1), &i))
     return;
 
   make_diversion (i);
@@ -992,16 +1030,16 @@ m4_undivert (struct obstack *obs, int ar
 {
   int i, file;
   FILE *fp;
+  char *endp;
 
   if (argc == 1)
     undivert_all ();
   else
     for (i = 1; i < argc; i++)
       {
-       if (sscanf (ARG (i), "%d", &file) == 1)
+       file = strtol (ARG (i), &endp, 10);
+       if (*endp == '\0' && !isspace (to_uchar (*ARG (i))))
          insert_diversion (file);
-       else if (!*ARG (i))
-         /* Ignore empty string.  */;
        else if (no_gnu_extensions)
          M4ERROR ((warning_status, 0,
                    "non-numeric argument to builtin `%s'", ARG (0)));
@@ -1165,7 +1203,7 @@ m4_maketemp (struct obstack *obs, int ar
   if ((fd = mkstemp (ARG (1))) < 0)
     {
       M4ERROR ((warning_status, errno, "cannot create tempfile `%s'",
-                ARG (1)));
+               ARG (1)));
       return;
     }
   close(fd);
@@ -1219,7 +1257,7 @@ m4_m4exit (struct obstack *obs, int argc
 
   if (bad_argc (argv[0], argc, 1, 2))
     return;
-  if (argc == 2  && !numeric_arg (argv[0], ARG (1), &exit_code))
+  if (argc >= 2  && !numeric_arg (argv[0], ARG (1), &exit_code))
     exit_code = 0;
 
   exit (exit_code);
@@ -1424,7 +1462,8 @@ m4_index (struct obstack *obs, int argc,
 static void
 m4_substr (struct obstack *obs, int argc, token_data **argv)
 {
-  int start, length, avail;
+  int start = 0;
+  int length, avail;
 
   if (bad_argc (argv[0], argc, 3, 4))
     return;
@@ -1433,7 +1472,7 @@ m4_substr (struct obstack *obs, int argc
   if (!numeric_arg (argv[0], ARG (2), &start))
     return;
 
-  if (argc == 4 && !numeric_arg (argv[0], ARG (3), &length))
+  if (argc >= 4 && !numeric_arg (argv[0], ARG (3), &length))
     return;
 
   if (start < 0 || length <= 0 || start >= avail)
@@ -1467,9 +1506,9 @@ expand_ranges (const char *s, struct obs
          to = *++s;
          if (to == '\0')
            {
-              /* trailing dash */
-              obstack_1grow (obs, '-');
-              break;
+             /* trailing dash */
+             obstack_1grow (obs, '-');
+             break;
            }
          else if (from <= to)
            {
@@ -1515,7 +1554,7 @@ m4_translit (struct obstack *obs, int ar
        return;
     }
 
-  if (argc == 4)
+  if (argc >= 4)
     {
       to = ARG (3);
       if (strchr (to, '-') != NULL)
Index: src/eval.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/eval.c,v
retrieving revision 1.1.1.1.2.4
diff -u -p -r1.1.1.1.2.4 eval.c
--- src/eval.c  27 Jun 2006 13:31:44 -0000      1.1.1.1.2.4
+++ src/eval.c  13 Jul 2006 13:37:24 -0000
@@ -161,10 +161,19 @@ eval_lex (eval_t *val)
          else
            break;
 
-         if (digit >= base)
+         if (base == 1)
+           {
+             if (digit == 1)
+               (*val)++;
+             else if (digit == 0 && !*val)
+               continue;
+             else
+               break;
+           }
+         else if (digit >= base)
            break;
-
-         (*val) = (*val) * base + digit;
+         else
+           (*val) = (*val) * base + digit;
        }
       return NUMBER;
     }
_______________________________________________
M4-patches mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/m4-patches

Reply via email to