token recognition order

Eric Blake Tue, 17 Feb 2009 06:21:52 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The 1.4.x manual states that GNU m4 recognizes comments differently from
other m4 implementations, and that this behavior may change in a future
release.  The master branch already has the change; I'm now porting it to
branch-1.6 as well:


- --
Don't work too hard, make some time for fun as well!

Eric Blake             [email protected]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEUEARECAAYFAkmaxcgACgkQ84KuGfSFAYCVRACdGmGc8k4gvZQHkOnSwQ6hJff1
iEAAmLuK3ZFndwrP+FHHK1tWJzhnh1k=
=khVV
-----END PGP SIGNATURE-----

From 16e712b9dbcfcc49a54dd7c010ca1cab075fd79a Mon Sep 17 00:00:00 2001
From: Eric Blake <[email protected]>
Date: Tue, 17 Feb 2009 07:08:55 -0700
Subject: [PATCH] Reorder token recognition to match other implementations.

* src/input.c (next_token): Recognize comments after quotes, but
before macro arguments.
* doc/m4.texinfo (Changecom): Document this.
* NEWS: Likewise.

Signed-off-by: Eric Blake <[email protected]>
---
 ChangeLog      |    8 +++
 NEWS           |    4 ++
 doc/m4.texinfo |   38 +++++++++++---
 src/input.c    |  158 ++++++++++++++++++++++++++++----------------------------
 4 files changed, 121 insertions(+), 87 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 88a3723..85f2c5b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2009-02-17  Eric Blake  <[email protected]>
+
+       Reorder token recognition to match other implementations.
+       * src/input.c (next_token): Recognize comments after quotes, but
+       before macro arguments.
+       * doc/m4.texinfo (Changecom): Document this.
+       * NEWS: Likewise.
+
 2009-02-16  Eric Blake  <[email protected]>

        Stage 29: Process input by buffer, not bytes.
diff --git a/NEWS b/NEWS
index 69c0bb8..d4839f5 100644
--- a/NEWS
+++ b/NEWS
@@ -50,6 +50,10 @@ Software Foundation, Inc.
    then apply this patch:
      http://git.sv.gnu.org/gitweb/?p=autoconf.git;a=commitdiff;h=56d42fa71

+** The `changecom' builtin semantics now match traditional
+   implementations; if the start-comment string resembles a macro name or
+   the start-quote string, comments are effectively disabled.
+
 ** The `divert' builtin now accepts an optional second argument of text
    that is immediately placed in the new diversion, regardless of whether
    the current expansion is nested within argument collection of another
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 10fa4d2..3da0443 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -4932,13 +4932,15 @@ Changecom
 of any length.  Other implementations cap the delimiter length to five
 characters, but @acronym{GNU} has no inherent limit.

-Comments are recognized in preference to macros.  However, this is not
-compatible with other implementations, where macros and even quoting
-takes precedence over comments, so it may change in a future release.
-For portability, this means that @var{start} should not begin with a
-letter, digit, or @samp{_} (underscore), and that neither the
-start-quote nor the start-comment string should be a prefix of the
-other.
+As of M4 1.6, macros and quotes are recognized in preference to
+comments, so if a prefix of @var{start} can be recognized as part of a
+potential macro name, or confused with a quoted string, the comment
+mechanism is effectively disabled (earlier versions of @acronym{GNU} M4
+favored comments, but this was inconsistent with other implementations).
+Unless you use @code{changeword} (@pxref{Changeword}), this means
+that @var{start} should not begin with a letter, digit, or @samp{_}
+(underscore), and that neither the start-quote nor the start-comment
+string should be a prefix of the other.

 @example
 define(`hi', `HI')
@@ -4948,13 +4950,33 @@ Changecom
 changecom(`q', `Q')
 @result{}
 q hi Q hi
-@result{}q hi Q HI
+@result{}q HI Q HI
 changecom(`1', `2')
 @result{}
 hi1hi2
 @result{}hello
 hi 1hi2
 @result{}HI 1hi2
+changecom(`[[', `]]')
+@result{}
+changequote(`[[[', `]]]')
+@result{}
+[hi]
+@result{}[hi]
+[[hi]]
+@result{}[[hi]]
+[[[hi]]]
+@result{}hi
+changequote
+@result{}
+changecom(`[[[', `]]]')
+@result{}
+changequote(`[[', `]]')
+@result{}
+[[hi]]
+@result{}hi
+[[[hi]]]
+@result{}[hi]
 @end example

 Comments are recognized in preference to argument collection.  In
diff --git a/src/input.c b/src/input.c
index 2acbd70..709ef3e 100644
--- a/src/input.c
+++ b/src/input.c
@@ -1864,64 +1864,7 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
       return TOKEN_ARGV;
     }

-  if (MATCH (ch, curr_comm.str1, curr_comm.len1, true))
-    {
-      if (obs)
-       obs_td = obs;
-      obstack_grow (obs_td, curr_comm.str1, curr_comm.len1);
-      while (1)
-       {
-         /* Start with buffer search for potential end delimiter.  */
-         size_t len;
-         const char *buffer = next_buffer (&len, false);
-         if (buffer)
-           {
-             const char *p = (char *) memchr (buffer, *curr_comm.str2, len);
-             if (p)
-               {
-                 obstack_grow (obs_td, buffer, p - buffer);
-                 ch = to_uchar (*p);
-                 consume_buffer (p - buffer + 1);
-               }
-             else
-               {
-                 obstack_grow (obs_td, buffer, len);
-                 consume_buffer (len);
-                 continue;
-               }
-           }
-
-         /* Fall back to byte-wise search.  */
-         else
-           ch = next_char (false, false);
-         if (ch == CHAR_EOF)
-           {
-             /* Current_file changed to "" if we see CHAR_EOF, use
-                the previous value we stored earlier.  */
-             if (!caller)
-               {
-                 assert (line);
-                 current_line = *line;
-                 current_file = file;
-               }
-             m4_error (EXIT_FAILURE, 0, caller, _("end of file in comment"));
-           }
-         if (ch == CHAR_MACRO)
-           {
-             init_macro_token (obs, obs ? td : NULL);
-             continue;
-           }
-         if (MATCH (ch, curr_comm.str2, curr_comm.len2, true))
-           {
-             obstack_grow (obs_td, curr_comm.str2, curr_comm.len2);
-             break;
-           }
-         assert (ch < CHAR_EOF);
-         obstack_1grow (obs_td, ch);
-       }
-      type = TOKEN_COMMENT;
-    }
-  else if (default_word_regexp && (isalpha (ch) || ch == '_'))
+  if (default_word_regexp && (isalpha (ch) || ch == '_'))
     {
       obstack_1grow (&token_stack, ch);
       while (1)
@@ -1996,27 +1939,7 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,

 #endif /* ENABLE_CHANGEWORD */

-  else if (!MATCH (ch, curr_quote.str1, curr_quote.len1, true))
-    {
-      assert (ch < CHAR_EOF);
-      switch (ch)
-       {
-       case '(':
-         type = TOKEN_OPEN;
-         break;
-       case ',':
-         type = TOKEN_COMMA;
-         break;
-       case ')':
-         type = TOKEN_CLOSE;
-         break;
-       default:
-         type = TOKEN_SIMPLE;
-         break;
-       }
-      obstack_1grow (&token_stack, ch);
-    }
-  else
+  else if (MATCH (ch, curr_quote.str1, curr_quote.len1, true))
     {
       if (obs)
        obs_td = obs;
@@ -2096,6 +2019,83 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
            }
        }
     }
+  else if (MATCH (ch, curr_comm.str1, curr_comm.len1, true))
+    {
+      if (obs)
+       obs_td = obs;
+      obstack_grow (obs_td, curr_comm.str1, curr_comm.len1);
+      while (1)
+       {
+         /* Start with buffer search for potential end delimiter.  */
+         size_t len;
+         const char *buffer = next_buffer (&len, false);
+         if (buffer)
+           {
+             const char *p = (char *) memchr (buffer, *curr_comm.str2, len);
+             if (p)
+               {
+                 obstack_grow (obs_td, buffer, p - buffer);
+                 ch = to_uchar (*p);
+                 consume_buffer (p - buffer + 1);
+               }
+             else
+               {
+                 obstack_grow (obs_td, buffer, len);
+                 consume_buffer (len);
+                 continue;
+               }
+           }
+
+         /* Fall back to byte-wise search.  */
+         else
+           ch = next_char (false, false);
+         if (ch == CHAR_EOF)
+           {
+             /* Current_file changed to "" if we see CHAR_EOF, use
+                the previous value we stored earlier.  */
+             if (!caller)
+               {
+                 assert (line);
+                 current_line = *line;
+                 current_file = file;
+               }
+             m4_error (EXIT_FAILURE, 0, caller, _("end of file in comment"));
+           }
+         if (ch == CHAR_MACRO)
+           {
+             init_macro_token (obs, obs ? td : NULL);
+             continue;
+           }
+         if (MATCH (ch, curr_comm.str2, curr_comm.len2, true))
+           {
+             obstack_grow (obs_td, curr_comm.str2, curr_comm.len2);
+             break;
+           }
+         assert (ch < CHAR_EOF);
+         obstack_1grow (obs_td, ch);
+       }
+      type = TOKEN_COMMENT;
+    }
+  else
+    {
+      assert (ch < CHAR_EOF);
+      switch (ch)
+       {
+       case '(':
+         type = TOKEN_OPEN;
+         break;
+       case ',':
+         type = TOKEN_COMMA;
+         break;
+       case ')':
+         type = TOKEN_CLOSE;
+         break;
+       default:
+         type = TOKEN_SIMPLE;
+         break;
+       }
+      obstack_1grow (&token_stack, ch);
+    }

   if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
     {
-- 
1.6.1.2

_______________________________________________
M4-patches mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/m4-patches
