[SCM] GNU M4 source repository branch, branch-1.6, updated. v1.5.89a-45-g0d6fb01

Eric Blake Sun, 03 Aug 2008 17:46:04 -0700

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU M4 source repository".


http://git.sv.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=0d6fb01e76bc35550a00cbf7710d1471db9e7b00

The branch, branch-1.6 has been updated
       via  0d6fb01e76bc35550a00cbf7710d1471db9e7b00 (commit)
      from  c9d53ab9bcef0cb04d59f5797e6f20159150b75d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 0d6fb01e76bc35550a00cbf7710d1471db9e7b00
Author: Eric Blake <[EMAIL PROTECTED]>
Date:   Sun Aug 3 14:23:19 2008 -0600

    Fix regression in commenting unbalanced quotes, from 2008-02-16.
    
    * src/m4.h (enum token_type): Add TOKEN_COMMENT.
    * src/input.c (next_token, peek_token, token_type_string)
    (print_token): Supply new token type for comments.
    * src/macro.c (expand_token): Remove penalty for unquoted `-'
    bytes.  Penalize comments, as they can contain unbalanced quotes;
    latent bug since 2007-12-07, exposed by passing $@ references
    built from comments.
    (expand_argument): Adjust caller.
    * doc/m4.texinfo (Comments): Test the fix.
    * NEWS: Mention the fix.
    
    Signed-off-by: Eric Blake <[EMAIL PROTECTED]>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog      |   14 ++++++++++++++
 NEWS           |   14 +++++++-------
 doc/m4.texinfo |   21 +++++++++++++++++++++
 src/input.c    |   36 ++++++++++++++++++++++--------------
 src/m4.h       |    3 ++-
 src/macro.c    |   49 +++++++++++++++++++++++++++++++------------------
 6 files changed, 97 insertions(+), 40 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d4f182e..325bf7a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2008-08-03  Eric Blake  <[EMAIL PROTECTED]>
+
+       Fix regression in commenting unbalanced quotes, from 2008-02-16.
+       * src/m4.h (enum token_type): Add TOKEN_COMMENT.
+       * src/input.c (next_token, peek_token, token_type_string)
+       (print_token): Supply new token type for comments.
+       * src/macro.c (expand_token): Remove penalty for unquoted `-'
+       bytes.  Penalize comments, as they can contain unbalanced quotes;
+       latent bug since 2007-12-07, exposed by passing $@ references
+       built from comments.
+       (expand_argument): Adjust caller.
+       * doc/m4.texinfo (Comments): Test the fix.
+       * NEWS: Mention the fix.
+
 2008-07-30  Eric Blake  <[EMAIL PROTECTED]>
 
        Fix regression in trace output, introduced 2008-05-09.
diff --git a/NEWS b/NEWS
index bea5f07..fe6f9e8 100644
--- a/NEWS
+++ b/NEWS
@@ -9,13 +9,13 @@ Foundation, Inc.
    a macro.  This was most noticeable with `traceon(`traceon')', but
    would also happen in cases such as `foo(traceon(`foo'))'.
 
-** Fix regression introduced in 1.4.10b (but not present in 1.4.11) where
-   using `builtin' or `indir' to perform nested `shift' calls triggered an
-   assertion failure.
-
-** Fix regression introduced in 1.4.10b (but not present in 1.4.11) where
-   the command-line option -dV, as well as the builtin `debugmode(V)',
-   failed to enable `t' and `c' debug options.
+** Fix regressions introduced in 1.4.10b but not present in 1.4.11:
+*** Using `builtin' or `indir' to perform nested `shift' calls triggered
+    an assertion failure.
+*** The command-line option -dV, as well as the builtin `debugmode(V)',
+    failed to enable `t' and `c' debug options.
+*** Comments that contain unbalanced quotes were not rescanned correctly
+    when passed through [EMAIL PROTECTED]
 
 ** Fix the `m4wrap' builtin to accumulate wrapped text in FIFO order, as
    required by POSIX.  The manual mentions a way to restore the LIFO order
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index abacef9..d8e2625 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -1038,6 +1038,27 @@ The comment delimiters can be changed to any string at any time, using
 the builtin macro @code{changecom}.  @xref{Changecom}, for more
 information.
 
[EMAIL PROTECTED]
[EMAIL PROTECTED] Detect regression in 1.4.10b in regards to reparsing comments.
[EMAIL PROTECTED] Not worth including in the manual.
[EMAIL PROTECTED]
+define(`e', `$@@')define(`q', ``$@@'')define(`foo', `bar')
[EMAIL PROTECTED]
+q(e(`one
+',#two ' foo
+))
[EMAIL PROTECTED]
[EMAIL PROTECTED]',`#two  bar
[EMAIL PROTECTED]''
+changecom(`<', `>')define(`n', `$#')
[EMAIL PROTECTED]
+n(e(<`>, <'>))
[EMAIL PROTECTED]
+len(e(<`>, ,<'>))
[EMAIL PROTECTED]
[EMAIL PROTECTED] example
[EMAIL PROTECTED] ignore
+
 @node Other tokens
 @section Other kinds of input tokens
 
diff --git a/src/input.c b/src/input.c
index 0d08215..4f969b7 100644
--- a/src/input.c
+++ b/src/input.c
@@ -1590,17 +1590,18 @@ quote_cache (struct obstack *obs, unsigned int age, const string_pair *quotes)
 /*--------------------------------------------------------------------.
 | Parse a single token from the input stream, set TD to its          |
 | contents, and return its type.  A token is TOKEN_EOF if the        |
-| input_stack is empty; TOKEN_STRING for a quoted string or comment;  |
-| TOKEN_WORD for something that is a potential macro name; and       |
-| TOKEN_SIMPLE for any single character that is not a part of any of  |
-| the previous types.  If LINE is not NULL, set *LINE to the line     |
-| where the token starts.  If OBS is not NULL, expand TOKEN_STRING    |
-| directly into OBS rather than in token_stack temporary storage      |
-| area, and TD could be a TOKEN_COMP instead of the usual            |
-| TOKEN_TEXT.  If ALLOW_ARGV, OBS must be non-NULL, and an entire     |
-| series of arguments can be returned as TOKEN_ARGV when a $@        |
-| reference is encountered.  Report errors (unterminated comments or  |
-| strings) on behalf of CALLER, if non-NULL.                         |
+| input_stack is empty; TOKEN_STRING for a quoted string;            |
+| TOKEN_COMMENT for a comment; TOKEN_WORD for something that is a     |
+| potential macro name; and TOKEN_SIMPLE for any single character     |
+| that is not a part of any of the previous types.  If LINE is not    |
+| NULL, set *LINE to the line where the token starts.  If OBS is not  |
+| NULL, expand TOKEN_STRING and TOKEN_COMMENT directly into OBS              |
+| rather than in token_stack temporary storage area, and TD could be  |
+| a TOKEN_COMP instead of the usual TOKEN_TEXT.  If ALLOW_ARGV, OBS   |
+| must be non-NULL, and an entire series of arguments can be         |
+| returned as TOKEN_ARGV when a $@ reference is encountered.  Report  |
+| errors (unterminated comments or strings) on behalf of CALLER, if   |
+| non-NULL.                                                          |
 |                                                                    |
 | Next_token () returns the token type, and passes back a pointer to  |
 | the token data through TD.  Non-string token text is collected on   |
@@ -1695,7 +1696,7 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
          assert (ch < CHAR_EOF);
          obstack_1grow (obs_td, ch);
        }
-      type = TOKEN_STRING;
+      type = TOKEN_COMMENT;
     }
   else if (default_word_regexp && (isalpha (ch) || ch == '_'))
     {
@@ -1837,7 +1838,8 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
     }
   else
     {
-      assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP && type == TOKEN_STRING);
+      assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP
+             && (type == TOKEN_STRING || type == TOKEN_COMMENT));
 #ifdef DEBUG_INPUT
       {
        token_chain *chain;
@@ -1895,7 +1897,7 @@ peek_token (void)
     }
   else if (MATCH (ch, curr_comm.str1, curr_comm.len1, false))
     {
-      result = TOKEN_STRING;
+      result = TOKEN_COMMENT;
     }
   else if ((default_word_regexp && (isalpha (ch) || ch == '_'))
 #ifdef ENABLE_CHANGEWORD
@@ -1943,6 +1945,8 @@ token_type_string (token_type t)
       return "EOF";
     case TOKEN_STRING:
       return "STRING";
+    case TOKEN_COMMENT:
+      return "COMMENT";
     case TOKEN_WORD:
       return "WORD";
     case TOKEN_OPEN:
@@ -1981,6 +1985,10 @@ print_token (const char *s, token_type t, token_data *td)
       xfprintf (stderr, "string:");
       break;
 
+    case TOKEN_COMMENT:
+      xfprintf (stderr, "comment:");
+      break;
+
     case TOKEN_MACDEF:
       xfprintf (stderr, "macro: %p\n", TOKEN_DATA_FUNC (td));
       break;
diff --git a/src/m4.h b/src/m4.h
index ff0377a..40aa5ec 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -218,7 +218,8 @@ typedef struct token_chain token_chain;
 enum token_type
 {
   TOKEN_EOF = 4,/* End of file, TOKEN_VOID.  */
-  TOKEN_STRING,        /* Quoted string or comment, TOKEN_TEXT or TOKEN_COMP.  */
+  TOKEN_STRING,        /* Quoted string, TOKEN_TEXT or TOKEN_COMP.  */
+  TOKEN_COMMENT,/* Comment, TOKEN_TEXT or TOKEN_COMP.  */
   TOKEN_WORD,  /* An identifier, TOKEN_TEXT.  */
   TOKEN_OPEN,  /* Active character `(', TOKEN_TEXT.  */
   TOKEN_COMMA, /* Active character `,', TOKEN_TEXT.  */
diff --git a/src/macro.c b/src/macro.c
index 0b57436..9d8ffbb 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -260,8 +260,7 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
              bool first)
 {
   symbol *sym;
-  bool result;
-  int ch;
+  bool result = false;
 
   switch (t)
     {                          /* TOKSW */
@@ -271,13 +270,19 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
       return true;
 
     case TOKEN_STRING:
-      /* Tokens and comments are safe in isolation (since quote_age()
-        detects any change in delimiters).  But if other text is
-        already present, multi-character delimiters could be an
-        issue, so use a conservative heuristic.  If obstack is
-        provided, the string was already expanded into it during
-        next_token.  */
+      /* Strings are safe in isolation (since quote_age() detects any
+        change in delimiters), or when safe_quotes is true.  When
+        safe_quotes is false, we could technically return true if we
+        can prove that the concatenation of this string to prior text
+        does not form a multi-byte quote delimiter, but that is a lot
+        of overhead, so we give the conservative answer of false.  */
       result = first || safe_quotes ();
+      /* fallthru */
+    case TOKEN_COMMENT:
+      /* Comments can contain unbalanced quote delimiters.  Rather
+        than search for one, we return the conservative answer of
+        false.  If obstack is provided, the string or comment was
+        already expanded into it during next_token.  */
       if (obs)
        return result;
       break;
@@ -285,18 +290,23 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
     case TOKEN_OPEN:
     case TOKEN_COMMA:
     case TOKEN_CLOSE:
-      /* Conservative heuristic; thanks to multi-character delimiter
-        concatenation.  */
+      /* If safe_quotes is true, then these do not form a quote
+        delimiter.  If it is false, we give the conservative answer
+        of false rather than taking time to prove that no multi-byte
+        quote delimiter is formed.  */
       result = safe_quotes ();
       break;
 
     case TOKEN_SIMPLE:
-      /* Conservative heuristic; if these characters are whitespace or
-        numeric, then behavior of safe_quotes is applicable.
-        Otherwise, assume these characters have a high likelihood of
-        use in quote delimiters.  */
-      ch = to_uchar (*TOKEN_DATA_TEXT (td));
-      result = (isspace (ch) || isdigit (ch)) && safe_quotes ();
+      /* If safe_quotes is true, then all but the single-byte end
+        quote delimiter is safe in a quoted context; a single-byte
+        start delimiter will trigger a TOKEN_STRING instead.  If
+        safe_quotes is false, we give the conservative answer of
+        false rather than taking time to prove that no multi-byte
+        quote delimiter is formed.  */
+      result = *TOKEN_DATA_TEXT (td) != *curr_quote.str2 && safe_quotes ();
+      if (result)
+       assert (*TOKEN_DATA_TEXT (td) != *curr_quote.str1);
       break;
 
     case TOKEN_WORD:
@@ -313,8 +323,10 @@ expand_token (struct obstack *obs, token_type t, 
token_data *td, int line,
 #else
          divert_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
 #endif /* !ENABLE_CHANGEWORD */
-         /* The word just appended is unquoted, but the heuristics of
-            safe_quote are applicable.  */
+         /* If safe_quotes is true, then words do not overlap with
+            quote delimiters.  If it is false, we give the
+            conservative answer of false rather than prove that no
+            multi-byte delimiters are formed.  */
          return safe_quotes();
        }
       expand_macro (sym);
@@ -420,6 +432,7 @@ expand_argument (struct obstack *obs, token_data *argp,
 
        case TOKEN_WORD:
        case TOKEN_STRING:
+       case TOKEN_COMMENT:
        case TOKEN_MACDEF:
          if (!expand_token (obs, t, &td, line, first))
            age = 0;
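
For readers skimming the src/macro.c hunks, the per-token decision that
expand_token makes after this commit can be condensed into one small
function. This is only a sketch under stated assumptions, not code from
the repository: the enum, the function name sketch_is_safe, and its
parameters are all hypothetical, the boolean safe_quotes stands in for the
result of the real safe_quotes () call, and the macro-expansion path for
TOKEN_WORD is left out because the diff does not show it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified token kinds; the real enum is token_type
   in src/m4.h.  SK_PUNCT stands for TOKEN_OPEN, TOKEN_COMMA and
   TOKEN_CLOSE together.  */
enum sketch_token { SK_STRING, SK_COMMENT, SK_PUNCT, SK_SIMPLE, SK_WORD };

/* Return true if the collected text of token T may be reused later
   without rescanning problems.  FIRST means no other text precedes
   the token in the argument; SAFE_QUOTES is an assumed stand-in for
   the result of safe_quotes ().  CH is the first byte of the token
   text (only examined for SK_SIMPLE); START_QUOTE and END_QUOTE are
   the first bytes of the current quote delimiters.  */
static bool
sketch_is_safe (enum sketch_token t, bool first, bool safe_quotes,
                char ch, char start_quote, char end_quote)
{
  bool result = false;

  switch (t)
    {
    case SK_STRING:
      /* Safe in isolation, or when the quote delimiters are safe.  */
      result = first || safe_quotes;
      break;
    case SK_COMMENT:
      /* Comments may hold unbalanced quotes: always conservative.  */
      result = false;
      break;
    case SK_PUNCT:
    case SK_WORD:
      /* Word text that is not expanded as a macro; see lead-in.  */
      result = safe_quotes;
      break;
    case SK_SIMPLE:
      /* Everything but the single-byte end quote is safe.  */
      result = ch != end_quote && safe_quotes;
      if (result)
        assert (ch != start_quote);
      break;
    }
  return result;
}
```

With the default quote delimiters, sketch_is_safe (SK_COMMENT, true, true,
'#', '`', '\'') is false even though the comment comes first and the
delimiters are safe, which is the behavior change this commit introduces;
an unquoted `-' as SK_SIMPLE is now true whenever safe_quotes is true.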


hooks/post-receive
--
GNU M4 source repository
