This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU M4 source repository".
http://git.sv.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=0d6fb01e76bc35550a00cbf7710d1471db9e7b00

The branch, branch-1.6 has been updated
       via  0d6fb01e76bc35550a00cbf7710d1471db9e7b00 (commit)
      from  c9d53ab9bcef0cb04d59f5797e6f20159150b75d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 0d6fb01e76bc35550a00cbf7710d1471db9e7b00
Author: Eric Blake <[EMAIL PROTECTED]>
Date:   Sun Aug 3 14:23:19 2008 -0600

    Fix regression in commenting unbalanced quotes, from 2008-02-16.

    * src/m4.h (enum token_type): Add TOKEN_COMMENT.
    * src/input.c (next_token, peek_token, token_type_string)
    (print_token): Supply new token type for comments.
    * src/macro.c (expand_token): Remove penalty for unquoted `-'
    bytes. Penalize comments, as they can contain unbalanced quotes;
    latent bug since 2007-12-07, exposed by passing $@ references
    built from comments.
    (expand_argument): Adjust caller.
    * doc/m4.texinfo (Comments): Test the fix.
    * NEWS: Mention the fix.

    Signed-off-by: Eric Blake <[EMAIL PROTECTED]>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog      |   14 ++++++++++++++
 NEWS           |   14 +++++++-------
 doc/m4.texinfo |   21 +++++++++++++++++++++
 src/input.c    |   36 ++++++++++++++++++++++--------------
 src/m4.h       |    3 ++-
 src/macro.c    |   49 +++++++++++++++++++++++++++++++------------------
 6 files changed, 97 insertions(+), 40 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d4f182e..325bf7a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2008-08-03 Eric Blake <[EMAIL PROTECTED]>
+
+    Fix regression in commenting unbalanced quotes, from 2008-02-16.
+    * src/m4.h (enum token_type): Add TOKEN_COMMENT.
+    * src/input.c (next_token, peek_token, token_type_string)
+    (print_token): Supply new token type for comments.
+    * src/macro.c (expand_token): Remove penalty for unquoted `-'
+    bytes. Penalize comments, as they can contain unbalanced quotes;
+    latent bug since 2007-12-07, exposed by passing $@ references
+    built from comments.
+    (expand_argument): Adjust caller.
+    * doc/m4.texinfo (Comments): Test the fix.
+    * NEWS: Mention the fix.
+
 2008-07-30 Eric Blake <[EMAIL PROTECTED]>
 
     Fix regression in trace output, introduced 2008-05-09.
diff --git a/NEWS b/NEWS
index bea5f07..fe6f9e8 100644
--- a/NEWS
+++ b/NEWS
@@ -9,13 +9,13 @@ Foundation, Inc.
    a macro. This was most noticeable with `traceon(`traceon')', but would
    also happen in cases such as `foo(traceon(`foo'))'.
 
-** Fix regression introduced in 1.4.10b (but not present in 1.4.11) where
-   using `builtin' or `indir' to perform nested `shift' calls triggered an
-   assertion failure.
-
-** Fix regression introduced in 1.4.10b (but not present in 1.4.11) where
-   the command-line option -dV, as well as the builtin `debugmode(V)',
-   failed to enable `t' and `c' debug options.
+** Fix regressions introduced in 1.4.10b but not present in 1.4.11:
+*** Using `builtin' or `indir' to perform nested `shift' calls triggered
+    an assertion failure.
+*** The command-line option -dV, as well as the builtin `debugmode(V)',
+    failed to enable `t' and `c' debug options.
+*** Comments that contain unbalanced quotes were not rescanned correctly
+    when passed through [EMAIL PROTECTED]
 
 ** Fix the `m4wrap' builtin to accumulate wrapped text in FIFO order, as
    required by POSIX. The manual mentions a way to restore the LIFO order
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index abacef9..d8e2625 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -1038,6 +1038,27 @@ The comment delimiters can be changed to any string at any time,
 using the builtin macro @code{changecom}. @xref{Changecom}, for more
 information.
 
[EMAIL PROTECTED]
[EMAIL PROTECTED] Detect regression in 1.4.10b in regards to reparsing comments.
[EMAIL PROTECTED] Not worth including in the manual.
[EMAIL PROTECTED]
+define(`e', `$@@')define(`q', ``$@@'')define(`foo', `bar')
[EMAIL PROTECTED]
+q(e(`one
+',#two ' foo
+))
[EMAIL PROTECTED]
[EMAIL PROTECTED]',`#two bar
[EMAIL PROTECTED]''
+changecom(`<', `>')define(`n', `$#')
[EMAIL PROTECTED]
+n(e(<`>, <'>))
[EMAIL PROTECTED]
+len(e(<`>, ,<'>))
[EMAIL PROTECTED]
[EMAIL PROTECTED] example
[EMAIL PROTECTED] ignore
+
 @node Other tokens
 @section Other kinds of input tokens
 
diff --git a/src/input.c b/src/input.c
index 0d08215..4f969b7 100644
--- a/src/input.c
+++ b/src/input.c
@@ -1590,17 +1590,18 @@ quote_cache (struct obstack *obs, unsigned int age, const string_pair *quotes)
 /*--------------------------------------------------------------------.
 | Parse a single token from the input stream, set TD to its |
 | contents, and return its type. A token is TOKEN_EOF if the |
-| input_stack is empty; TOKEN_STRING for a quoted string or comment; |
-| TOKEN_WORD for something that is a potential macro name; and |
-| TOKEN_SIMPLE for any single character that is not a part of any of |
-| the previous types. If LINE is not NULL, set *LINE to the line |
-| where the token starts. If OBS is not NULL, expand TOKEN_STRING |
-| directly into OBS rather than in token_stack temporary storage |
-| area, and TD could be a TOKEN_COMP instead of the usual |
-| TOKEN_TEXT. If ALLOW_ARGV, OBS must be non-NULL, and an entire |
-| series of arguments can be returned as TOKEN_ARGV when a $@ |
-| reference is encountered. Report errors (unterminated comments or |
-| strings) on behalf of CALLER, if non-NULL. |
+| input_stack is empty; TOKEN_STRING for a quoted string; |
+| TOKEN_COMMENT for a comment; TOKEN_WORD for something that is a |
+| potential macro name; and TOKEN_SIMPLE for any single character |
+| that is not a part of any of the previous types. If LINE is not |
+| NULL, set *LINE to the line where the token starts. If OBS is not |
+| NULL, expand TOKEN_STRING and TOKEN_COMMENT directly into OBS |
+| rather than in token_stack temporary storage area, and TD could be |
+| a TOKEN_COMP instead of the usual TOKEN_TEXT. If ALLOW_ARGV, OBS |
+| must be non-NULL, and an entire series of arguments can be |
+| returned as TOKEN_ARGV when a $@ reference is encountered. Report |
+| errors (unterminated comments or strings) on behalf of CALLER, if |
+| non-NULL. |
 | |
 | Next_token () returns the token type, and passes back a pointer to |
 | the token data through TD. Non-string token text is collected on |
@@ -1695,7 +1696,7 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
               assert (ch < CHAR_EOF);
               obstack_1grow (obs_td, ch);
             }
-          type = TOKEN_STRING;
+          type = TOKEN_COMMENT;
         }
       else if (default_word_regexp && (isalpha (ch) || ch == '_'))
         {
@@ -1837,7 +1838,8 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
         }
       else
         {
-          assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP && type == TOKEN_STRING);
+          assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP
+                  && (type == TOKEN_STRING || type == TOKEN_COMMENT));
 #ifdef DEBUG_INPUT
           {
             token_chain *chain;
@@ -1895,7 +1897,7 @@ peek_token (void)
     }
   else if (MATCH (ch, curr_comm.str1, curr_comm.len1, false))
     {
-      result = TOKEN_STRING;
+      result = TOKEN_COMMENT;
     }
   else if ((default_word_regexp && (isalpha (ch) || ch == '_'))
 #ifdef ENABLE_CHANGEWORD
@@ -1943,6 +1945,8 @@ token_type_string (token_type t)
       return "EOF";
     case TOKEN_STRING:
       return "STRING";
+    case TOKEN_COMMENT:
+      return "COMMENT";
     case TOKEN_WORD:
       return "WORD";
     case TOKEN_OPEN:
@@ -1981,6 +1985,10 @@ print_token (const char *s, token_type t, token_data *td)
       xfprintf (stderr, "string:");
       break;
 
+    case TOKEN_COMMENT:
+      xfprintf (stderr, "comment:");
+      break;
+
     case TOKEN_MACDEF:
       xfprintf (stderr, "macro: %p\n", TOKEN_DATA_FUNC (td));
       break;
diff --git a/src/m4.h b/src/m4.h
index ff0377a..40aa5ec 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -218,7 +218,8 @@ typedef struct token_chain token_chain;
 enum token_type
 {
   TOKEN_EOF = 4,/* End of file, TOKEN_VOID. */
-  TOKEN_STRING, /* Quoted string or comment, TOKEN_TEXT or TOKEN_COMP. */
+  TOKEN_STRING, /* Quoted string, TOKEN_TEXT or TOKEN_COMP. */
+  TOKEN_COMMENT,/* Comment, TOKEN_TEXT or TOKEN_COMP. */
   TOKEN_WORD, /* An identifier, TOKEN_TEXT. */
   TOKEN_OPEN, /* Active character `(', TOKEN_TEXT. */
   TOKEN_COMMA, /* Active character `,', TOKEN_TEXT. */
diff --git a/src/macro.c b/src/macro.c
index 0b57436..9d8ffbb 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -260,8 +260,7 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
               bool first)
 {
   symbol *sym;
-  bool result;
-  int ch;
+  bool result = false;
 
   switch (t)
     { /* TOKSW */
@@ -271,13 +270,19 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
       return true;
 
     case TOKEN_STRING:
-      /* Tokens and comments are safe in isolation (since quote_age()
-         detects any change in delimiters). But if other text is
-         already present, multi-character delimiters could be an
-         issue, so use a conservative heuristic. If obstack is
-         provided, the string was already expanded into it during
-         next_token. */
+      /* Strings are safe in isolation (since quote_age() detects any
+         change in delimiters), or when safe_quotes is true. When
+         safe_quotes is false, we could technically return true if we
+         can prove that the concatenation of this string to prior text
+         does not form a multi-byte quote delimiter, but that is a lot
+         of overhead, so we give the conservative answer of false. */
       result = first || safe_quotes ();
+      /* fallthru */
+    case TOKEN_COMMENT:
+      /* Comments can contain unbalanced quote delimiters. Rather
+         than search for one, we return the conservative answer of
+         false. If obstack is provided, the string or comment was
+         already expanded into it during next_token. */
       if (obs)
         return result;
       break;
@@ -285,18 +290,23 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
     case TOKEN_OPEN:
     case TOKEN_COMMA:
     case TOKEN_CLOSE:
-      /* Conservative heuristic; thanks to multi-character delimiter
-         concatenation. */
+      /* If safe_quotes is true, then these do not form a quote
+         delimiter. If it is false, we give the conservative answer
+         of false rather than taking time to prove that no multi-byte
+         quote delimiter is formed. */
       result = safe_quotes ();
       break;
 
     case TOKEN_SIMPLE:
-      /* Conservative heuristic; if these characters are whitespace or
-         numeric, then behavior of safe_quotes is applicable.
-         Otherwise, assume these characters have a high likelihood of
-         use in quote delimiters. */
-      ch = to_uchar (*TOKEN_DATA_TEXT (td));
-      result = (isspace (ch) || isdigit (ch)) && safe_quotes ();
+      /* If safe_quotes is true, then all but the single-byte end
+         quote delimiter is safe in a quoted context; a single-byte
+         start delimiter will trigger a TOKEN_STRING instead. If
+         safe_quotes is false, we give the conservative answer of
+         false rather than taking time to prove that no multi-byte
+         quote delimiter is formed. */
+      result = *TOKEN_DATA_TEXT (td) != *curr_quote.str2 && safe_quotes ();
+      if (result)
+        assert (*TOKEN_DATA_TEXT (td) != *curr_quote.str1);
       break;
 
     case TOKEN_WORD:
@@ -313,8 +323,10 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line,
 #else
             divert_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
 #endif /* !ENABLE_CHANGEWORD */
-            /* The word just appended is unquoted, but the heuristics of
-               safe_quote are applicable. */
+            /* If safe_quotes is true, then words do not overlap with
+               quote delimiters. If it is false, we give the
+               conservative answer of false rather than prove that no
+               multi-byte delimiters are formed. */
             return safe_quotes();
           }
         expand_macro (sym);
@@ -420,6 +432,7 @@ expand_argument (struct obstack *obs, token_data *argp,
 
         case TOKEN_WORD:
         case TOKEN_STRING:
+        case TOKEN_COMMENT:
         case TOKEN_MACDEF:
           if (!expand_token (obs, t, &td, line, first))
             age = 0;


hooks/post-receive
--
GNU M4 source repository
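
For readers who want the gist of the new expand_token rules without reading the whole hunk, here is a rough standalone sketch of the per-token answer described in the src/macro.c comments. It is not the m4 code: the names toy_token, toy_quotes and toy_is_safe are made up for illustration, the safe flag stands in for whatever safe_quotes () computes, and the obstack handling, TOKEN_OPEN/COMMA/CLOSE and macro expansion are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified token kinds; the real enum is in src/m4.h.  */
enum toy_token { TOY_STRING, TOY_COMMENT, TOY_SIMPLE, TOY_WORD };

/* Hypothetical stand-in for the current quote state.  SAFE plays the
   role of the value returned by safe_quotes () in the real code.  */
struct toy_quotes
{
  char start;   /* single-byte start-quote delimiter */
  char end;     /* single-byte end-quote delimiter */
  bool safe;    /* assumed result of safe_quotes () */
};

/* Return true if TEXT of kind T can be trusted not to disturb quoting
   when rescanned, false for the conservative answer.  FIRST is true
   when no other text precedes the token in the argument.  */
static bool
toy_is_safe (enum toy_token t, const char *text, bool first,
             const struct toy_quotes *q)
{
  bool result = false;

  switch (t)
    {
    case TOY_STRING:
      /* Safe in isolation, or whenever the delimiters are safe.  */
      result = first || q->safe;
      break;

    case TOY_COMMENT:
      /* May hold unbalanced quotes; never claim it is safe.  */
      result = false;
      break;

    case TOY_SIMPLE:
      /* Anything but the end quote is fine when delimiters are safe.  */
      result = text[0] != q->end && q->safe;
      if (result)
        assert (text[0] != q->start);
      break;

    case TOY_WORD:
      /* Words cannot overlap safe delimiters.  */
      result = q->safe;
      break;
    }
  return result;
}
```

With the default delimiters and safe set to true, toy_is_safe (TOY_SIMPLE, "-", false, &q) is true, matching the removal of the penalty for unquoted `-' bytes, while toy_is_safe (TOY_COMMENT, "#two ' foo", true, &q) is false regardless of the other arguments.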
