Hi, Gary,

Gary V. Vaughan <gary <at> gnu.org> writes:
>
> On 12 Oct 2006, at 16:13, Eric Blake wrote:
> > I still want CVS head to follow Solaris' parsing precedence
> > rules (macros, then quotes, then comments), rather than the current
> > behavior (comments, macros, quotes).
>
> Can you remind me why that is?  The first thing that happens in any
> parser I'm familiar with is to discard the comments, why is it a good
> thing for M4 to behave differently?  (I think I know an answer, but
> I'm curious to understand your reasoning here)

Most languages have the (rather nice) property that you cannot confuse
comments with other tokens.  M4, on the other hand, thanks to
changequote and changecom, can be placed into a position where it is
ambiguous whether the parser should recognize the current character as
the start of a macro or the start of a comment.  (Fortunately for
changesyntax, we document that syntax designations are mutually
exclusive - you cannot use changesyntax to simultaneously make a
character both a letter and a comment start.)  The dilemma is not
whether the text of a comment is passed over without expanding macros
inside it, so much as recognizing what constitutes a comment in the
first place.

I guess an analog to this dilemma is the C89 vs. C99 parse question:

int i = 1 //*
//*/
-1; /* Is i 0 or -1? */

In C89, there are no // comments, so it parses as
'int i = 1 / <comment> -1;', giving -1.  In C99, the parser sees
'int i = 1 <comment> <comment> -1;', giving an answer of 0.  Because
C99 changed the comment syntax to allow an additional form, it is
possible to encounter (admittedly unusual) test cases that can expose
the difference.

Now, for a concrete example in m4.

$ /usr/xpg4/bin/m4
define(a,A)define(a1a2a,b)changecom(1,2)a1a2a
b
a 1 a 2 a
A 1 a 2 A
$

Here, both Solaris and GNU agree - once you start parsing a macro name,
you greedily consume as many additional characters as fit in a name,
even if you could otherwise recognize a comment or quote were you to
not be greedy.

$ /usr/xpg4/bin/m4
define(a,A)define(b,B)changequote(`a',c)

a b c
A B c
$

Again, both implementations agree - the a is recognized as a macro name
and expanded to A, and not recognized as a quote start, so b gets
expanded and all three letters are printed.

$ /usr/xpg4/bin/m4
define(a,A)define(b,B)changecom(`a',])

a b ]
A B ]
$ m4
define(a,A)define(b,B)changecom(`a',])

a b ]
a b ]
$

Hmm, now we have a difference.  Solaris said that 'a' matches a macro
name, so expand it to A, at which point there is no comment recognized
and b gets expanded.  GNU 1.4.x said that 'a' matches the comment start
string, so look for ], and everything in between, including 'b', is
output untouched.

$ /usr/xpg4/bin/m4
changecom(`[[[',`]]]')changequote(`[[',`]]')define(a,A)

[[a]]
a
[[[a]]]
[a]
changequote changecom changecom(`[[',`]]')changequote(`[[[',`]]]')

[[a]]
[[a]]
[[[a]]]
a
$ m4
changecom(`[[[',`]]]')changequote(`[[',`]]')define(a,A)

[[a]]
a
[[[a]]]
[[[a]]]
changequote changecom changecom(`[[',`]]')changequote(`[[[',`]]]')

[[a]]
[[a]]
[[[a]]]
[[[a]]]
$

Hmm, in Solaris, when the prefix was ambiguous between quote and
comment, it always chose quote when given a chance, even when quote was
the shorter prefix.  In GNU, on the other hand, the comment was always
recognized first.  If either implementation were a strictly greedy
parser, then you would expect the longer start token to be recognized
in preference to the shorter one.

POSIX does not explicitly document precedence in m4 between the three
types of tokens.  However, it does describe them in the order macros,
then quotes, then comments, which is the same precedence that Solaris
uses.  The only times it should matter are when the comment and quote
delimiters share a common prefix, or when a comment or quote delimiter
starts with a letter or underscore.

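To make the two orderings concrete, here is a rough Python sketch.  It
is not taken from either implementation; the names classify, SOLARIS
and GNU_1_4 are made up for illustration.  It only decides which token
type wins at a given input position, and it assumes that any letter or
underscore starts a word, without checking whether that word is a
defined macro or scanning for the matching end delimiter.

```python
import string

# Characters that can start a name (assumption: ASCII letters and underscore).
NAME_START = set(string.ascii_letters + "_")

# Precedence orders as described in this mail (hypothetical constants).
SOLARIS = ("macro", "quote", "comment")
GNU_1_4 = ("comment", "macro", "quote")


def classify(text, pos, quote_start, comment_start, order):
    """Return the token type recognized at text[pos] under a given order.

    The first entry of `order` whose start condition matches wins;
    "other" is returned when nothing matches.
    """
    checks = {
        "macro": lambda: pos < len(text) and text[pos] in NAME_START,
        "quote": lambda: text.startswith(quote_start, pos),
        "comment": lambda: text.startswith(comment_start, pos),
    }
    for kind in order:
        if checks[kind]():
            return kind
    return "other"


if __name__ == "__main__":
    # changecom(`a',]) case: the comment start is also a name character.
    print(classify("a b ]", 0, "`", "a", SOLARIS))
    print(classify("a b ]", 0, "`", "a", GNU_1_4))
    # Quote start [[[ and comment start [[ share a prefix.
    print(classify("[[[a]]]", 0, "[[[", "[[", SOLARIS))
    print(classify("[[[a]]]", 0, "[[[", "[[", GNU_1_4))
```

With these definitions, classify("a b ]", 0, "`", "a", SOLARIS) picks
"macro" while the GNU_1_4 order picks "comment", which is the
difference seen in the changecom(`a',]) session.
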
If anything, the reason I am proposing delaying the recognition of
comments until after macro names and quote starts have been recognized
is to match historical behavior, and so that GNU M4 parsing at least
follows the order that the three token types are mentioned in POSIX.

-- 
Eric Blake
