David Caplan wrote:
> Hi Gary,

Hi David!

> Thanks for the quick response!

No probs.

> According to the documentation "All characters between the comment
> delimeters are ignored", but you are saying that m4 tokenizes on quotes
> as the very first pass.

Yeah, kind of.  A quoted comment (a la your define argument) is slightly
different to an unquoted comment.  When M4 sees an open quote, it reads
to the matching close quote, strips the outer quotes and then goes on to
process the quoted content.

> This seems contrary to the documentation as
> well as how every other programming language is parsed.  I think the
> reasonable expectation when one inserts a comment is that the comment
> text will not be parsed or processed in any way.

But similarly, M4 is supposed to read text between quotes without looking
at the content and behaving differently depending on what it sees.

> Is the difference because m4 is not really a traditional programming
> language, but a pre-processing language?  I still think it is
> unreasonable (i.e., a bug) to allow processing to be done within a
> comment by default.

I think I see where we differ here, and it is unfortunate that quoting in
M4 is so difficult to get right.  Rest assured that once you have
mastered the subtleties of quoting and rescanning, M4's behaviour becomes
much more predictable.

I think you meant to write this:

    define(`foo', # there aren't any arguments to foo
    ``this is output of the foo macro'')

Notice that the comment is not quoted now, so references to macros (foo)
and unbalanced quotes (') are left untouched, because the reader
tokenises the text between # and \n as a single comment token.

> I think the reasonable solution is to use the
> changequote, or changecom, when one _desires_ parsing of something
> normally thought of as a comment (see example in documentation for
> changecom).

The opposite is true of M4, so unfortunately, I think you will be
surprised by the expansion of foo given the definition above:

    foo
    => # there aren't any arguments to foo
       this is output of the foo macro

But this is correct according to POSIX SUSv3
(http://www.opengroup.org/onlinepubs/009695399/utilities/m4.html):

    Comments are written but not scanned for matching macro names; by
    default, the begin-comment string consists of the number sign
    character and the end-comment string consists of a <newline>.

If you want to write text in the arguments to macros, but have it
removed, then you must remove it yourself, since comments in m4 are also
a little different to what you might expect to see in an imperative
language.  Fortunately, because arguments are rescanned for expansions,
it is easy to do this (it is referenced in the GNU M4 docs IIRC):

    define(`foo', ifelse( # there aren't any arguments to foo
    )``this is output of the foo macro'')
    =>
    foo
    => this is output of the foo macro

So, I've used ifelse to discard the text during scanning, but carefully
retained the # comment start character to prevent the unmatched ' or the
reference to foo from being expanded during rescanning (of the argument
to ifelse).  The original "output" string is still double quoted to
prevent expansion of foo.

Note that if I had started the quotes before ifelse, then the ' in aren't
would have been matched as the end of a quoted string, because the #
would have been quoted.  So this is WRONG:

    define(`foo', `ifelse( # there aren't any arguments to foo
    )`this is output of the foo macro'')

Note also that double quoting the entire argument would prevent the
ifelse from being expanded (and so from discarding its argument) during
rescanning.  So this is WRONG too:

    define(`foo', ``ifelse( # there aren't any arguments to foo
    )this is output of the foo macro'')

It is good style to single quote all arguments to macros, except where
macros should be expanded (or comments noticed!)
during tokenising, in which case the quotes must be left off; or where an
argument must be left untouched, in which case double quoting must be
used.  So stylistically, this is better (although a little harder to
understand):

    define(`foo', `ifelse(' # there aren't any arguments to foo
    `)`this is output of the foo macro'')
    =>
    foo
    =>this is output of the foo macro

If you are still getting to grips with M4, then there are more surprises
ahead when positional parameters ($1 etc.) come into play, but feel free
to ask on the list if they are not behaving as you expect.  Also, you
need to be careful to quote commas correctly, otherwise you might find
the reader starts the next argument prematurely.  And remember that when
the text of arguments to macros is rescanned for expansions, an
unexpected comma could be inserted...

> In your example you changed the quote characters to brackets.  I think
> that it becomes overwhelming to have to constantly change the quote
> characters or comment characters because of punctuation one wants to use
> in a comment.

Indeed.  And, especially because the choice is so critical to the
behaviour of the tokeniser and parser, it is important to choose comment
and quote characters that will not interfere with the body of the files
that are being processed.  In practice, the standard `' quotes occur
alone in English text so often that it is unusual NOT to change them.
The normal practice is to change them once, right at the start of the
file, choosing the replacements wisely to avoid having to change them
again later in the file just to avoid the kinds of problems you are
encountering.  Autoconf uses [] because those characters almost never
occur unpaired, so they can be double quoted to pass them through to the
output.

    changequote([,])
    define([foo], [[open: [, close: ]]])
    =>
    foo
    => open: [, close: ]

> I'm working with SELinux policy, which uses m4 macros as a convention
> for generating parts of the security policy.
> I've found that people
> occasionally put quotes in their comments and this hoses up the
> policies.  Perhaps this "convention" was an inappropriate use of m4?

Unfortunately so.  But it is a very common misunderstanding.

> At any rate, as the official voice of m4, you are saying that this is
> not a bug and is the appropriate, reasonable, and expected behavior for
> m4, correct?

Absolutely.  I'm only the official voice of GNU M4 though, the POSIX
committee holds the reins of the standard.

> [I don't intend for this to come across as overly argumentative.  I just
> want to make my case to you/whoever is in charge of m4.]

Not at all.  Hopefully, my long explanation will save others from
tripping over the same gotchas.

Cheers,
	Gary.
-- 
Gary V. Vaughan      ())_.  [EMAIL PROTECTED],gnu.org}
Research Scientist   ( '/   http://tkd.kicks-ass.net
GNU Hacker           / )=   http://www.gnu.org/software/libtool
Technical Author   `(_~)_   http://sources.redhat.com/autobook
_______________________________________________
Bug-m4 mailing list
Bug-m4@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-m4