This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU M4 source repository".
http://git.sv.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=3be7d5f421c547a2118a4fc703c37ff648b55c55

The branch, branch-1_4 has been updated
       via  3be7d5f421c547a2118a4fc703c37ff648b55c55 (commit)
      from  000a6507a4ab5b46a609efc6fc5152e34c86e095 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 3be7d5f421c547a2118a4fc703c37ff648b55c55
Author: Eric Blake <[EMAIL PROTECTED]>
Date:   Tue Oct 2 14:01:51 2007 -0600

    Document quoting pitfalls in capitalize.

    * doc/m4.texinfo (Patsubst): Use the examples directory. Also
    document shortfall.
    (Improved capitalize): New node.
    * examples/capitalize.m4: Update to match manual.
    * examples/capitalize2.m4: New file.

    Signed-off-by: Eric Blake <[EMAIL PROTECTED]>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog               |    9 +++
 doc/m4.texinfo          |  166 +++++++++++++++++++++++++++++++++++++++++++++--
 examples/capitalize.m4  |   16 +++--
 examples/capitalize2.m4 |   19 ++++++
 4 files changed, 197 insertions(+), 13 deletions(-)
 create mode 100644 examples/capitalize2.m4

diff --git a/ChangeLog b/ChangeLog
index 396a64f..7c9755a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2007-10-02 Eric Blake <[EMAIL PROTECTED]>
+
+	Document quoting pitfalls in capitalize.
+	* doc/m4.texinfo (Patsubst): Use the examples directory. Also
+	document shortfall.
+	(Improved capitalize): New node.
+	* examples/capitalize.m4: Update to match manual.
+	* examples/capitalize2.m4: New file.
+
 2007-10-01 Eric Blake <[EMAIL PROTECTED]>

 	Another Autoconf usage pattern optimization.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 6c76b7b..1bc5be9 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -270,6 +270,7 @@ Correct version of some examples
 * Improved forloop:: Solution for @code{forloop}
 * Improved foreach:: Solution for @code{foreach}
 * Improved cleardivert:: Solution for @code{cleardivert}
+* Improved capitalize:: Solution for @code{capitalize}
 * Improved fatal_error:: Solution for @code{fatal_error}

 How to make copies of the overall M4 package
@@ -4886,18 +4887,45 @@ to lower case, and @code{capitalize} changes the first character of
 each word to upper case and the remaining characters to lower case.
 @end deffn

+First, an example of their usage, using implementations distributed in
[EMAIL PROTECTED]@value{VERSION}/@/examples/@/capitalize.m4}.
+
 @example
-define(`upcase', `translit(`$*', `a-z', `A-Z')')dnl
-define(`downcase', `translit(`$*', `A-Z', `a-z')')dnl
-define(`capitalize1',
- `regexp(`$1', `^\(\w\)\(\w*\)',
- `upcase(`\1')`'downcase(`\2')')')dnl
-define(`capitalize',
- `patsubst(`$1', `\w+', `capitalize1(`\&')')')dnl
+$ @kbd{m4 -I examples}
+include(`capitalize.m4')
[EMAIL PROTECTED]
+upcase(`GNUs not Unix')
[EMAIL PROTECTED] NOT UNIX
+downcase(`GNUs not Unix')
[EMAIL PROTECTED] not unix
 capitalize(`GNUs not Unix')
 @result{}Gnus Not Unix
 @end example

+Now for the implementation. There is a helper macro @code{_capitalize}
+which puts only its first word in mixed case. Then @code{capitalize}
+merely parses out the words, and replaces them with an invocation of
[EMAIL PROTECTED] (As presented here, the @code{capitalize} macro has
+some subtle flaws. You should try to see if you can find and correct
+them; or @pxref{Improved capitalize, , Answers}).
+
[EMAIL PROTECTED]
+$ @kbd{m4 -I examples}
+undivert(`capitalize.m4')dnl
[EMAIL PROTECTED](`-1')
[EMAIL PROTECTED] upcase(text)
[EMAIL PROTECTED] downcase(text)
[EMAIL PROTECTED] capitalize(text)
[EMAIL PROTECTED] change case of text, simple version
[EMAIL PROTECTED](`upcase', `translit(`$*', `a-z', `A-Z')')
[EMAIL PROTECTED](`downcase', `translit(`$*', `A-Z', `a-z')')
[EMAIL PROTECTED](`_capitalize',
[EMAIL PROTECTED] `regexp(`$1', `^\(\w\)\(\w*\)',
[EMAIL PROTECTED] `upcase(`\1')`'downcase(`\2')')')
[EMAIL PROTECTED](`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')
[EMAIL PROTECTED]'dnl
[EMAIL PROTECTED] example
+
 While @code{regexp} replaces the whole input with the replacement as
 soon as there is a match, @code{patsubst} replaces each
 @emph{occurrence} of a match and preserves non-matching pieces:
@@ -6490,6 +6518,7 @@ presented here.
 * Improved forloop:: Solution for @code{forloop}
 * Improved foreach:: Solution for @code{foreach}
 * Improved cleardivert:: Solution for @code{cleardivert}
+* Improved capitalize:: Solution for @code{capitalize}
 * Improved fatal_error:: Solution for @code{fatal_error}
 @end menu

@@ -6792,6 +6821,129 @@ undivert
 @result{}
 @end example

[EMAIL PROTECTED] Improved capitalize
[EMAIL PROTECTED] Solution for @code{capitalize}
+
+The @code{capitalize} macro (@pxref{Patsubst}) as presented earlier does
+not allow clients to follow the quoting rule of thumb.
Consider the
+three macros @code{active}, @code{Active}, and @code{ACTIVE}, and the
+difference between calling @code{capitalize} with the expansion of a
+macro, expanding the result of a case change, and changing the case of a
+double-quoted string:
+
[EMAIL PROTECTED]
+$ @kbd{m4 -I examples}
+include(`capitalize.m4')dnl
+define(`active', `act1, ive')dnl
+define(`Active', `Act2, Ive')dnl
+define(`ACTIVE', `ACT3, IVE')dnl
+upcase(active)
[EMAIL PROTECTED],IVE
+upcase(`active')
[EMAIL PROTECTED], IVE
+upcase(``active'')
[EMAIL PROTECTED]
+downcase(ACTIVE)
[EMAIL PROTECTED],ive
+downcase(`ACTIVE')
[EMAIL PROTECTED], ive
+downcase(``ACTIVE'')
[EMAIL PROTECTED]
+capitalize(active)
[EMAIL PROTECTED]
+capitalize(`active')
[EMAIL PROTECTED]
+capitalize(``active'')
[EMAIL PROTECTED](`active')
+define(`A', `OOPS')
[EMAIL PROTECTED]
+capitalize(active)
[EMAIL PROTECTED]
+capitalize(`active')
[EMAIL PROTECTED]
[EMAIL PROTECTED] example
+
+First, when @code{capitalize} is called with more than one argument, it
+was throwing away later arguments, whereas @code{upcase} and
[EMAIL PROTECTED] used @samp{$*} to collect them all. The fix is simple:
+use @samp{$*} consistently.
+
+Next, with single-quoting, @code{capitalize} outputs a single character,
+a set of quotes, then the rest of the characters, making it impossible
+to invoke @code{Active} after the fact, and allowing the alternate macro
[EMAIL PROTECTED] to interfere. Here, the solution is to use additional quoting
+in the helper macros, then pass the final over-quoted output string
+through @code{_arg1} to remove the extra quoting and finally invoke the
+concatenated portions as a single string.
+
+Finally, when passed a double-quoted string, the nested macro
[EMAIL PROTECTED] is never invoked because it ended up nested inside
+quotes. This one is the toughest to fix.
In short, we have no idea how
+many levels of quotes are in effect on the substring being altered by
[EMAIL PROTECTED] If the replacement string cannot be expressed entirely
+in terms of literal text and backslash substitutions, then we need a
+mechanism to guarantee that the helper macros are invoked outside of
+quotes. In other words, this sounds like a job for @code{changequote}
+(@pxref{Changequote}). By changing the active quoting characters, we
+can guarantee that replacement text injected by @code{patsubst} always
+occurs in the middle of a string that has exactly one level of
+over-quoting using alternate quotes; so the replacement text closes the
+quoted string, invokes the helper macros, then reopens the quoted
+string. In turn, that means the replacement text has unbalanced quotes,
+necessitating another round of @code{changequote}.
+
+In the fixed version below, (also shipped as
[EMAIL PROTECTED]@value{VERSION}/@/examples/@/capitalize.m4}), @code{capitalize}
+uses the alternate quotes of @samp{<<[} and @samp{]>>} (the longer
+strings are chosen so as to be less likely to appear in the text being
+converted). The helpers @code{_to_alt} and @code{_from_alt} merely
+reduce the number of characters required to perform a
[EMAIL PROTECTED], since the definition changes twice. The outermost
+pair means that @code{patsubst} and @code{_capitalize_alt} are invoked
+with alternate quoting; the innermost pair is used so that the third
+argument to @code{patsubst} can contain an unbalanced
[EMAIL PROTECTED]>>}/@samp{<<[} pair. Note that @code{upcase} and @code{downcase}
+must be redefined as @code{_upcase_alt} and @code{_downcase_alt}, since
+they contain nested quotes but are invoked with the alternate quoting
+scheme in effect.
+
[EMAIL PROTECTED]
+$ @kbd{m4 -I examples}
+include(`capitalize2.m4')dnl
+define(`active', `act1, ive')dnl
+define(`Active', `Act2, Ive')dnl
+define(`ACTIVE', `ACT3, IVE')dnl
+define(`A', `OOPS')dnl
+capitalize(active)
[EMAIL PROTECTED],Ive
+capitalize(`active')
[EMAIL PROTECTED], Ive
+capitalize(``active'')
[EMAIL PROTECTED]
+capitalize(```actIVE''')
[EMAIL PROTECTED]'
+undivert(`capitalize2.m4')dnl
[EMAIL PROTECTED](`-1')
[EMAIL PROTECTED] upcase(text)
[EMAIL PROTECTED] downcase(text)
[EMAIL PROTECTED] capitalize(text)
[EMAIL PROTECTED] change case of text, improved version
[EMAIL PROTECTED](`upcase', `translit(`$*', `a-z', `A-Z')')
[EMAIL PROTECTED](`downcase', `translit(`$*', `A-Z', `a-z')')
[EMAIL PROTECTED](`_arg1', `$1')
[EMAIL PROTECTED](`_to_alt', `changequote(`<<[', `]>>')')
[EMAIL PROTECTED](`_from_alt', `changequote(<<[`]>>, <<[']>>)')
[EMAIL PROTECTED](`_upcase_alt', `translit(<<[$*]>>, <<[a-z]>>, <<[A-Z]>>)')
[EMAIL PROTECTED](`_downcase_alt', `translit(<<[$*]>>, <<[A-Z]>>, <<[a-z]>>)')
[EMAIL PROTECTED](`_capitalize_alt',
[EMAIL PROTECTED] `regexp(<<[$1]>>, <<[^\(\w\)\(\w*\)]>>,
[EMAIL PROTECTED] <<[_upcase_alt(<<[<<[\1]>>]>>)_downcase_alt(<<[<<[\2]>>]>>)]>>)')
[EMAIL PROTECTED](`capitalize',
[EMAIL PROTECTED] `_arg1(_to_alt()patsubst(<<[<<[$*]>>]>>, <<[\w+]>>,
[EMAIL PROTECTED] _from_alt()`]>>_$0_alt(<<[\&]>>)<<['_to_alt())_from_alt())')
[EMAIL PROTECTED]'dnl
[EMAIL PROTECTED] example
+
 @node Improved fatal_error
 @section Solution for @code{fatal_error}

diff --git a/examples/capitalize.m4 b/examples/capitalize.m4
index 5c28de2..d4e4a50 100644
--- a/examples/capitalize.m4
+++ b/examples/capitalize.m4
@@ -1,8 +1,12 @@
-dnl
-dnl convert to upper- resp.
lowercase
+divert(`-1')
+# upcase(text)
+# downcase(text)
+# capitalize(text)
+# change case of text, simple version
 define(`upcase', `translit(`$*', `a-z', `A-Z')')
 define(`downcase', `translit(`$*', `A-Z', `a-z')')
-dnl
-dnl capitalize a single word
-define(`capitalize1', `regexp(`$1', `^\(\w\)\(\w*\)', `upcase(`\1')`'downcase(`\2')')')
-define(`capitalize', `patsubst(`$1', `\w+', ``'capitalize1(`\0')')')
+define(`_capitalize',
+  `regexp(`$1', `^\(\w\)\(\w*\)',
+    `upcase(`\1')`'downcase(`\2')')')
+define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')')
+divert`'dnl
diff --git a/examples/capitalize2.m4 b/examples/capitalize2.m4
new file mode 100644
index 0000000..154dc50
--- /dev/null
+++ b/examples/capitalize2.m4
@@ -0,0 +1,19 @@
+divert(`-1')
+# upcase(text)
+# downcase(text)
+# capitalize(text)
+# change case of text, improved version
+define(`upcase', `translit(`$*', `a-z', `A-Z')')
+define(`downcase', `translit(`$*', `A-Z', `a-z')')
+define(`_arg1', `$1')
+define(`_to_alt', `changequote(`<<[', `]>>')')
+define(`_from_alt', `changequote(<<[`]>>, <<[']>>)')
+define(`_upcase_alt', `translit(<<[$*]>>, <<[a-z]>>, <<[A-Z]>>)')
+define(`_downcase_alt', `translit(<<[$*]>>, <<[A-Z]>>, <<[a-z]>>)')
+define(`_capitalize_alt',
+  `regexp(<<[$1]>>, <<[^\(\w\)\(\w*\)]>>,
+    <<[_upcase_alt(<<[<<[\1]>>]>>)_downcase_alt(<<[<<[\2]>>]>>)]>>)')
+define(`capitalize',
+  `_arg1(_to_alt()patsubst(<<[<<[$*]>>]>>, <<[\w+]>>,
+    _from_alt()`]>>_$0_alt(<<[\&]>>)<<['_to_alt())_from_alt())')
+divert`'dnl


hooks/post-receive
--
GNU M4 source repository
