[PATCH branch-1.6] macro: Wipe quote_age after changequote during argument collection

Eric Blake via M4-patches Fri, 09 Jan 2026 11:01:08 -0800

I recently figured out a way to (ab)use translit/changequote to
tokenize a string on a single-byte separator in O(n) time, as long as
it does not matter that the elements of the string are used unquoted.
This beats both the naive O(n^2) loop over index/substr and the
O(n log n) divide-and-conquer approach that calls substr on halves of
the string.
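
To make the scaling comparison concrete, here is a minimal C sketch
(not m4 code, and not part of this patch).  The helper names
naive_split_cost and single_pass_tokens are hypothetical, and the model
assumes that every substr-like step copies the whole remainder of the
string, which is what makes the naive loop quadratic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the naive loop: locate the first separator
   (like index), then copy everything after it into a fresh string
   (like substr).  Returns the number of bytes copied for those
   remainders; the initial copy of TEXT is not counted.  */
static size_t
naive_split_cost (const char *text, char sep)
{
  size_t copied = 0;
  char *rest = malloc (strlen (text) + 1);
  assert (rest != NULL);
  strcpy (rest, text);
  for (;;)
    {
      char *p = strchr (rest, sep);
      if (!p)
        break;
      size_t len = strlen (p + 1);
      char *next = malloc (len + 1);
      assert (next != NULL);
      memcpy (next, p + 1, len + 1);   /* remainder plus trailing NUL */
      copied += len;
      free (rest);
      rest = next;
    }
  free (rest);
  return copied;
}

/* Hypothetical model of the single pass: visit each byte exactly once
   and count how many tokens the separators delimit.  */
static size_t
single_pass_tokens (const char *text, char sep)
{
  size_t tokens = 1;
  for (const char *p = text; *p; p++)
    if (*p == sep)
      tokens++;
  return tokens;
}
```

For the 32-byte string used in the manual example below, the naive
model copies 27 + 20 + 15 + 10 = 72 bytes to reach all 5 tokens, while
the single pass looks at each of the 32 bytes once.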


But while my discovery worked in m4 1.4.19 and with BSD m4, and even
worked in branch-1.6 if the changequote occurs outside of the
"requote" call that I added in the manual, it failed on branch-1.6
before this patch when the changequote was moved later, into argument
collection.  It turns out that I stumbled on a scenario where
args.quote_age and quote_age() matched each other, but still differed
from the argv->quote_age in place before the translit call, so the
expansion of $@ kept using `' instead of the new quote characters.
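
A toy model of the one-line change in src/macro.c may help here.  It is
a sketch, not the m4 source: the function and parameter names are
hypothetical, and representing a quote age as an unsigned int is an
assumption.  It mirrors only the condition visible in the diff, which
shows that the old and new checks can differ only when both compared
values are 0.

```c
#include <assert.h>   /* only needed for checks like those in the note */

/* Hypothetical stand-ins for the three values involved:
   saved   - argv->quote_age as recorded before argument collection
   tracked - args.quote_age once argument collection is done
   current - the result of quote_age () once collection is done  */

/* Condition before the patch: wipe only when tracked != current.  */
static unsigned int
settle_old (unsigned int saved, unsigned int tracked, unsigned int current)
{
  if (tracked != current)
    return 0;
  return saved;
}

/* Condition after the patch: also wipe whenever current is 0.  */
static unsigned int
settle_new (unsigned int saved, unsigned int tracked, unsigned int current)
{
  if (tracked != current || !current)
    return 0;
  return saved;
}
```

With saved = 5 and tracked = current = 0, settle_old keeps the stale 5
while settle_new returns 0; for every other combination the two
functions agree.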

Since this regression was never released, it is not worth a NEWS
entry.
---

New year, new bug.  I'm pushing this along with a bump in copyright
years.  I also think it is worth backporting the unit test to
branch-1.4; I'll take care of that later.  Any time I can come up with
a way to get better scaling out of m4, I want to keep it working.

 doc/m4.texi | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 src/macro.c |  2 +-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/doc/m4.texi b/doc/m4.texi
index 78e46bb7..28d73f02 100644
--- a/doc/m4.texi
+++ b/doc/m4.texi
@@ -5396,6 +5396,52 @@ Changequote
 @result{}hiHIhi
 @end example

+During macro expansion, instances of @code{$@@} in the macro's
+definition will use the quotation strings that are in effect at the end
+of argument collection, even if this is different than the quotation
+strings in effect when the macro was defined.  When combined with
+@code{translit} (@pxref{Translit}), this can be exploited for splitting
+a string containing multiple instances of a single-byte separator into a
+macro call on each token of the string with linear scaling (a naive loop
+that uses @code{index} to search for the first instance of the
+separator, followed by @code{substr} to process the rest of the input
+string, scales quadratically, since each iteration of the loop can only
+process one substring before re-processing an average of half of the
+overall input to get to the next substring, instead of getting at all
+substrings in a single pass).  However, note that this trick cannot
+prevent premature expansion of tokens within the string.
+
+@example
+define(`some', `several')
+@result{}
+define(`text', `long.string.with.some.separators')
+@result{}
+define(`display', ``<$1>'')
+@result{}
+display(text)
+@result{}<long.string.with.several.separators>
+display(defn(`text'))
+@result{}<long.string.with.some.separators>
+dnl quotes are still `' at the time requote is defined:
+define(`requote', `"[$@@]')
+@result{}
+define(`tokenized', requote(translit(defn(`text'), `.',
+changequote(`"[', `]')"[,])))
+@result{}
+dnl but quotes are "[] at the time $@@ in requote is computed
+changequote
+@result{}
+dnl quotes are back to `', yet the content of tokenized still has "[]
+dnl however, this form of tokenizing already expanded "some"
+tokenized
+@result{}"[long],"[string],"[with],"[several],"[separators]
+define(`a', ` display(`$1')')
+@result{}
+dnl now it is possible to call `a' on each token
+translit(defn(`tokenized'), `"[],', `a()')
+@result{} <long> <string> <with> <several> <separators>
+@end example
+
 @ignore
 @comment And another stress test, not worth documenting in the manual.
 @example
diff --git a/src/macro.c b/src/macro.c
index 0bc43457..06661983 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -559,7 +559,7 @@ collect_arguments (symbol *sym, call_info *info, struct obstack *arguments,
   argv->wrapper = args.wrapper;
   argv->has_ref = args.has_ref;
   argv->has_func = args.has_func;
-  if (args.quote_age != quote_age ())
+  if (args.quote_age != quote_age () || !quote_age ())
     argv->quote_age = 0;
   argv->arraylen = args.arraylen;
   return argv;
-- 
2.52.0


