On 07/26/2016 08:28 AM, Ján Tomko wrote:
> This check has a static list of words that are checked for repetitions.
> Expand it before running the perl script to avoid using expensive
> captures.
> ---
>  ChangeLog    | 9 +++++++++
>  top/maint.mk | 7 ++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index 7dd78e3..b698a6c 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,14 @@
>  2016-07-26  Ján Tomko  <[email protected]>
>
> +    maint.mk: expand the prohibit_doubled_word regex
> +
> +    This check has a static list of words that are checked for
> +    repetitions.
> +    Expand it before running the perl script to avoid using expensive
> +    captures.

gnulib is still stuck in the old ways of GNU-style changelog entries
where you call out the file and section touched, as in:

    * maint.mk (prohibit_doubled_word): Pre-expand the regex to
    avoid expensive perl regex backreferences.

Can be touched up on commit.

>
> +prohibit_doubled_words_ = \
> + the then in an on if is it but for or at and do to
> +# expand the regex before running the check to avoid using expensive captures
> +prohibit_doubled_word_expanded_ = \
> +  $(shell echo $(prohibit_doubled_words_) | sed -r 's/\b(\S+)\b/\1\\s\+\1/g')

I bet GNU make has builtins that could do this operation without forking
to $(shell). This stage results in a variable containing:

  the\s+the then\s+then ...

Maybe:

  $(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))

> prohibit_doubled_word_RE_ ?= \
> - /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
> + /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims

At any rate, you want to end up with the perl regex:

  /\b(?:the\s+the|then\s+then|...)\b/gims

> prohibit_doubled_word_ = \
>      -e 'while ($(prohibit_doubled_word_RE_))' \
> $(perl_filename_lineno_text_)
>

Either way, I doubt my make fine-tuning matters, and you are definitely
correct that avoiding back-references makes perl regexes more efficient.
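
For what it's worth, the $(join)/$(addprefix) idea can be sketched as a
small standalone Makefile. This is only an illustration under assumptions,
not a tested replacement for the patch: the names words_, expanded_,
empty_, space_, RE_ and the show target are hypothetical and do not exist
in maint.mk, and the separator is written as \s+ (perl's spelling of "one
or more whitespace characters").

```make
# Hypothetical standalone sketch; none of these names come from maint.mk.
words_ = the then in an on if is it but for or at and do to

# $(addprefix) turns each word W into "\s+W"; $(join) then glues the Nth
# word of each list together, giving "the\s+the then\s+then ...".
expanded_ = $(join $(words_),$(addprefix \s+,$(words_)))

# A variable holding exactly one space, so $(subst) can replace the
# separators between words with "|".
empty_ :=
space_ := $(empty_) $(empty_)

# Alternation without capture groups or backreferences.
RE_ = \b(?:$(subst $(space_),|,$(expanded_)))\b

.PHONY: show
# printf rather than echo: some shells' echo would interpret "\b".
show: ; @printf '%s\n' '$(RE_)'
```

Saving this as, say, sketch.mk and running "make -f sketch.mk show" should
print a pattern that begins with \b(?:the\s+the|then\s+then|in\s+in| and
involves no fork to sed during variable expansion.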
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
