On Sunday 04 October 2015 12:51:13, Graham Leggett wrote:
> On 04 Oct 2015, at 12:46 PM, Rainer Jung <[email protected]> wrote:
> > Yes, I agree. When starting to think closer, I noticed that the
> > string mode currently only supports a syntax that is pretty
> > different from the boolean mode and is much more limited. In that
> > mode everything is a string except it is marked via %{XXX}, in
> > which case XXX is a variable name, except XXX is AAA:BBB in which
> > case it is AAA("BBB").
> >
> > So AFAIK we don't support functions with more than one argument in
> > string mode and my naive idea of using "STRING =~
> > s/PATTERN/REPLACEMENT/FLAGS" runs into the problem that we
> > currently don't support operators like "=~" etc. in string mode.
This is correct.
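
To make the string-mode rules described above concrete, here is a rough
sketch in Python. It is not the httpd C code: the lookup tables, the
function name expand_string_mode and the sample values are all made up
for illustration. It only shows the shape of the rule "everything is
verbatim except %{XXX}, and %{AAA:BBB} means AAA("BBB")".

```python
import re

# Illustrative stand-ins only; the real variables and functions live in httpd.
VARIABLES = {"HTTP_FOO": "bar"}
FUNCTIONS = {"toupper": str.upper, "tolower": str.lower}

# Matches one %{...} token; everything outside such tokens is verbatim text.
_TOKEN = re.compile(r"%\{([^}]*)\}")

def expand_string_mode(text, variables=VARIABLES, functions=FUNCTIONS):
    """Expand %{NAME} and %{FUNC:ARG} in otherwise verbatim text."""
    def substitute(match):
        body = match.group(1)
        if ":" in body:
            # %{AAA:BBB} is a one-argument function call AAA("BBB").
            name, arg = body.split(":", 1)
            return functions[name](arg)
        # Plain %{XXX} is a variable lookup; unknown names expand to "".
        return variables.get(body, "")
    return _TOKEN.sub(substitute, text)

print(expand_string_mode("Foo=%{HTTP_FOO} %{toupper:abc}"))
```

Note that there is no room in this grammar for a second argument or
for an operator such as "=~", which is the limitation under discussion.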
> > So I wonder whether it would be useful to allow for a more general
> > mode which would, depending on the operator or function, handle
> > arguments and results as strings or booleans, using auto conversion
> > between them where needed. Of course in that mode verbatim
> > strings would need proper quoting (unlike pure string mode in
> > which everything by default is a verbatim string). We could then
> > even support
> > BOOLEXPR ? STRINGEXPR1 : STRINGEXPR2
> >
> > That generalized mode would probably need a mode differentiator
> > syntax for compatibility reasons in 2.4, but could be the default
> > mode in trunk. Something like your "%!" prefix.
This is definitely a possible approach. I am not 100% sure that we
would want that mode to become the default, though, because it would
always require double quoting for simple string expressions, such as
LogMessage "'Foo=%{HTTP_FOO}'"
Somehow I also think this approach would be quite a bit of work,
especially to deal with all corner cases and ambiguities introduced by
auto conversion.
Another possible approach would be to implement functions with
multiple arguments in string mode first and worry about an easier
syntax second. If I remember correctly, I once planned to have
%{FUNCTION:'arg1','arg2'} as syntax for this. But I did not get
around to implementing it.
Now that I think of it, maybe
%{FUNCTION: X/arg1/arg2/arg3 }
would be another good syntax for it, where X is an (optional?) letter
and the / separator could be chosen from a list of separators, just
like is already possible with the m/foo/i regex syntax. Or make it
%{FUNCTION/arg1/arg2/arg3}
If we add optional whitespace at the beginning and end, and give our
rexec function an alias of 's', we would get something like
%{ s/TEXT/PATTERN/REPLACEMENT/FLAGS } or
%{ s/PATTERN/REPLACEMENT/FLAGS/TEXT/ }
which is not perfect but maybe acceptable from a readability point of
view.
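
As a sketch of how the %{FUNCTION/arg1/arg2/arg3} form could be split
up, assuming the separator is simply the first character after the
function name and that whitespace at both ends is ignored: the helper
name parse_call is hypothetical, the code is Python rather than httpd C,
and escaping the separator inside an argument is not handled.

```python
import re

# NAME, optional whitespace, then the rest (separator plus arguments).
_CALL = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*(.*?)\s*$", re.S)

def parse_call(body):
    """Split the inside of %{...} into (function name, argument list)."""
    match = _CALL.match(body)
    if not match or not match.group(2):
        raise ValueError("expected NAME followed by a separator")
    name, rest = match.group(1), match.group(2)
    sep = rest[0]               # the caller picks the separator, as in m/foo/i
    args = rest[1:].split(sep)  # a trailing separator yields a final "" argument
    return name, args

print(parse_call(" s/TEXT/PATTERN/REPLACEMENT/FLAGS "))
print(parse_call("s,TEXT,PATTERN,REPLACEMENT,FLAGS"))
```

Because the name is matched as an identifier, this sketch only works
with separators that are not letters, digits or underscores.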
> How about a regex function?
>
> The single argument could be “s/PATTERN/REPLACEMENT/FLAGS”.
I think this would be easy to implement. It would require the regex to
be parsed on every execution, though, which has the disadvantages that
it is slower and that one would get error messages only during first
execution and not during server startup. Also, it would possibly allow
the admin to configure expressions where the regex pattern can contain
untrusted data, which would turn a lot of libpcre problems from local
into remote vulnerabilities.
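
The difference between parsing at startup and parsing on every
execution can be sketched as follows. This uses Python's re module as a
stand-in for libpcre, and compile_substitution and its flag letters are
assumptions for illustration only. The point is that a pattern compiled
when the configuration is read fails immediately if it is broken, and
the per-request path only runs an already compiled, admin-supplied
pattern.

```python
import re

def compile_substitution(pattern, replacement, flags=""):
    """Compile once, at configuration time; raises re.error on a bad pattern."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # for re.sub, count=0 means "replace all"
    regex = re.compile(pattern, re_flags)

    def apply(text):
        # Per-request work: no parsing, and the pattern is never built from text.
        return regex.sub(replacement, text, count=count)

    return apply

# "Startup": a broken pattern is reported here, before any request is served.
try:
    compile_substitution("(", "x")
except re.error as exc:
    print("configuration error:", exc)

# "Request time": only the precompiled substitution runs.
squash = compile_substitution("o+", "0", "g")
print(squash("foo boo"))
```

With a single string argument of the form "s/PATTERN/REPLACEMENT/FLAGS"
that is itself evaluated per request, the compile step would move
into apply(), which is where the startup check and the separation from
untrusted data are lost.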
If everything else fails or goes nowhere, we can do this. But I would
like to try implementing a better solution first.