On 3/13/23 2:57 PM, Yann Ylavic wrote:
> On Mon, Mar 13, 2023 at 2:32 PM Eric Covener <cove...@gmail.com> wrote:
>>
>> On Mon, Mar 13, 2023 at 8:31 AM Yann Ylavic <ylavic....@gmail.com> wrote:
>>>
>>> Maybe we could make [B=] and [BTCLS] not mutually exclusive (encode
>>> both controls and whatever in B=).
>>> I was thinking of a [BNEG] flag too (encode everything but what's in
>>> B=), and never encode alnum or '_', so all in all some further patch
>>> like the attached one. WDYT?
>>
>> Looks good to me, I have an ancient patch where I did something very
>> similar to a copy of int:escape where you could set exceptions in
>> subprocess_env. The config is ugly so I never upstreamed it.
>>
>> if you make the change I can add some tests/doc. Should caution
>> against plain [B] and refer to the others in the doc?
>
> r1902323, thanks for the tests/doc.
> IIUC, when the query-string is rewritten, I think we should caution
> against using [B] with a redirect (double encoding) and not using
> [BCTLS] (or some careful flavor of [B]) for a non-redirect..
>
I think the biggest issue with [B] is that when you capture data from the path
which is decoded
and try to use this data in the new URL in path and query string at the same
time. For putting
it in the path B does the wrong thing, for putting it in the query string it
does the correct thing.
If you do not use B then the substitution in the path will be correct, but not
in the query string.
Apart from the additional B variants now available I guess a further internal
rewritemap can help
that unlike 'int:escape' would consider using '+' instead of %20 to encode
spaces.
Of course if you capture data from the query string you have to fight with the
other way round and
need to ensure that you decode the data before putting it in the path as
otherwise you probably have
an issue with double encoding.
BTW: I guess if time is found we should reorganize our escaping / unescaping
code. It seems to be
widespread across the codebase to handle various special cases / edge cases and
it seems to duplicate
code in a lot of locations. I think it would be better if it could be
concentrated in server/util.c
and if needed add some helper functions that can be reused if we cannot move
all special cases / edge cases
there such that their handling uses a more common base.
Another thought that I had from the performance point of view is if we could
create an API that would allow
to build tables similar to the test_char_table on the "fly" e.g. during init
and let functions use these
tables for effective character classification. I think this would be more
effective for longer lists of characters
then using apr_isalnum or strchr in loops again and again.
Regards
Rüdiger