On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic <[email protected]> wrote: > > > Allow for URI-path pre_translate_name before (and/or instead of) decoding. > > > > Only if no hook takes "ownership" of the URI (returning OK), apply > > percent decoding for the rest of request handling. Otherwise r->uri remains > > encoded meaning that further location/directory/file/if/.. sections (walks) > > should that into account. > > This change allows a module to avoid URI decoding in > pre_translate_name, such that r->uri remains encoded for the whole > request handling. > > For instance (same series), mod_proxy will use that hook when > "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass > matches. The decoding was already disabled (before this commit) for > the forward-proxy case (ProxyRequests on) since r->proxyreq was set at > an early stage, but now it "works" for the reverse-proxy case too. > > This means that when "ProxyMappingDecoded off" is configured (the > default is still "on", though), for instance a <Location URI-path, or > a fixup RewriteRule or a non-early RequestHeader rule will have to be > expressed with the encoded form since reserved characters will not be > decoded as before (note that unreserved characters will always be > decoded anyway). > > I don't know how much it can break existing configurations, and wonder > if we should have a core directive that still controls whether r->uri > should be decoded, regardless of ProxyMappingDecoded or any further > module pre_translate_name hook with its own directives. > > For mod_proxy this is not really an issue to have r->uri modified > after the (pre_)translate phase, because the handler will later work > on r->filename regardless (the "proxy:..." set by proxy_detect or > proxy_trans), so maybe we should simply always decode r->uri after the > pre_translate hooks have run. Though I find it easier in a proxy > context to match a URI-path (LocationMatch, RewriteRule..) in its > encoded form, and not worry about original vs decoded separators. That > may be just my preference, but a new core directive looks sane to me.. > > Thoughts?
I have not followed too closely so take with a grain of salt / for the sake of argument: >From an externals perspective (maybe being detached here is a benefit) it might be better to have something like "ProxyUseOriginalURL" that has the following caveats: 1. it's partially decoded 2. it is required for whatever the servlet mapping option is (iiuc?) 3. it will affect the string used in comparisons/substitutions for <Location*>, Rewrite, <If>, (*CAUTION*) - Some details is needed here about the partial decoding, for example %2f or multiple leading slashes so "simple" context root configs can be written without too much paranoia 4. If there are risks of re-injecting URL's with [PT] or per-dir rewrites try to explain that it's a complex combination to worry about. I would not worry too much about "existing configs" as this is all opt-in and it's probably a lost cause anyway. I would instead give people new built-in vars in the important places (rewrite, expr) to be able to match the partially decoded URI that the proxy will be working on so a savvy user can safely interoperate without monstrous regexes.
