Hello!

On Thu, Mar 30, 2023 at 05:19:08PM +0000, Michael Kourlas via nginx-devel wrote:

> Hello,
>
> Thanks again for your comments.
>
> > This implies, basically, that there are 3 forms of the request
> > URI: 1) fully encoded, as in $request_uri, 2) fully decoded, as in
> > $uri now, and 3) "all-except-percent-and-reserved". To implement this
> > correctly, it needs clear definition when each form is used, and
> > it is going to be a non-trivial task to do this safely.
>
> I agree. A simple way to do this would be to make percent-decoding
> customizable on a per-directive basis. The core use case I was hoping to
> support is preserving encoded reserved characters in location matching
> (basically what was proposed in [1]), so that is what I would like to
> focus on in a reworked version of this patch.
>
> I propose the following:
>
> (1) The addition of a new variable called
> $uri_encoded_percent_and_reserved. As discussed, this variable is a
> special version of the normalized URI ($uri) that preserves any
> percent-encoded "%" or reserved characters.
>
> (2) Every transformation applied to $uri (e.g. from the "rewrite"
> directive, internal redirects, etc.) is automatically applied to
> $uri_encoded_percent_and_reserved as well.
>
> If this raises performance concerns, a new flag could be added to enable
> or disable the availability of $uri_encoded_percent_and_reserved.

You suggest that transformations of $uri are "automatically
applied" to the non-fully-decoded variant. Consider the following
rewrite:

    rewrite ^/(.*) /$1 break;

Assuming request to "GET /foo%2fbar/", what
$uri_encoded_percent_and_reserved do you expect after each of
these rewrites?

Similarly, consider the following rewrite:

    rewrite ^/foo/(.*) /$1 break;

What $uri_encoded_percent_and_reserved is expected after the
rewrite?

> (3) The addition of a new optional parameter to the URI form of "location"
> blocks called "match-source":
>
>     location [ = | ~ | ~* | ^~ ] uri
>         [match-source=uri|uri-encoded-percent-and-reserved] {
>         ...
>     }
>
> For example:
>
>     location ~ ^/api/objects/[^/]+/subobjects(/.*)?$
>         match-source=uri-encoded-percent-and-reserved {
>         ...
>     }
>
> "match-source=uri" is the default and the current behaviour. When
> "uri-encoded-percent-and-reserved" is used, the location matching for that
> block uses $uri_encoded_percent_and_reserved rather than $uri. Nested
> location blocks are not affected (unless they also use
> "uri-encoded-percent-and-reserved").
>
> In future it would be possible to use a similar pattern with other
> directives that use $uri, such as "proxy_pass", but that can be done as
> part of a separate patch.
>
> If you think this is a sensible approach, I will submit a revised patch
> implementing it.

Consider the following configuration:

    location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
        ...
    }

    location /foo/bar/ match-source=uri {
        ...
    }

The question is: which location is expected to be matched for the
request "GET /foo%2fbar/"?

Other questions include:

- Which location is expected to be matched for the request
  "GET /foo%2Fbar/" (note that it is exactly equivalent to
  "GET /foo%2fbar/").

- Assuming static handling in the locations, what happens with the
  request "GET /foo%2fbar/..%2fbazz"?

Note that the behaviour does not seem to be obvious, and it is an
open question if it can be clarified to be safe.

-- 
Maxim Dounin
http://mdounin.ru/
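
To make the questions above concrete, here is a rough Python sketch
(not nginx code) of how the fully decoded form and the proposed
"all-except-percent-and-reserved" form might differ for the example
requests. The names partially_decode, fully_decode and KEEP_ENCODED
are hypothetical, the reserved set is taken from RFC 3986, and the
choice to uppercase the hex digits of preserved escapes is an
assumption; the proposal does not specify any of this.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumption: "reserved" means the RFC 3986 gen-delims and sub-delims;
# "%" is added because the proposal keeps an encoded percent sign too.
KEEP_ENCODED = frozenset(b":/?#[]@!$&'()*+,;=%")

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def fully_decode(raw: bytes) -> bytes:
    """Decode every %XX escape (roughly what $uri holds today,
    ignoring the rest of nginx's normalization)."""
    return unquote_to_bytes(raw)


def partially_decode(raw: bytes) -> bytes:
    """Decode %XX escapes except those for "%" or reserved characters.

    Preserved escapes are rewritten with uppercase hex digits so that
    %2f and %2F yield the same string (an assumption of this sketch).
    """
    def repl(match):
        value = int(match.group(1), 16)
        if value in KEEP_ENCODED:
            return b"%" + match.group(1).upper()
        return bytes([value])

    return _ESCAPE.sub(repl, raw)
```

Under these assumptions, b"/foo%2fbar/" and b"/foo%2Fbar/" both map
to b"/foo%2Fbar/" in the partial form, while full decoding gives
b"/foo/bar/", so the two locations above would be compared against
different strings for the same request. For b"/foo%2fbar/..%2fbazz"
the partial form keeps "..%2Fbazz" as a single segment, whereas the
fully decoded form contains a "/../" segment.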