Hello! On Mon, Apr 03, 2023 at 06:33:21PM +0000, Michael Kourlas via nginx-devel wrote:
> > Consider the following rewrite: > > > > rewrite ^/(.*) /$1 break; > > > > Assuming request to "GET /foo%2fbar/", what > > $uri_encoded_percent_and_reserved do you expect after each of > > these rewrites? > > I do not think that rewrite does anything in practice. Following the rewrite, > I would expect $uri to remain unchanged at its current value of "/foo/bar/" > and > $uri_encoded_percent_and_reserved to similarly remain unchanged at its current > value of "/foo%2fbar/". How do you expect this to be detected by the code? Direct comparison of the new URI as generated by the rewrite with the old URI? Bonus question: what happens with the following rewrite: rewrite ^/(.*) /bazz/$1 break; It is basically equivalent to the above except it certainly changes the URI. > > Similarly, consider the following rewrite: > > > > rewrite ^/foo/(.*) /$1 break; > > > > What $uri_encoded_percent_and_reserved is expected after the > > rewrite? > > In this case the regular expression matches $uri but not > $uri_encoded_percent_and_reserved. One could say that this just means that > only > $uri is updated, but that has the potential to cause confusion when a flag is > used that changes the control flow (unless the user explicitly opts into this > behaviour). > > This could be addressed by adding a "match-source" optional argument to > "rewrite" with three values (and a default of "uri"): > * "uri" - rewrite directive matches and changes $uri only > * "uri_encoded_percent_and_reserved" - rewrite directive matches and changes > $uri_encoded_percent_and_reserved only > * "all" - rewrite directive matches and changes both (if only one is matched, > directive is not applied) > > It might also be a good idea to add "uri_encoded_percent_and_reserved_regex" > and "uri_encoded_percent_and_reserved_replacement" arguments to be used with > "all", so that it is possible to use the same directive and flag even when > needing to perform slightly different rewrites for $uri versus > $uri_encoded_percent_and_reserved. So, your original suggestion that "every transformation applied to $uri ... is automatically applied to $uri_encoded_percent_and_reserved as well" is no longer relevant, correct? Considering "match-source=uri", how do you expect the resulting $uri_encoded_percent_and_reserved to be set? From the description it looks like it is expected to remain unchanged, though for the above example this will result in $uri being "/bar/" and $uri_encoded_percent_and_reserved being "/foo%2fbar/", which looks certainly wrong. Is it really the intention? The same question applies to "match-source=uri_encoded_percent_and_reserved". > > Consider the following configuration: > > > > location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved { > > ... > > } > > > > location /foo/bar/ match-source=uri { > > ... > > } > > > > The question is: which location is expected to be matched for the > > request "GET /foo%2fbar/"? > > Both blocks match their respective variables. Since the first block has the > longest matching prefix, I expect it will be selected. That's not how prefix locations work: they are matched based on the longest prefix, and order of locations does not matter (and not even preserved/known during matching, since matching uses a prefix tree). So your suggestion implies that ordered matching should be introduced for prefix locations, correct? > > Which location is expected to be matched for the request "GET > > /foo%2Fbar/" (note that it is exactly equivalent to "GET > > /foo%2fbar/"). > > Only the second block matches its variable, so I expect it will be selected. > > Although paths are generally case sensitive, a percent-encoded character is > not > supposed to be, so this behaviour is unfortunate. One possibility is to > automatically use case-insensitive matching for any part of a location prefix > using "match-source=uri-encoded-percent-and-reserved" that is a percent > encoded "%" or reserved character. > > Alternatively this behaviour could be documented with an instruction to use > regular expressions instead. You mean, only allow "match-source=uri-encoded-percent-and-reserved" for locations given by regular expressions, correct? > > Assuming static handling in the locations, what happens with the > > request "GET /foo%2fbar/..%2fbazz"? > > The first block would be used. However, $uri would be used for static > handling, so the path "/foo/bazz" would be looked up on the filesystem, not > "/foo%2fbar/..%2fbazz". So it can be easily used to bypass security restrictions, such as in: location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved { } location /admin/ { allow 127.0.0.1; deny all; } Note that the request "GET /foo%2fbar/..%2f..%2fadmin/secret" is going to access protected files in <root>/admin/, while it shouldn't be allowed to. > > Note that the behaviour does not seem to be obvious, and it is an > > open question if it can be clarified to be safe. > > Fair enough. I am certainly happy to continue making changes to my proposal to > address the specific concerns you raise. However, are you saying that you have > broader overall concerns about safety, complexity, etc. that make a patch > implementing the proposal unlikely to be accepted, even if all specific > concerns are addressed? I've just asked some question on how your proposal is expected to work - and summarized that the behaviour suggested is certainly not obvious, since there are lots of questions on how it is expected to behave in various edge cases. But I certainly agree that there are issues with safety and complexity in the proposal. And I don't think that addressing any specific concerns will help here, since addressing them seem to result in increased complexity and more concerns. It might be a good idea to start from scratch instead of trying to fix the proposal. Just in case, below is the summary of what I think about the topic. Short version: All solutions I'm aware of suck. Don't use encoded slashes in URIs, it hurts. Long version: First of all, forget about "reserved characters". The only character that really matters is slash ("/"), as this is the only character which is indeed reserved in URI path nginx works with. There are 3 basic approaches to handling encoded slashes in URIs, mostly outlined in Apache's AllowEncodedSlashes directive (see https://httpd.apache.org/docs/2.4/mod/core.html#allowencodedslashes for details): 1. Reject them. This what Apache does by default (though with questionable error code). 2. Decode them and assume equivalent to non-encoded slashes. This is what nginx does. Apache does something similar with "AllowEncodedSlashes On", but given the note in the directive description it looks like it does not implement the "assume equivalent" part. 3. Do not decode them and expect encoded slashes are corrupted if URI is re-encoded. This also implies that if URI is not re-encoded but proxied as is, restrictions like in "location /admin/ { deny all; }" can be bypassed if slashes are decoded by the backend server. All of the approaches have their pros and cons, with the reject one being the safest, while decode and do not decode resulting in different forms of corruptions, and not decoding resulting in potential security issues on proxying. I don't think that trying to combine different approaches in the normal location matching is going to work. Rather, this will result in unmanageable complexity and will be unsafe, as in the above proposal. In nginx, the only implemented approach is "decode". It does, however, also provides the original request URI in the $request_uri variable, so the "reject" approach can be trivially implemented in the configuration: if ($request_uri ~* "%2f") { return 400; } Similarly, as already mentioned in #2225, one can do something like this: if ($request_uri ~ "^/api/objects/[^/]+/subobjects(/.*)?$") { ... } Further, nginx makes it possible to proxy the request without any URI modifications, as in "proxy_pass http://upstream;" (note no URI component after the upstream server name), so requests with encoded slashes can be inspected and proxied without corruption. This might be already enough for most, if not all, tasks involving encoded slashes: they can be handled without corruption if things are properly configured. Given the above options, this is more than usually available. Note that $request_uri matching doesn't imply various normalizations nginx usually does on URI before matching locations, such as decoding encoded characters, so something like "GET /%61pi/objects/..." won't be matched. This can be improved by providing an additional variable, or by a smart enough urldecode() function (see ticket #52), or by an arbitrary normalization written in an embedded language, such as Perl or njs. Or it might not worth the effort though, and just rejecting such non-normalized requests would be a good enough solution for most use cases. Similarly, it might be possible to explicitly implement the "do not decode" approach, with something like "decode_slashes off;" (similarly to "merge_slashes off;"). I tend to think that it is going to be just another "never use it, it is not secure" directive, much like "merge_slashes", and the mere existence of this directive is not going to help anyone. (Also, it might be actually a good idea to remove "merge_slashes" instead.) Bonus game: The dot (".") character is not reserved in URIs, yet it has special meaning when used in "/./" and "/../" constructs. And escaping dots won't help: since "." is not reserved, "/%2e/" is exactly equivalent to "/./", and it is in turn equivalent to "/". As such, APIs designed with arbitrary object names in URIs and assuming escaping is enough simply won't be able to work for these names. Hope this helps. -- Maxim Dounin http://mdounin.ru/ _______________________________________________ nginx-devel mailing list nginx-devel@nginx.org https://mailman.nginx.org/mailman/listinfo/nginx-devel