Hello!

On Wed, Feb 15, 2023 at 11:50:13AM -0500, Michael Kourlas via nginx-devel wrote:

> # HG changeset patch
> # User Michael Kourlas <michael.kour...@solace.com>
> # Date 1676408746 18000
> #      Tue Feb 14 16:05:46 2023 -0500
> # Node ID 129437ade41b14a584fb4b7558accc1b8dee7f45
> # Parent  cffaf3f2eec8fd33605c2a37814f5ffc30371989
> HTTP: Add new uri_normalization_percent_decode option
>
> This patch addresses ticket #2225 by adding a new
> uri_normalization_percent_decode configuration option that controls which
> characters are percent-decoded by nginx as part of its URI normalization.
>
> The option has two values: "all" and "all-except-reserved". "all" is the
> default value and is the current behaviour. When the option is set to
> "all-except-reserved", nginx percent-decodes all characters except those in the
> reserved set defined by RFC 3986:
>
>     reserved    = gen-delims / sub-delims
>
>     gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
>
>     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
>                 / "*" / "+" / "," / ";" / "="
>
> In addition, when "all-except-reserved" is used, nginx will not re-encode "%"
> from the request URI when it observes that it is part of a percent-encoded
> reserved character.
>
> When nginx percent-decodes reserved characters, this can often change the
> request URI's semantics, making it impossible to use a normalized URI for
> certain use cases. "uri_normalization_percent_decode" gives the configuration
> author the freedom to determine which reserved characters are semantically
> relevant and which are not.
>
> For example, consider the following location block, which handles part of a
> hypothetical API:
>
>     location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ {
>         ...
>     }
>
> Because nginx always normalizes "%2F" to "/", this location block will not
> match a path of /api/objects/sample%2Fname/subobjects, even if the API permits
> "/" to appear percent-encoded in the URI as part of object names. nginx will
> instead interpret this as /api/objects/sample/name/subobjects, a completely
> different path.
> Setting "uri_normalization_percent_decode" to
> "all-except-reserved" will leave "%2F" encoded, resulting in the expected
> behaviour.

Thanks for the patch.

As far as I understand, it will irreversibly corrupt URIs with
double-encoded reserved characters.  For example, "%252F" will
become "%2F" when proxying in the following configuration:

    location /foo/ {
        proxy_pass http://upstream/foo/;
    }

Further, requests to static files with (properly escaped) reserved
characters will simply fail, because nginx won't decode these
characters.  For example, in the following trivial configuration a
request to "/foo%3Fbar" won't be decoded to match the "/foo?bar" file
under the document root:

    location / {
        # static files
    }

Please also note that the configuration directive you've
introduced in this patch applies to URI parsing from a not-yet-final
server block (see [1] for details), but the configuration from the
final server block will be used for URI escaping.  These
configurations can be different, and this might result in various
additional issues.

Overall, I tend to think that the suggested patch will introduce
many more problems than it tries to solve, and I would rather not.

[1] http://nginx.org/en/docs/http/server_names.html#virtual_server_selection

-- 
Maxim Dounin
http://mdounin.ru/
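
For illustration, the double-encoding concern can be reproduced with a small Python sketch. It is not nginx code: it only models the two rules stated in the patch description ("all-except-reserved" decoding, and not re-encoding "%" when it starts a percent-encoded reserved character). The function names are hypothetical, and the escaping step is simplified to handle "%" only.

```python
# Rough model of the behaviour described in the patch, NOT nginx's implementation.
# Reserved set as defined by RFC 3986 (gen-delims and sub-delims).
RESERVED = set(":/?#[]@!$&'()*+,;=")
HEX = "0123456789abcdefABCDEF"


def _is_escape(uri: str, i: int) -> bool:
    """True if uri[i:i+3] is a well-formed %XX triplet."""
    return (uri[i] == "%" and i + 2 < len(uri)
            and uri[i + 1] in HEX and uri[i + 2] in HEX)


def decode_except_reserved(uri: str) -> str:
    """Hypothetical "all-except-reserved" step: decode %XX unless it is reserved."""
    out, i = [], 0
    while i < len(uri):
        if _is_escape(uri, i):
            ch = chr(int(uri[i + 1:i + 3], 16))
            # Reserved characters stay percent-encoded; everything else is decoded.
            out.append(uri[i:i + 3] if ch in RESERVED else ch)
            i += 3
        else:
            out.append(uri[i])
            i += 1
    return "".join(out)


def escape_for_upstream(uri: str) -> str:
    """Hypothetical re-escaping: "%" becomes "%25" unless it starts an
    encoded reserved character (simplified: only "%" is escaped here)."""
    out, i = [], 0
    while i < len(uri):
        if _is_escape(uri, i) and chr(int(uri[i + 1:i + 3], 16)) in RESERVED:
            out.append(uri[i:i + 3])
            i += 3
        elif uri[i] == "%":
            out.append("%25")
            i += 1
        else:
            out.append(uri[i])
            i += 1
    return "".join(out)


def through_proxy(uri: str) -> str:
    """Normalize the request URI, then escape it again for the upstream."""
    return escape_for_upstream(decode_except_reserved(uri))


if __name__ == "__main__":
    for request in ("/foo/a%2Fb", "/foo/a%252Fb", "/foo/a%2541"):
        print(request, "->", through_proxy(request))
```

In this model "/foo/a%2Fb" and "/foo/a%252Fb" both reach the upstream as "/foo/a%2Fb", so the upstream cannot tell them apart, while a double-encoded non-reserved character such as "%2541" still round-trips.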