On 2020-03-13 at 15:35, Mark Thomas wrote:
Hi all,

I am writing this up as this is a change I'd like to make in Tomcat 10
that I think is important to get right. It may also get back-ported.

This first arose in this mod_jk bug:
https://bz.apache.org/bugzilla/show_bug.cgi?id=62459

Ignoring the mod_jk aspects for now (they will come later) the bug
report raises the important question of how to handle the case where the
ID for a resource in a RESTful API includes a "/".

At the moment, Tomcat does not handle this correctly. If
ALLOW_ENCODED_SLASH is false, the request is rejected. If it is true,
the wrong resource identifier will be used. This is an edge case, but
one I'd like to fix.
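
To make the edge case concrete, here is a minimal sketch in plain Java. It is not Tomcat code: the class, the helpers decodeAll/segments and the example path /items/a%2Fb are all made up for illustration, and the input is assumed to be an ASCII path starting with "/". It compares decoding before splitting on "/" (which loses the identifier "a/b") with splitting first and decoding each segment afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EncodedSlashProblemDemo {

    // Hypothetical helper (not Tomcat's UDecoder): decodes every valid
    // %nn sequence, including %2F. Assumes ASCII input.
    static String decodeAll(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);
                    i += 2; // skip the two hex digits
                    continue;
                }
            }
            out.write(c);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Splits an absolute path into its segments (assumes a leading "/").
    static List<String> segments(String path) {
        List<String> result = new ArrayList<>();
        for (String s : path.substring(1).split("/", -1)) {
            result.add(s);
        }
        return result;
    }

    // Decoding first turns %2F into a segment delimiter.
    static List<String> decodeThenSplit(String rawPath) {
        return segments(decodeAll(rawPath));
    }

    // Splitting first keeps the encoded slash inside its segment.
    static List<String> splitThenDecode(String rawPath) {
        List<String> result = new ArrayList<>();
        for (String s : segments(rawPath)) {
            result.add(decodeAll(s));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decodeThenSplit("/items/a%2Fb")); // [items, a, b]
        System.out.println(splitThenDecode("/items/a%2Fb")); // [items, a/b]
    }
}
```

Only the second form recovers the resource ID "a/b", which is why the %2F has to survive until the path has been split.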

My research led me back to RFC 3986. Quoting from section 2.2:

<quote>
The purpose of reserved characters is to provide a set of delimiting
characters that are distinguishable from other data within a URI.
URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent.  Percent-
encoding a reserved character, or decoding a percent-encoded octet
that corresponds to a reserved character, will change how the URI is
interpreted by most applications.  Thus, characters in the reserved
set are protected from normalization and are therefore safe to be
used by scheme-specific and producer-specific algorithms for
delimiting data subcomponents within a URI.
</quote>

My reading of this is that there are some %nn sequences that we should
*never* decode. The values we pass to applications for ServletPath,
PathInfo etc. should still include these %nn sequences and the
application should decode them.

My next thought was "Which %nn sequences should we leave alone?". That
got me thinking about URIEncoding values and how to differentiate
between a %nn sequence we wanted to leave alone and the same sequence
appearing where a code point is represented by multiple bytes.
Fortunately, RFC 7230 saves us from that complication as it requires all
encodings to be supersets of US-ASCII. Or to put it another way,
whenever %nn appears with nn in the range 00 to 7F, that %nn sequence
will *always* represent the equivalent US-ASCII code point.
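
A small check of this property, restricted to UTF-8 (that restriction is my assumption; the paragraph above talks about URIEncoding values in general). The class and method names are invented for the example. It percent-encodes every byte of a string containing two multi-byte characters and a "/", and counts how many bytes fall below 0x80.

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteRangeDemo {

    // Counts the bytes in the range 00 to 7F in the UTF-8 encoding of s.
    static int asciiRangeBytes(String s) {
        int count = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0xFF) < 0x80) {
                count++;
            }
        }
        return count;
    }

    // Percent-encodes every byte of the UTF-8 encoding of s.
    static String percentEncodeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 and U+20AC are multi-byte in UTF-8; "/" is a single byte.
        String sample = "\u00E9/\u20AC";
        System.out.println(percentEncodeAll(sample)); // %C3%A9%2F%E2%82%AC
        System.out.println(asciiRangeBytes(sample));  // 1
    }
}
```

The only byte below 0x80 is the one for "/" itself, so in UTF-8 a %2F can never be part of a multi-byte sequence.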

So, that simplifies things a little as we go back to considering which
%nn sequences we have to leave alone.

The starting point is "reserved" characters. From RFC 3986:

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
             / "*" / "+" / "," / ";" / "="

The URI we are talking about in Tomcat is, at the point where we %nn
decode, just the path. The path parameters and query string have already
been removed.

From RFC 7230:

absolute-path = 1*( "/" segment )

and from RFC 3986:

segment       = *pchar

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"


So the question is: which reserved characters cannot be safely decoded
from their %nn form?

We know all sub-delims can be safely decoded because they are valid
characters in a segment and, with the query string and path parameters
removed, none of those characters has special meaning.

That leaves gen-delims.

Of those ":" and "@" are explicitly allowed in a segment. So that leaves:

"/" "?" "#" "[" "]"

"?" is the query delimiter but the query string has been removed so it
is safe to %nn decode to "?".

"#" is the fragment delimiter. The fragment will never reach the server
so it is safe to %nn decode to "#".

"[" and "]" are delimiters in the host but not the path so they are safe.

That just leaves "/".

My proposal is, therefore, actually very simple:

1. Remove the UDecoder.ALLOW_ENCODED_SLASH option.
2. Replace it with a per Connector setting that has three options:
    a) deny (equivalent to ALLOW_ENCODED_SLASH="false")
    b) decode (equivalent to ALLOW_ENCODED_SLASH="true")
    c) allow (leaves as is)
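
A rough sketch of how the three options could behave, in plain Java. This is not Tomcat's implementation: the enum EncodedSlashHandling, the method decodePath and the use of IllegalArgumentException are names and choices I made up for illustration, the input is assumed to be an ASCII path with path parameters and query string already removed, and the decoded bytes are assumed to be UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class EncodedSlashPolicySketch {

    // Hypothetical names for the three proposed options.
    enum EncodedSlashHandling { DENY, DECODE, ALLOW }

    // Decodes %nn sequences in an ASCII path, treating %2F per 'mode'.
    static String decodePath(String path, EncodedSlashHandling mode) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c != '%') {
                out.write(c); // assumes ASCII, so one char is one byte
                i++;
                continue;
            }
            if (i + 2 >= path.length()) {
                throw new IllegalArgumentException("Truncated %nn sequence");
            }
            int hi = Character.digit(path.charAt(i + 1), 16);
            int lo = Character.digit(path.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid %nn sequence");
            }
            int value = hi * 16 + lo;
            if (value == '/') {
                switch (mode) {
                    case DENY:
                        throw new IllegalArgumentException("Encoded slash rejected");
                    case DECODE:
                        out.write('/');
                        break;
                    case ALLOW:
                        // Leave the sequence untouched for the application.
                        out.write('%');
                        out.write(path.charAt(i + 1));
                        out.write(path.charAt(i + 2));
                        break;
                }
            } else {
                out.write(value);
            }
            i += 3;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "/items/a%2Fb%20c";
        System.out.println(decodePath(raw, EncodedSlashHandling.DECODE)); // /items/a/b c
        System.out.println(decodePath(raw, EncodedSlashHandling.ALLOW));  // /items/a%2Fb c
    }
}
```

With the "allow" behaviour every other %nn sequence is decoded as usual and only %2F is passed through, so the application can split on "/" and then decode the remaining %2F itself.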

I am CC'ing our expert on this topic, olegk@, because at HttpComponents we have had numerous JIRA issues regarding URI handling and the interpretation of RFC 3986. It is, sadly, a constant source of trouble.

Oleg, can you share your view on Mark's proposal?

Michael

