I don't understand the behavior of the
org.eclipse.jetty.util.UrlEncoded.decodeUtf8To methods. Maybe I'm missing
something, but IMHO they behave inconsistently when request data is not
correctly encoded. I'm currently using v9.1.0 (and I cannot see any change
in the latest v9.1.3), with UTF-8 as the charset for decoding request
data.

The strange behaviors I noticed are:

A) when parsing query string parameters
A.1) if the last value of the query string is an incomplete UTF8 sequence,
the value is added to the map by replacing the last character
with Utf8Appendable.REPLACEMENT (in my opinion this is the correct behavior)
A.2) if a token (i.e. a value or a key) in the middle of the query string
is an incomplete UTF8 sequence, that token is completely ignored and is
never added to the map. You just get a warn-level log message.

B) when parsing a form-urlencoded body of a POST or PUT request
B.1) if the last value of post data is an incomplete UTF8 sequence,
a Utf8Appendable.NotUtf8Exception is thrown and it bubbles up to,
for instance, Request.getParameter(). And that is a RuntimeException...
B.2) if a token (i.e. a value or a key) in the middle of the body is an
incomplete UTF8 sequence, that token is ignored, just like in point (A.2)
above.
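
To make the behavior I would expect in all four cases concrete, here is a
minimal sketch that uses only the JDK, not Jetty. The class and method
names (LenientQueryDecodeSketch, decodeToken, decodeQuery) are hypothetical,
a plain Map stands in for Jetty's parameter map, and the input is assumed
to be ASCII with well-formed %XX escapes. An incomplete UTF8 sequence is
replaced with U+FFFD and the token is still added, whether it is the last
token or one in the middle.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class LenientQueryDecodeSketch {

    // Percent-decodes one token into raw bytes, then decodes them as UTF-8.
    // new String(byte[], Charset) replaces malformed input with U+FFFD
    // instead of throwing.
    static String decodeToken(String token) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '+') {
                raw.write(' ');
            } else if (c == '%' && i + 2 < token.length()) {
                // Assumes two valid hex digits follow the '%'.
                raw.write(Integer.parseInt(token.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                raw.write(c); // assumes ASCII input
            }
        }
        return new String(raw.toByteArray(), StandardCharsets.UTF_8);
    }

    // Splits on '&' and '=' and keeps every token, even a badly encoded one.
    static Map<String, String> decodeQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decodeToken(key), decodeToken(value));
        }
        return params;
    }

    public static void main(String[] args) {
        // %C3 starts a two-byte UTF-8 sequence that is never completed.
        System.out.println(decodeQuery("a=caf%C3&b=ok"));
    }
}
```

With the query string a=caf%C3&b=ok, both keys end up in the map: the
value of "a" ends with the replacement character, and "b" is unaffected.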

I think there are several issues in the two overloaded
org.eclipse.jetty.util.UrlEncoded.decodeUtf8To methods. The first one
accepts a byte array as its first parameter, while the second takes an
InputStream. The first one is used in scenario (A) and the second one in
scenario (B).

Both of them use a Utf8StringBuilder to temporarily store the token
currently being parsed. But when the token is converted into a String we
always call buffer.toString(), which can throw
Utf8Appendable.NotUtf8Exception if the bytes are not a valid UTF8
sequence.
In (A.2) and (B.2), that call is inside a try-catch, but the catch block
does nothing, so the buffer is not reset and the value is not added to the
map.
In (B.1), the call to toString() is outside the try-catch, so the
exception bubbles up.
Scenario (A.1) is fine, because in that case (and only there) we use
buffer.toReplacedString(), which has a much safer behavior: if the last
character is not a valid UTF8 sequence, Utf8Appendable.REPLACEMENT is
appended, the exception is logged (but not thrown) and the resulting
string is returned.
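
As a rough analogy for the difference between the two calls, the JDK's own
UTF-8 decoder can be run in a strict or a lenient mode. This is only a
sketch of the two error-handling styles and not Jetty's code: the class and
method names (StrictVsLenientUtf8Sketch, decodeStrict, decodeLenient) are
made up for illustration, and the JDK throws a checked
CharacterCodingException where Jetty throws a RuntimeException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictVsLenientUtf8Sketch {

    // Strict mode: malformed or truncated input raises an exception,
    // loosely comparable to what toString() does in the cases above.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Lenient mode: malformed input is replaced with U+FFFD and decoding
    // continues, loosely comparable to toReplacedString().
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by 0xC3, the first byte of an unfinished sequence.
        byte[] truncated = {'c', 'a', 'f', (byte) 0xC3};

        System.out.println("lenient: " + decodeLenient(truncated));
        try {
            System.out.println("strict: " + decodeStrict(truncated));
        } catch (CharacterCodingException e) {
            System.out.println("strict failed: " + e);
        }
    }
}
```

The lenient variant always yields a String the caller can store in the
map, which is the property I would like decodeUtf8To to have everywhere.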

IMHO, this is the correct behavior, so in the
org.eclipse.jetty.util.UrlEncoded.decodeUtf8To methods we should replace
the Utf8StringBuilder.toString() calls with
Utf8StringBuilder.toReplacedString(). Or am I missing something?

-- Ugo