Mark,

On 6/29/19 16:01, Mark Thomas wrote:
> On 29/06/2019 02:26, GitBox wrote:
>> alpire opened a new pull request #176: CoyoteAdapter: fix
>> out-of-bounds read in checkNormalize
>> URL: https://github.com/apache/tomcat/pull/176
>> 
>> 
>> On malformed requests, checkNormalize would throw an
>> ArrayIndexOutOfBoundsException, leading to a 500 response. This
>> change fixes checkNormalize to return false instead of throwing an
>> exception on those inputs, and adds a few tests to check the new
>> behaviour.
> 
> That there are URI sequences that can trigger this is certainly 
> something that we need to fix.
> 
> On a slightly different topic, this got me thinking.
> 
> checkNormalize() is a test that runs on every request. It is
> essentially there to ensure that whatever character set has been
> specified for the Connector is "safe". In this case "safe" means
> when the bytes are converted to characters we don't get any
> unexpected "." or "/" characters in the result that make the
> character version of the URI non-normalized.
> 
> Rather than run this test on every request, we could:
> - pre-calculate the character sets we consider to be safe
> - throw an exception if the user attempts to configure the Connector
>   to use one that is not on that list
> 
> I think we could safely exclude any character set where any of the 
> following were not true:
> 
> - 0x00 <-> '\u0000'
> - 0x2e <-> '.'
> - 0x2f <-> '/'
> - 0x5c <-> '\'
> 
> We could create a unit test that maintains/checks the list of
> "safe" character set canonical names. After creating the character
> set in Connector.setURIEncoding(), if the canonical name of the
> resulting character set is not in the safe list, throw an
> exception.
> 
> Then remove checkNormalize() and replace with a comment that
> explains why the conversion is known to be safe.
> 
> There are several loops through the URI in checkNormalize(). I
> think - I'd need to test to confirm - that removing them would
> provide a measurable performance benefit.
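For reference, the round-trip check you describe could be sketched roughly like this (class and method names here are mine for illustration, not anything in the Tomcat code base):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SafeCharsetCheck {

    // The four critical byte <-> char mappings from the list above.
    static final byte[] CRITICAL_BYTES = { 0x00, 0x2e, 0x2f, 0x5c };
    static final char[] CRITICAL_CHARS = { '\u0000', '.', '/', '\\' };

    // A charset is considered "safe" only if each critical byte decodes
    // to exactly the expected character, and that character encodes back
    // to exactly the original single byte.
    static boolean isSafe(Charset cs) {
        for (int i = 0; i < CRITICAL_BYTES.length; i++) {
            // byte -> char direction
            String decoded = new String(new byte[] { CRITICAL_BYTES[i] }, cs);
            if (decoded.length() != 1 || decoded.charAt(0) != CRITICAL_CHARS[i]) {
                return false;
            }
            // char -> byte direction
            byte[] encoded = String.valueOf(CRITICAL_CHARS[i]).getBytes(cs);
            if (encoded.length != 1 || encoded[0] != CRITICAL_BYTES[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafe(StandardCharsets.UTF_8));  // ASCII-compatible
        System.out.println(isSafe(StandardCharsets.UTF_16)); // multi-byte units
    }
}
```

A unit test could then iterate Charset.availableCharsets() and compare the charsets passing isSafe() against a maintained allow-list of canonical names.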

I'm a little worried about edge cases that might cause a misread of a
multibyte character into several single-byte characters that aren't
supposed to be treated as such. Remember that an attacker is not
obligated to follow the rules of a sane encoding such as UTF-8 or
Shift_JIS. They can claim one encoding and then send complete garbage
down the line. We need to remain safe in all cases.

So I think checking the string AFTER decoding is the only truly safe
way to operate.
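For what it's worth, "check after decoding" amounts to something like the following sketch. This is a hypothetical illustration of the idea, not the actual checkNormalize() logic:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PostDecodeCheck {

    // Decode the raw URI bytes using whatever charset the Connector is
    // configured with, THEN inspect the resulting characters. Whatever
    // garbage the attacker sent, the check sees the same string the
    // request-mapping code will see.
    static boolean looksNormalized(byte[] rawUri, Charset claimed) {
        String uri = new String(rawUri, claimed);
        return !uri.contains("\\")
                && !uri.contains("//")
                && !uri.contains("/./")
                && !uri.contains("/../")
                && !uri.endsWith("/.")
                && !uri.endsWith("/..");
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        System.out.println(looksNormalized("/app/index.html".getBytes(cs), cs));
        System.out.println(looksNormalized("/app/../etc".getBytes(cs), cs));
    }
}
```

The point is that the check runs on the decoded characters, so no assumption about the byte-level behaviour of the charset is needed.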

Note that I can't think of a specific way to break this off the top of
my head.

What kind of performance benefit are you looking at, here? Certainly,
every bit helps, but is the wire-to-servlet pipeline really that long
at this point?

-chris
