Mark,
On 6/29/19 16:01, Mark Thomas wrote:
> On 29/06/2019 02:26, GitBox wrote:
>> alpire opened a new pull request #176: CoyoteAdapter: fix
>> out-of-bounds read in checkNormalize
>> URL: https://github.com/apache/tomcat/pull/176
>>
>> On malformed requests, checkNormalize would throw an
>> ArrayIndexOutOfBoundsException leading to a 500 response. This
>> change fixes checkNormalize to return false instead of throwing an
>> exception on those inputs, and adds a few tests to check the new
>> functionality.
>
> That there are URI sequences that can trigger this is certainly
> something that we need to fix.
>
> On a slightly different topic, this got me thinking.
>
> checkNormalize() is a test that runs on every request. It is
> essentially there to ensure that whatever character set has been
> specified for the Connector is "safe". In this case "safe" means
> that when the bytes are converted to characters we don't get any
> unexpected "." or "/" characters in the result that make the
> character version of the URI non-normalized.
>
> Rather than run this test on every request, we could:
> - pre-calculate the character sets we consider to be safe
> - throw an exception if the user attempts to configure the
>   Connector to use one
>
> I think we could safely exclude any character set where any of the
> following were not true:
>
> - 0x00 <-> '\u0000'
> - 0x2e <-> '.'
> - 0x2f <-> '/'
> - 0x5c <-> '\'
>
> We could create a unit test that maintains/checks the list of
> "safe" character set canonical names. After creating the character
> set in Connector.setURIEncoding(), if the canonical name of the
> resulting character set is not in the safe list, throw an
> exception.
>
> Then remove checkNormalize() and replace it with a comment that
> explains why the conversion is known to be safe.
>
> There are several loops through the URI in checkNormalize(). I
> think - I'd need to test to confirm - that removing them would
> provide a measurable performance benefit.
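For what it's worth, the round-trip test Mark describes could be sketched roughly like this. This is just an illustration, not a patch against Tomcat; the class and method names are made up, and it uses the convenience Charset.decode()/encode() methods (which substitute replacement values for malformed input) rather than Tomcat's internal converters:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SafeCharsetCheck {

    // Bytes/chars whose mapping must round-trip exactly for a URI
    // decoded in this charset to be trusted as normalized.
    private static final byte[] CRITICAL_BYTES = { 0x00, 0x2e, 0x2f, 0x5c };
    private static final char[] CRITICAL_CHARS = { '\u0000', '.', '/', '\\' };

    /**
     * Returns true if each critical byte decodes to the expected char
     * and each critical char encodes back to the same single byte.
     */
    public static boolean isSafe(Charset cs) {
        for (int i = 0; i < CRITICAL_BYTES.length; i++) {
            // byte -> char
            CharBuffer decoded =
                    cs.decode(ByteBuffer.wrap(new byte[] { CRITICAL_BYTES[i] }));
            if (decoded.length() != 1 || decoded.get(0) != CRITICAL_CHARS[i]) {
                return false;
            }
            // char -> byte
            ByteBuffer encoded =
                    cs.encode(CharBuffer.wrap(new char[] { CRITICAL_CHARS[i] }));
            if (encoded.remaining() != 1 || encoded.get(0) != CRITICAL_BYTES[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafe(StandardCharsets.UTF_8));   // true
        System.out.println(isSafe(StandardCharsets.UTF_16));  // false: a lone byte is malformed UTF-16
    }
}
```

This would run once, at Connector.setURIEncoding() time, rather than per request.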
I'm a little worried about edge cases that might cause a misread of a
multibyte character into several single-byte characters that aren't
supposed to be treated as such. Remember that an attacker is not
obligated to follow the rules of sane UTF-8, Shift_JIS, or any other
encoding: they can claim one encoding and then send complete garbage
down the line. We need to remain safe in all cases, so I think
checking the string AFTER decoding is the only truly safe way to
operate. Note that I can't think of a specific way to break this off
the top of my head.

What kind of performance benefit are you looking at here? Certainly,
every bit helps, but is the wire-to-servlet pipeline really that long
at this point?

-chris
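To illustrate the "check after decoding" idea: decode the raw bytes with whatever charset the client claims (a standard Java decoder substitutes U+FFFD for malformed sequences, so garbage bytes can't smuggle in an unexpected '.' or '/'), then scan the resulting string. Again a minimal sketch with made-up names, not Tomcat's actual checkNormalize():

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PostDecodeCheck {

    /**
     * Decode the raw URI bytes with the claimed charset and verify the
     * resulting characters contain no non-normalized path sequences.
     * Charset.decode() replaces malformed input with U+FFFD, so a
     * truncated multibyte sequence can never become '.' or '/'.
     */
    public static boolean isNormalized(byte[] rawUri, Charset claimed) {
        String uri = claimed.decode(ByteBuffer.wrap(rawUri)).toString();
        // Backslash is never acceptable in a decoded URI path.
        if (uri.indexOf('\\') != -1) {
            return false;
        }
        // Reject "/./" and "/../" segments, including trailing ones.
        return !(uri.contains("/./") || uri.contains("/../")
                || uri.endsWith("/.") || uri.endsWith("/.."));
    }

    public static void main(String[] args) {
        // 0xE0 opens a 3-byte UTF-8 sequence that is never completed;
        // the decoder emits U+FFFD, so no traversal segment appears.
        byte[] garbage = { '/', 'a', (byte) 0xE0, '.', '.', '/' };
        System.out.println(isNormalized(garbage, StandardCharsets.UTF_8));

        byte[] traversal = { '/', '.', '.', '/', 'a' };
        System.out.println(isNormalized(traversal, StandardCharsets.UTF_8)); // false
    }
}
```

The cost of this scan over the decoded string is what the per-request checkNormalize() loops Mark mentions are paying for.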