On 01/03/2013 22:39, Konstantin Kolinko wrote: <snip/>
> 1. I have not tested, but as the proposed rejection happens in
> CoyoteAdapter, I think it will happen too early for ErrorReportValve
> to work.
>
> As such, a user will receive a response that consists of the HTTP
> status line only, which browsers display as a blank page.
>
> I suspect that such errors can be triggered easily by a human user,
> e.g. by mistyping a URL. I would not like to respond to those with a
> blank page.

Fair point.

> 2. In many cases it would make sense to know some correct part of the
> URL to choose a web application, and as such to handle the error in a
> webapp-specific manner.

Also a fair point.

> 3. I remember discussions on the mailing lists regarding whether it is
> possible to make the URI encoding a webapp-specific setting.

That is never going to be possible, because the URI has to be decoded
before it can be mapped to a web application.

> Looking in Bugzilla,
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50504

That could be made to work, although moving to UTF-8 for the default may
reduce the need for it.

> https://issues.apache.org/bugzilla/show_bug.cgi?id=48899

I'm not a fan of guessing character sets. I'm concerned about the risks if
the wrong charset is chosen.

> At least it means that there are people who are interested in smarter
> handling of URLs that use the wrong encoding.

Indeed.

> 4. As such I think there are two ways to handle wrong and incomplete
> characters:
>
> a) replace them with a substitute character
> b) throw an exception and fall back to ISO-8859-1 for the entire URL.
>
> Going with a) will provide slightly better handling if the webapp name
> is non-ASCII, as it can be selected more reliably.

I was originally against this on the grounds that a broken URI should
result in an immediate 400 response. However, your argument about blank
responses and using custom error pages has convinced me to change my mind.
Given that the decoders can be configured to provide this feature, this is
now my preferred option for URIs.
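Option a) maps onto the standard java.nio API: a CharsetDecoder whose error actions are set to CodingErrorAction.REPLACE substitutes the decoder's replacement string (U+FFFD for UTF-8) for malformed or truncated sequences instead of rejecting the input. The following is a minimal sketch, assuming the input is the URI bytes after %-decoding. ReplacingUriDecoder is a hypothetical name, and this is not Tomcat's actual decoding code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating option a): replace, never reject.
public class ReplacingUriDecoder {

    /**
     * Decodes the (already %-decoded) URI bytes, replacing any malformed or
     * incomplete sequence with the decoder's replacement string, so mapping
     * and custom error pages can still run on the result.
     */
    public static String decode(byte[] uriBytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(uriBytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: with REPLACE the decoder does not report errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "/caf" + 0xC3 0xA9 (e-acute in UTF-8) + "/" + a truncated 0xC3
        byte[] bytes = {'/', 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, '/', (byte) 0xC3};
        // The valid prefix decodes normally; the trailing 0xC3 becomes U+FFFD.
        System.out.println(decode(bytes, StandardCharsets.UTF_8));
    }
}
```

Because the valid part of the path survives, a non-ASCII context path ahead of the broken character can still be mapped, which is the advantage claimed for a) above.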
> Going with b) should be easier to implement.

b) could easily fail if something like UTF-16 were used. Granted, that is
unlikely, but the side-effects of falling back to ISO-8859-1 are pretty
messy in that case.

> If an application has an ASCII name, either a) or b) works.
>
> If the request is going to result in some error page, I think there is
> not much difference between a) and b).

For UTF-8, probably not. For other encodings that are not supersets of
ASCII or ISO-8859-1, b) will fail.

> If someone is going to process such broken URLs more smartly, I think
> the questions are
> (1) whether one is able to detect broken URLs

Only by trying and failing to decode them.

> (2) whether one is able to recover the original URL as submitted by
> the client
>
> Using ISO-8859-1 might make recovery easier, but I think regardless of
> that
> - one can use request.getRequestURI(), as that one is not affected by
> encodings.

Agreed.

> - in a Valve one can access the bytes in the byte chunk and do custom
> decoding?

I believe so.

In summary, then, my new proposal (for URIs) is:
- replace incomplete/invalid characters with the replacement character for
  the given encoding
- drop the fallback to 'ASCII'
- develop some more tests for UTF-8 decoding and likely switch to the
  Harmony decoder

Handling for request bodies is still TBD.

Mark
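For comparison, option b) and the "only by trying and failing to decode" detection can be sketched with a strict java.nio CharsetDecoder (CodingErrorAction.REPORT), falling back to ISO-8859-1 for the whole URI on error. This is a minimal sketch, assuming the input is the URI bytes after %-decoding; StrictUriDecoder and its methods are hypothetical names, not Tomcat's code. The UTF-16BE case shows why the fallback is messy for charsets that are not ASCII supersets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating option b) and error detection.
public class StrictUriDecoder {

    /** Decodes strictly: any malformed or incomplete sequence throws. */
    public static String decodeStrict(byte[] uriBytes, Charset charset)
            throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(uriBytes)).toString();
    }

    /** Detection: the only way to know the bytes are broken is to try decoding them. */
    public static boolean isValid(byte[] uriBytes, Charset charset) {
        try {
            decodeStrict(uriBytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Option b): on any decoding error, re-decode the entire URI as ISO-8859-1. */
    public static String decodeWithFallback(byte[] uriBytes, Charset charset) {
        try {
            return decodeStrict(uriBytes, charset);
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte 0x00-0xFF to a char, so this cannot fail,
            // but the result is only meaningful if the charset is an ASCII superset.
            return new String(uriBytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // UTF-8 with a truncated trailing sequence is detected as broken.
        byte[] brokenUtf8 = {'/', 'a', (byte) 0xC3};
        System.out.println(isValid(brokenUtf8, StandardCharsets.UTF_8)); // prints false

        // UTF-16BE "/a" plus one stray byte: the fallback yields NUL characters
        // interleaved with the path, which cannot be mapped sensibly.
        byte[] brokenUtf16 = {0x00, '/', 0x00, 'a', 0x00};
        String s = decodeWithFallback(brokenUtf16, StandardCharsets.UTF_16BE);
        System.out.println(s.length()); // prints 5
    }
}
```

With UTF-8 the fallback at least keeps the ASCII part of the path intact; with UTF-16 every character of the fallback result is interleaved with a NUL, so mapping to a web application would fail.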