On 01/03/2013 22:39, Konstantin Kolinko wrote:

<snip/>

> 1. I have not tested, but as proposed rejection happens in
> CoyoteAdapter, I think it will happen too early for ErrorReportValve
> to work.
> 
> As such, a user will receive a response that consists of HTTP status
> line only, which browsers display as a blank page.
> 
> I suspect that such errors can be triggered easily by a human user,
> e.g. by mistyping an URL. I would not like to respond to those with a
> blank page.

Fair point.

> 2. In many cases it would make sense to know some correct part of the
> URL to choose a web application, and as such to handle the error in
> webapp-specific manner.

Also a fair point.

> 3. I remember discussions on the mailing lists regarding whether it is
> possible to make uri-encoding to be webapp-specific setting.

That is never going to be possible because decoding has to happen before
mapping.

> Looking in Bugzilla,
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50504

That could be made to work although moving to UTF-8 for the default may
reduce the need for this.

> https://issues.apache.org/bugzilla/show_bug.cgi?id=48899

I'm not a fan of guessing character sets. I'm concerned about risks if
the wrong charset is chosen.

> At least it means that there are people that are interested in more
> smart handling of URLs that use wrong encoding.

Indeed.

> 4. As such I think there are two ways to handle wrong and incomplete 
> characters:
> 
> a) replace them with substitute character
> b) throw exception and let fallback to ISO-8859-1 for the entire URL.
> 
> Going with a) will provide slightly better handling if webapp name is
> non-ASCII, as it can be selected more reliably.

I was originally against this on the grounds that a broken URI should
result in an immediate 400 response. However, your argument about blank
responses and using custom error pages has convinced me to change my
mind. Given that the decoders can be configured to provide this feature,
this is now my preferred option for URIs.
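As a rough illustration of option a), here is a minimal sketch using the JDK's own CharsetDecoder rather than Tomcat's internal decoder. The class and method names are made up for the example, and it assumes Java 7+ for StandardCharsets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical name, for illustration only; not part of Tomcat.
class LenientDecodeSketch {

    // Decode bytes as UTF-8, substituting the decoder's replacement
    // character (U+FFFD by default) for malformed or unmappable input.
    static String decodeLenient(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "/caf" followed by 0xE9 ('é' in ISO-8859-1), which is an
        // incomplete sequence when read as UTF-8.
        byte[] broken = {'/', 'c', 'a', 'f', (byte) 0xE9};
        String decoded = decodeLenient(broken);
        // The valid prefix survives; only the bad byte is replaced.
        System.out.println(decoded.startsWith("/caf"));
        System.out.println(decoded.endsWith("\uFFFD"));
    }
}
```

The point of interest is that the valid leading part of the path is preserved, so it can still be used to select a web application and its error page.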

> Going with b) should be easier to implement.

b) This could easily fail if something like UTF-16 was used. Granted,
that is unlikely, but the side-effects of falling back to ISO-8859-1 are
pretty messy in this case.

> If an application has an ASCII name, either a) or b) works.
> 
> If the request is going to result in some error page, I think there is
> no much difference between a) and b).

For UTF-8, probably not. For other encodings that are not supersets
of ASCII or ISO-8859-1, b) will fail.
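To make the UTF-16 case concrete, here is a small sketch of the decoding step only (in a real request the bytes would arrive percent-encoded). The class and method names are hypothetical. ISO-8859-1 maps every byte to a character, so the fallback never reports an error; it just produces a different string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical name, for illustration only; not part of Tomcat.
class FallbackSketch {

    // The fallback from option b): reinterpret the raw bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "/app" encoded as big-endian UTF-16: two bytes per character,
        // with a zero high byte in front of each ASCII character.
        byte[] utf16 = "/app".getBytes(StandardCharsets.UTF_16BE);
        String fallback = decodeAsLatin1(utf16);

        // Every byte became its own character, so the length doubles
        // and NUL characters are interleaved with the original ones.
        System.out.println(fallback.length());
        System.out.println(fallback.equals("/app"));
    }
}
```

The resulting path no longer starts with "/", so neither mapping nor a sensible error page can be expected to work on it.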

> If someone is going to process such broken URLs more smartly, I think
> questions are
> (1) whether one is able to detect broken URLs

Only by trying and failing to decode it.
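A minimal sketch of what "trying and failing" could look like with the JDK's CharsetDecoder set to report errors instead of replacing them. The names are hypothetical and this is not how Tomcat's own converter is structured.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical name, for illustration only; not part of Tomcat.
class StrictDecodeSketch {

    // Returns true if the bytes decode cleanly as UTF-8. There is no
    // shortcut here: validity is established by attempting the decode.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "/caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'/', 'c', 'a', 'f', (byte) 0xE9}; // lone ISO-8859-1 byte
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```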

> (2) whether one is able to recover original URL as submitted by client
> 
> Using ISO-8859-1 might make recovery easier, but I think regardless of that
> - one can use request.getRequestURI(), as that one is not affected by 
> encodings.

Agreed.

> - in a Valve one can access the bytes in the byte chunk and do custom 
> decoding?

I believe so.

In summary then, my new proposal (for URIs) looks like:
- replace incomplete/invalid characters with the replacement character
for the given encoding
- drop the fallback to 'ASCII'
- develop some more tests for UTF-8 decoding and likely switch to the
Harmony decoder

Handling for request bodies is still TBD.

Mark


