Re: Bug 23929: ServletRequest.setCharacterEncoding()
Remy Maucherat wrote:
> Jess Holle wrote:
>> Remy Maucherat wrote:
>>
>> This is a good question -- but one which only applies to POST. My bug case was explicitly with GET. If there is an entity body encoding specified in the request, then I am not sure which should override. If there is not, then I would presume setCharacterEncoding() should win out. If the only issue is when these differ, then I believe that site designers should simply ensure they don't.
>
> I think you should read the HTTP RFC. content-type does not apply to the URI or the HTTP header. The fact that setCharacterEncoding would apply to (part of) the URI and/or the header violates the RFC on URIs.
>
> Anyway, to put it simply: in the next release, add useBodyEncodingForURI=true on the connector, and you're done. Please don't complain that it won't do what you want before trying it. You can also use the URIEncoding attribute to specify the path encoding.
>
> Rémy

My 2 cents on this issue:

Remy is for sure right in stating that
a) the HTTP RFC does not cover variable character encoding for query parameters for different requests
b) it is (sounds?) logical to assume that the whole URI (including paths, query parameters etc.) should be considered as being encoded with the same character encoding

From a developer's point of view, however, applying the above two points
a) breaks expected behaviour (the setCharacterEncoding() method does not work the same as before)
b) does not give an acceptable alternative (if all parameter passing could be solved with the POST method, then the GET method would not be needed, would it?)
c) caused a lot of web apps to stop working when the Tomcat version was upgraded

So I think it is legitimate to be upset when first confronted with this change of behaviour.

As for how easy it is to NOT file duplicate bugs on this issue: having followed this debate, I have collected the following list of somehow related bugs
bug 25360
bug 25231
bug 25235
bug 22666
bug 24557
bug 24345
bug 23929
bug 25848
and of course a bunch of messages on the developer list.

Speaking for myself and having reread these messages: assuming I've been working for some time with the old behaviour and then experienced the new one, I would not be able to understand why this change was made, EVEN if someone gave me the above list of bugs.

I propose the following:
- Write a short summary of why this change was necessary and include the above list of bugs, as well as links to the related developer list threads.
- Then submit a link to this summary to all the above bugs.
- If not already done, port the useBodyEncodingForURI parameter to the next 4.1.x release.

I volunteer to write the summary if the list thinks that the proposal is reasonable.

Regards
Stefanos
Re: Bug 23929: ServletRequest.setCharacterEncoding()
> From a developer's point of view, however, applying the above two points
> a) breaks expected behaviour (the setCharacterEncoding() method does not work the same as before)
> b) does not give an acceptable alternative (if all parameter passing could be solved with the POST method, then the GET method would not be needed, would it?)
> c) caused a lot of web apps to stop working when the Tomcat version was upgraded
>
> So I think it is legitimate to be upset when first confronted with this change of behaviour.

I will not claim that I was reasonable when originally confronted with the change. I will say that:

1. Our existing (4.1.x) usage of setCharacterEncoding() works across all recent servlet engines tested [including 2 commercial servlet engines] -- and is thus some indication of a de facto standard.
2. It would seem from the examples Sun provides with setCharacterEncoding() that the intent is to include request parameters, and thus that this should be the default behavior of the API rather than something that requires additional configuration.

> As for how easy it is to NOT file duplicate bugs on this issue: having followed this debate, I have collected the following list of somehow related bugs

I did searches again after being scolded by Remy. I must admit that I must have crossed wires when doing searches and filing bugs and somehow managed to miss this search (which it is my habit to do).

> Speaking for myself and having reread these messages: assuming I've been working for some time with the old behaviour and then experienced the new one, I would not be able to understand why this change was made, EVEN if someone gave me the above list of bugs.

Agreed. Without a short summary attached to the bugs I would still have filed a new bug and argued to high hell...

--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Stefanos Karasavvidis wrote:
> If not already done, port the useBodyEncodingForURI parameter to the next 4.1.x release.

This new flag was ported last month.

Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Jess Holle wrote:
> Remy, et al:
>
> The API is *not* optional. It is a required part of the servlet spec.

Great. I didn't know that ;-)

How about:
- Not CCing me. I'm subscribed to tomcat-dev already. Thanks.
- There are big threads, commit messages (incl. recent ones), and bugs on this issue. How about reading those before writing an email about how bad things are.

BTW, there's no bug.

Rémy

> It works just great in Tomcat 4.1 and is not an acceptable regression in Tomcat 5. I am thus one step away from re-opening this bug (http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929).
>
> I cannot use the encoding setting on the connector, as the standard handling of servlet parameters is ISO-8859-1 decoding unless setCharacterEncoding() is used to specify something else. All of our other code thus follows this standard carefully (and works across all servlet engines tested). [This includes handling multi-byte data in servlet parameters.] This does require some careful shuffling to work around the fact that the wrong encoding was used by the servlet engine and to use the correct one (UTF-8 in most, but not all, cases).
>
> We do, however, have some code which leverages this new API to call setCharacterEncoding("UTF-8") -- which is, in fact, very nice to have.
>
> I can see that it can be obnoxious to implement -- but users of the API do not and should not care.
>
> Tomcat 5 has a lot of promising things over Tomcat 4.1 -- don't let spec non-compliance force those of us who have to care about rigorous i18n to tell our customers to use Tomcat 4.1 or pay for a commercial servlet engine if they want later spec compliance.
>
> --
> Jess Holle
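The "careful shuffling" described above, where a parameter the container decoded as ISO-8859-1 is re-read as UTF-8, is commonly done by recovering the original bytes and decoding them again. The following is a minimal sketch using only the JDK, not code from the thread or from Tomcat; `ParameterRedecoder` and `redecode` are hypothetical names, and the sketch assumes the value really was sent as UTF-8 and decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class ParameterRedecoder {

    // Hypothetical helper: undo an ISO-8859-1 decoding and re-read the bytes as UTF-8.
    // ISO-8859-1 maps every byte to exactly one character, so getBytes() recovers
    // the bytes that were originally on the wire.
    static String redecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate what a container does when it decodes UTF-8 bytes as ISO-8859-1.
        byte[] wireBytes = "R\u00e9my".getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(wireBytes, StandardCharsets.ISO_8859_1);

        String repaired = redecode(misdecoded);
        System.out.println(repaired.equals("R\u00e9my"));
    }
}
```

This only works when the application knows which encoding the client actually used, which is why the message above notes that UTF-8 is right in most, but not all, cases.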
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Remy Maucherat wrote:
> Jess Holle wrote:
>> Remy, et al:
>>
>> The API is *not* optional. It is a required part of the servlet spec.
>
> Great. I didn't know that ;-)
>
> How about:
> - Not CCing me. I'm subscribed to tomcat-dev already. Thanks.

Sorry.

> - There are big threads, commit messages (incl. recent ones), and bugs on this issue. How about reading those before writing an email about how bad things are.

I did search the archives for such threads before even filing my duplicate bug, so apparently my searching is inept. I'll look again, but pointers would be appreciated.

> BTW, there's no bug.

It would be nice if the bug comments described why it is not a bug. I understand Bugzilla is not a discussion forum, but it would really help future reporters not to resurrect old issues if the bug comments contained a final summary of why the bug was closed as INVALID.

Did I and the other reporter misuse the API? The API presumably must work, so how are we misusing it so that it does not? If it does not work, then how does this meet the spec?

--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Jess Holle wrote:
>> - There are big threads, commit messages (incl. recent ones), and bugs on this issue. How about reading those before writing an email about how bad things are.
>
> I did search the archives for such threads before even filing my duplicate bug, so apparently my searching is inept. I'll look again, but pointers would be appreciated.

For example:

remm 2003/12/10 14:26:28
Modified: catalina/src/share/org/apache/coyote/tomcat5 CoyoteConnector.java CoyoteRequest.java mbeans-descriptors.xml
Log:
- Add a flag to allow using the encoding specified in the contentType for the URI parameters. This is disabled by default, not compliant with the standards, but present for compatibility.

There's a query page in BZ, also, and as I said, many threads on tomcat-dev (use the archives).

Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Remy Maucherat wrote:
> Jess Holle wrote:
>>> - There are big threads, commit messages (incl. recent ones), and bugs on this issue. How about reading those before writing an email about how bad things are.
>>
>> I did search the archives for such threads before even filing my duplicate bug, so apparently my searching is inept. I'll look again, but pointers would be appreciated.
>
> For example:
>
> remm 2003/12/10 14:26:28
> Modified: catalina/src/share/org/apache/coyote/tomcat5 CoyoteConnector.java CoyoteRequest.java mbeans-descriptors.xml
> Log:
> - Add a flag to allow using the encoding specified in the contentType for the URI parameters. This is disabled by default, not compliant with the standards, but present for compatibility.

But as per my previous message I /cannot/ change this on a connector basis. I /must/ make this determination on a per-request basis -- /and the servlet spec specifically allows me to do this via the setCharacterEncoding() API as I read it/.

> There's a query page in BZ, also, and as I said, many threads on tomcat-dev (use the archives).

I queried both at some length -- especially BZ. I'll query the tomcat-dev archives further, but again a simple synopsis, attached to the bug, of how Tomcat's behavior satisfies the spec and is thus not a bug would save everyone a lot of trouble in cases like this.

In other words, where a bug that from all indications appears to be a spec violation is closed as INVALID, an explanation attached to the bug itself would be a *very* good idea.

--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Jess Holle wrote:
> Remy Maucherat wrote:
>> For example:
>>
>> remm 2003/12/10 14:26:28
>> Modified: catalina/src/share/org/apache/coyote/tomcat5 CoyoteConnector.java CoyoteRequest.java mbeans-descriptors.xml
>> Log:
>> - Add a flag to allow using the encoding specified in the contentType for the URI parameters. This is disabled by default, not compliant with the standards, but present for compatibility.
>
> But as per my previous message I /cannot/ change this on a connector basis. I /must/ make this determination on a per-request basis -- /and the servlet spec specifically allows me to do this via the setCharacterEncoding() API as I read it/.

The content-type header and your setCharacterEncoding call both control the request entity body character encoding. So if the entity body encoding is *also* used for URI parameters, what do you think it would do?

>> There's a query page in BZ, also, and as I said, many threads on tomcat-dev (use the archives).
>
> I queried both at some length -- especially BZ. I'll query the tomcat-dev archives further, but again a simple synopsis, attached to the bug, of how Tomcat's behavior satisfies the spec and is thus not a bug would save everyone a lot of trouble in cases like this.
>
> In other words, where a bug that from all indications appears to be a spec violation is closed as INVALID, an explanation attached to the bug itself would be a *very* good idea.

Sorry, I'm not a broken record, and I will not go on repeating the same stuff over and over 20 times.

Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Remy Maucherat wrote:
> Jess Holle wrote:
>> Remy Maucherat wrote:
>>> For example:
>>>
>>> remm 2003/12/10 14:26:28
>>> Modified: catalina/src/share/org/apache/coyote/tomcat5 CoyoteConnector.java CoyoteRequest.java mbeans-descriptors.xml
>>> Log:
>>> - Add a flag to allow using the encoding specified in the contentType for the URI parameters. This is disabled by default, not compliant with the standards, but present for compatibility.
>>
>> But as per my previous message I /cannot/ change this on a connector basis. I /must/ make this determination on a per-request basis -- /and the servlet spec specifically allows me to do this via the setCharacterEncoding() API as I read it/.
>
> The content-type header and your setCharacterEncoding call both control the request entity body character encoding. So if the entity body encoding is *also* used for URI parameters, what do you think it would do?

This is a good question -- but one which only applies to POST. My bug case was explicitly with GET.

If there is an entity body encoding specified in the request, then I am not sure which should override. If there is not, then I would presume setCharacterEncoding() should win out. If the only issue is when these differ, then I believe that site designers should simply ensure they don't.

>>> There's a query page in BZ, also, and as I said, many threads on tomcat-dev (use the archives).
>>
>> I queried both at some length -- especially BZ. I'll query the tomcat-dev archives further, but again a simple synopsis, attached to the bug, of how Tomcat's behavior satisfies the spec and is thus not a bug would save everyone a lot of trouble in cases like this.
>>
>> In other words, where a bug that from all indications appears to be a spec violation is closed as INVALID, an explanation attached to the bug itself would be a *very* good idea.
>
> Sorry, I'm not a broken record, and I will not go on repeating the same stuff over and over 20 times.

Just once, on one of the bug reports in the duplicate chain, would suffice.

[At least in my handling of our internal bug system it is commonplace to copy/paste the final status from e-mail threads and/or lists into the bug's attachments when closing the bug.]

--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Jess Holle wrote:
> Remy Maucherat wrote:
>
> This is a good question -- but one which only applies to POST. My bug case was explicitly with GET.
>
> If there is an entity body encoding specified in the request, then I am not sure which should override. If there is not, then I would presume setCharacterEncoding() should win out. If the only issue is when these differ, then I believe that site designers should simply ensure they don't.

I think you should read the HTTP RFC. content-type does not apply to the URI or the HTTP header. The fact that setCharacterEncoding would apply to (part of) the URI and/or the header violates the RFC on URIs.

Anyway, to put it simply: in the next release, add useBodyEncodingForURI=true on the connector, and you're done. Please don't complain that it won't do what you want before trying it. You can also use the URIEncoding attribute to specify the path encoding.

Rémy
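The charset used to decode the URI is a separate choice from the entity body encoding, and that choice changes the result. The sketch below illustrates this with plain JDK code; it is not Tomcat's implementation. It decodes the same percent-encoded query value with two charsets via `java.net.URLDecoder`. `QueryDecodingDemo`, `decode`, and the sample value are hypothetical and chosen only for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {

    // Hypothetical helper: percent-decode a query value with the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The bytes C3 A9 are the UTF-8 encoding of U+00E9 (e with acute accent).
        String rawValue = "R%C3%A9my";

        // Decoded as UTF-8, the two bytes become one character.
        System.out.println(decode(rawValue, "UTF-8").equals("R\u00e9my"));

        // Decoded as ISO-8859-1, each byte becomes its own character (U+00C3, U+00A9).
        System.out.println(decode(rawValue, "ISO-8859-1").equals("R\u00c3\u00a9my"));
    }
}
```

Nothing in the URI itself says which charset produced the bytes, so the server has to be told. In this thread that is done with connector attributes (URIEncoding, or useBodyEncodingForURI to reuse the body encoding).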