Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-06 Thread Stefanos Karasavvidis
Remy Maucherat wrote:
Jess Holle wrote:

Remy Maucherat wrote:

This is a good question -- but one which only applies to POST.  My bug 
case was explicitly with GET.

If there is an entity body encoding specified in the request, then I 
am not sure which should override.  If there is not, then I would 
presume setCharacterEncoding() should win out.  If the only issue is 
when these differ, then I believe that site designers should simply 
ensure they don't.


I think you should read the HTTP RFC. Content-Type does not apply to the
URI or the HTTP header. The fact that setCharacterEncoding would apply 
to (part of) the URI and/or the header violates the RFC on URIs.

Anyway, to put it simply: in the next release, add 
useBodyEncodingForURI=true on the connector, and you're done.
Please don't complain that it won't do what you want before trying it.
You can also use the URIEncoding attribute to specify the path encoding.

Rémy
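Remy's suggestion involves two connector attributes, useBodyEncodingForURI and URIEncoding. As a rough illustration only, here is a simplified, hypothetical model in plain Java (not Tomcat's source; the class and method names are invented) of which charset would decode query-string parameters. It assumes the ISO-8859-1 default discussed in this thread. URIEncoding also governs path decoding, which the model leaves out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of query-string charset selection.
// Not Tomcat source code; names are illustrative only.
class QueryCharsetModel {

    // Default assumed in this thread when nothing else is configured.
    static final Charset DEFAULT = StandardCharsets.ISO_8859_1;

    /**
     * @param useBodyEncodingForURI value of the connector attribute
     * @param uriEncoding  connector URIEncoding attribute, or null if unset
     * @param bodyEncoding charset from Content-Type or setCharacterEncoding(), or null
     */
    static Charset queryStringCharset(boolean useBodyEncodingForURI,
                                      String uriEncoding,
                                      String bodyEncoding) {
        if (useBodyEncodingForURI) {
            // Compatibility mode: reuse the request body encoding for the query string.
            return bodyEncoding != null ? Charset.forName(bodyEncoding) : DEFAULT;
        }
        // Default mode: the body encoding never reaches the query string.
        return uriEncoding != null ? Charset.forName(uriEncoding) : DEFAULT;
    }

    public static void main(String[] args) {
        // setCharacterEncoding("UTF-8") alone: GET parameters still use ISO-8859-1
        System.out.println(queryStringCharset(false, null, "UTF-8"));  // ISO-8859-1
        // useBodyEncodingForURI="true" lets the per-request encoding apply
        System.out.println(queryStringCharset(true, null, "UTF-8"));   // UTF-8
        // URIEncoding="UTF-8" fixes a single charset for every request
        System.out.println(queryStringCharset(false, "UTF-8", null));  // UTF-8
    }
}
```

Under these assumptions, a per-request setCharacterEncoding() only reaches GET parameters once useBodyEncodingForURI is enabled, which is the compatibility behaviour this thread is arguing about.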

my 2 cents on this issue,

Remy is certainly right in stating that
a) the HTTP RFC does not cover variable character encoding for query 
parameters for different requests
b) it is (or at least sounds) logical to assume that the whole URI 
(including paths, query parameters, etc.) should be considered as 
encoded with the same character encoding
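Point b) can be made concrete. The bytes a client puts into a query string depend on the charset it used (for an HTML form, typically the page's encoding), and nothing in the URI itself records that choice. A minimal sketch, assuming Java 10+ for the Charset overload of URLEncoder.encode; QueryBytesDemo and encodeQueryValue are illustrative names:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: the same text yields different query-string bytes
// depending on the charset the client used.
class QueryBytesDemo {

    // Percent-encodes a form value (application/x-www-form-urlencoded style).
    static String encodeQueryValue(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an e-acute
        // UTF-8: the e-acute becomes the two bytes C3 A9
        System.out.println("?q=" + encodeQueryValue(value, StandardCharsets.UTF_8));      // ?q=caf%C3%A9
        // ISO-8859-1: the e-acute becomes the single byte E9
        System.out.println("?q=" + encodeQueryValue(value, StandardCharsets.ISO_8859_1)); // ?q=caf%E9
    }
}
```

A server that receives caf%E9 or caf%C3%A9 has to learn from configuration, or from some other hint, which charset produced it.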

From a developer's point of view, however, applying the above two points
a) breaks expected behaviour (the setCharacterEncoding() method does not 
work the same as before)
b) does not give an acceptable alternative (if all parameter passing 
could be handled with the POST method, then the GET method would not be 
needed, would it?)
c) has caused a lot of web apps to stop working after a Tomcat upgrade

So I think it is legitimate to be upset when first confronted with this 
change of behaviour.

As for how easy it is to NOT file duplicate bugs on this issue: having 
followed this debate, I have collected the following list of somewhat 
related bugs:
bug 25360
bug 25231
bug 25235
bug 22666
bug 24557
bug 24345
bug 23929
bug 25848
and, of course, a bunch of messages on the developer list

Speaking for myself and having reread these messages:
Assuming I've been working for some time with the old behaviour and 
experienced the new one, I would not be able to understand why this 
change was made, EVEN if someone gave me the above list of bugs.

I propose the following:
a) write a short summary of why this change was necessary, including the 
above list of bugs as well as links to the related developer list 
threads, and then add a link to this summary to each of the above bugs;
b) if not already done, port the useBodyEncodingForURI parameter to the 
next 4.1.x release.

I volunteer to write the summary if the list thinks that the proposal is 
reasonable.

Regards

Stefanos

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-06 Thread Jess Holle

From a developer's point of view, however, applying the above two points
a) breaks expected behaviour (the setCharacterEncoding() method does not 
work the same as before)
b) does not give an acceptable alternative (if all parameter passing 
could be handled with the POST method, then the GET method would not be 
needed, would it?)
c) has caused a lot of web apps to stop working after a Tomcat upgrade

So I think it is legitimate to be upset when first confronted with 
this change of behaviour.
I will not claim that I was reasonable when originally confronted with 
the change.

I will say that:

  1. Our existing (4.1.x) usage of setCharacterEncoding() works across
     all recent servlet engines tested [including 2 commercial servlet
     engines] -- and is thus some indication of a de facto standard.
  2. Judging from the examples Sun provides with setCharacterEncoding(),
     the intent seems to be that it covers request parameters, so this
     should be the default behavior of the API rather than something
     that requires additional configuration.

As for how easy it is to NOT file duplicate bugs on this issue: having 
followed this debate, I have collected the following list of somewhat 
related bugs:
I searched again after being scolded by Remy.  I must admit that I 
must have crossed wires when searching and filing bugs, and somehow 
managed to skip this search (which I normally make a habit of doing).

Speaking for myself and having reread these messages:
Assuming I've been working for some time with the old behaviour and 
experienced the new one, I would not be able to understand why this 
change was made, EVEN if someone gave me the above list of bugs.
Agreed.  Without a short summary attached to the bugs I would still have 
filed a new bug and argued to high hell...

--
Jess Holle


Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-06 Thread Remy Maucherat
Stefanos Karasavvidis wrote:
If not already done, port the useBodyEncodingForURI parameter to the 
next 4.1.x release.
This new flag was ported last month.

Rémy


Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Remy Maucherat
Jess Holle wrote:

Remy, et al.:

The API is *not* optional.  It is a required part of the servlet spec.
Great. I didn't know that ;-)

How about:
- Not CCing me. I'm subscribed to tomcat-dev already. Thanks.
- There are big threads, commit messages (including recent ones), and bugs 
on this issue. How about reading those before writing an email about how 
bad things are?

BTW, there's no bug.

Rémy

It works just great in Tomcat 4.1, and breaking it in Tomcat 5 is not an 
acceptable regression.  I am thus one step away from re-opening this bug 
(http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929)

I cannot use the encoding setting on the connector, because the standard 
handling of servlet parameters is ISO-8859-1 decoding unless 
setCharacterEncoding() is used to specify something else.  All of our 
other code thus follows this standard carefully (and works across all 
servlet engines tested).  [This includes handling multi-byte data in 
servlet parameters.]  This does require some careful shuffling to 
work around the fact that the servlet engine used the wrong encoding, 
and to apply the correct one (UTF-8 in most, but not all, cases).
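The thread does not show the "careful shuffling" Jess refers to. One widely used form of it, sketched here as an assumption rather than as his team's actual code, relies on ISO-8859-1 mapping every byte 0x00-0xFF to the char with the same value: a parameter mis-decoded as ISO-8859-1 can be turned back into its original bytes and decoded again with the right charset (UTF-8 here). Latin1Redecode and redecodeAsUtf8 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the ISO-8859-1 "re-decode" workaround.
class Latin1Redecode {

    // Repairs a value the container decoded as ISO-8859-1 although the
    // client actually sent UTF-8 bytes.
    static String redecodeAsUtf8(String misdecoded) {
        // ISO-8859-1 maps each byte 0x00-0xFF to the char with the same value,
        // so encoding back to ISO-8859-1 recovers the original bytes exactly.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" sent as UTF-8 (63 61 66 C3 A9) but decoded as ISO-8859-1
        String fromContainer = new String(
                "caf\u00e9".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(fromContainer.length());                           // 5
        System.out.println(redecodeAsUtf8(fromContainer).equals("caf\u00e9")); // true
    }
}
```

In a servlet, the same call would wrap each request.getParameter(...) result. The trick only works because the first decode was ISO-8859-1; a lossy first decode (for example, one that replaced invalid bytes) could not be reversed.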

We do, however, have some code that uses this newer API to call 
setCharacterEncoding("UTF-8") -- which is, in fact, very nice to have.  
I can see that it may be obnoxious to implement -- but users of 
the API do not, and should not, care.

Tomcat 5 has a lot of promising improvements over Tomcat 4.1 -- don't let 
spec non-compliance force those of us who must care about rigorous i18n 
to tell our customers to stay on Tomcat 4.1, or to pay for a commercial 
servlet engine, if they want compliance with the later spec.

--
Jess Holle



Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Jess Holle
Remy Maucherat wrote:

Jess Holle wrote:

Remy, et al.:

The API is *not* optional.  It is a required part of the servlet spec.
Great. I didn't know that ;-)

How about:
- Not CCing me. I'm subscribed to tomcat-dev already. Thanks.
Sorry.

- There are big threads, commit messages (including recent ones), and bugs 
on this issue. How about reading those before writing an email about how 
bad things are?
I did search the archives for such threads before even filing my 
duplicate bug, so apparently my searching is inept.  I'll look again, 
but pointers would be appreciated.

BTW, there's no bug.
It would be nice if the bug comments described why it is not a bug.  I 
understand Bugzilla is not a discussion forum, but it would really help 
keep future reporters from resurrecting old issues if the bug comments 
contained a final summary of why the bug was closed as INVALID.

Did the other reporter and I misuse the API?  The API presumably must 
work, so how are we misusing it such that it does not?  If it does not 
work, then how does this meet the spec?

--
Jess Holle



Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Remy Maucherat
Jess Holle wrote:
- There are big threads, commit messages (including recent ones), and bugs 
on this issue. How about reading those before writing an email about how 
bad things are?


I did search the archives for such threads before even filing my 
duplicate bug, so apparently my searching is inept.  I'll look again, 
but pointers would be appreciated.
For example:
remm        2003/12/10 14:26:28
  Modified:    catalina/src/share/org/apache/coyote/tomcat5
                        CoyoteConnector.java CoyoteRequest.java
                        mbeans-descriptors.xml
  Log:
  - Add a flag to allow using the encoding specified in the contentType for
    the URI parameters. This is disabled by default, not compliant with the
    standards, but present for compatibility.

There's a query page in BZ, also, and as I said, many threads on 
tomcat-dev (use the archives).

Rémy



Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Jess Holle
Remy Maucherat wrote:

Jess Holle wrote:

- There are big threads, commit messages (including recent ones), and bugs 
on this issue. How about reading those before writing an email about how 
bad things are?


I did search the archives for such threads before even filing my 
duplicate bug, so apparently my searching is inept.  I'll look again, 
but pointers would be appreciated.


For example:
remm        2003/12/10 14:26:28
  Modified:    catalina/src/share/org/apache/coyote/tomcat5
                        CoyoteConnector.java CoyoteRequest.java
                        mbeans-descriptors.xml
  Log:
  - Add a flag to allow using the encoding specified in the contentType for
    the URI parameters. This is disabled by default, not compliant with the
    standards, but present for compatibility.
But, as per my previous message, I /cannot/ change this on a connector 
basis.  I /must/ make this determination on a per-request basis -- /and 
the servlet spec specifically allows me to do this via the 
setCharacterEncoding() API as I read it/.

There's a query page in BZ, also, and as I said, many threads on 
tomcat-dev (use the archives).
I queried both at some length -- especially BZ.  I'll query the 
tomcat-dev archives further, but again, a simple synopsis attached to the 
bug, explaining how Tomcat's behavior satisfies the spec and is thus not 
a bug, would save everyone a lot of trouble in cases like this.  In other 
words, when a bug that by all indications appears to be a spec violation 
is closed as INVALID, an explanation attached to the bug itself would be 
a *very* good idea.

--
Jess Holle


Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Remy Maucherat
Jess Holle wrote:
Remy Maucherat wrote:
For example:
remm        2003/12/10 14:26:28
  Modified:    catalina/src/share/org/apache/coyote/tomcat5
                        CoyoteConnector.java CoyoteRequest.java
                        mbeans-descriptors.xml
  Log:
  - Add a flag to allow using the encoding specified in the contentType for
    the URI parameters. This is disabled by default, not compliant with the
    standards, but present for compatibility.
But, as per my previous message, I /cannot/ change this on a connector 
basis.  I /must/ make this determination on a per-request basis -- /and 
the servlet spec specifically allows me to do this via the 
setCharacterEncoding() API as I read it/.
The Content-Type header and your setCharacterEncoding() call both control 
the character encoding of the request entity body. So if the entity body 
encoding is *also* used for URI parameters, what do you think it would do?
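Remy's question rests on the fact that a query string and an application/x-www-form-urlencoded request body share the same name=value&... syntax; what differs is where the charset comes from. A minimal JDK-only sketch (FormDecode and parse are hypothetical names; Java 10+ for the Charset overload of URLDecoder.decode) shows that the decoding step itself is identical and only the charset argument changes, which is roughly the one choice useBodyEncodingForURI makes for the query string:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for "a=1&b=2" data (query string or form body).
class FormDecode {

    static Map<String, String> parse(String encoded, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // URLDecoder maps '+' to a space, turns %XX sequences into bytes,
            // and decodes those bytes with the given charset.
            // Like getParameter(), keep only the first value of a repeated name.
            params.putIfAbsent(URLDecoder.decode(name, charset),
                               URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "q=caf%C3%A9&lang=fr";
        // Same bytes, two charsets: 4 chars (correct) vs 5 chars (mojibake)
        System.out.println(parse(query, StandardCharsets.UTF_8).get("q").length());      // 4
        System.out.println(parse(query, StandardCharsets.ISO_8859_1).get("q").length()); // 5
    }
}
```

The sketch keeps a single value per name for brevity; a real container exposes every value through getParameterValues().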

There's a query page in BZ, also, and as I said, many threads on 
tomcat-dev (use the archives).
I queried both at some length -- especially BZ.  I'll query the 
tomcat-dev archives further, but again, a simple synopsis attached to the 
bug, explaining how Tomcat's behavior satisfies the spec and is thus not 
a bug, would save everyone a lot of trouble in cases like this.  In other 
words, when a bug that by all indications appears to be a spec violation 
is closed as INVALID, an explanation attached to the bug itself would be 
a *very* good idea.
Sorry, I'm not a broken record, and I will not go on repeating the same 
stuff over and over 20 times.

Rémy


Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Jess Holle
Remy Maucherat wrote:

Jess Holle wrote:

Remy Maucherat wrote:

For example:
remm        2003/12/10 14:26:28
  Modified:    catalina/src/share/org/apache/coyote/tomcat5
                        CoyoteConnector.java CoyoteRequest.java
                        mbeans-descriptors.xml
  Log:
  - Add a flag to allow using the encoding specified in the contentType for
    the URI parameters. This is disabled by default, not compliant with the
    standards, but present for compatibility.
But, as per my previous message, I /cannot/ change this on a connector 
basis.  I /must/ make this determination on a per-request basis -- /and 
the servlet spec specifically allows me to do this via the 
setCharacterEncoding() API as I read it/.
The Content-Type header and your setCharacterEncoding() call both control 
the character encoding of the request entity body. So if the entity body 
encoding is *also* used for URI parameters, what do you think it would do?
This is a good question -- but one which only applies to POST.  My bug 
case was explicitly with GET.

If there is an entity body encoding specified in the request, then I am 
not sure which should override.  If there is not, then I would presume 
setCharacterEncoding() should win out.  If the only issue is when these 
differ, then I believe that site designers should simply ensure they don't.

There's a query page in BZ, also, and as I said, many threads on 
tomcat-dev (use the archives).
I queried both at some length -- especially BZ.  I'll query the 
tomcat-dev archives further, but again, a simple synopsis attached to the 
bug, explaining how Tomcat's behavior satisfies the spec and is thus not 
a bug, would save everyone a lot of trouble in cases like this.  In other 
words, when a bug that by all indications appears to be a spec violation 
is closed as INVALID, an explanation attached to the bug itself would be 
a *very* good idea.
Sorry, I'm not a broken record, and I will not go on repeating the 
same stuff over and over 20 times.
Just once, on one of the bug reports in the duplicate chain, would 
suffice.  [At least in my handling of our internal bug system, it is 
commonplace to copy/paste the final status from e-mail threads and/or 
lists into the bug's attachments when closing the bug.]

--
Jess Holle



Re: Bug 23929: ServletRequest.setCharacterEncoding()

2004-01-05 Thread Remy Maucherat
Jess Holle wrote:
Remy Maucherat wrote:

This is a good question -- but one which only applies to POST.  My bug 
case was explicitly with GET.

If there is an entity body encoding specified in the request, then I am 
not sure which should override.  If there is not, then I would presume 
setCharacterEncoding() should win out.  If the only issue is when these 
differ, then I believe that site designers should simply ensure they don't.
I think you should read the HTTP RFC. Content-Type does not apply to the
URI or the HTTP header. The fact that setCharacterEncoding would apply 
to (part of) the URI and/or the header violates the RFC on URIs.

Anyway, to put it simply: in the next release, add 
useBodyEncodingForURI=true on the connector, and you're done.
Please don't complain that it won't do what you want before trying it.
You can also use the URIEncoding attribute to specify the path encoding.

Rémy


