On 13.10.2017 19:29, Mark Thomas wrote:
On 13/10/2017 18:15, André Warnier (tomcat) wrote:
On 13.10.2017 18:17, Mark Thomas wrote:
On 13/10/2017 17:09, James H. H. Lampert wrote:
Thanks to all of you who responded.

I found a web page that explains it in ways that I can wrap my
55-year-old brain around, and has an easy-to-read reference chart.

https://perishablepress.com/stop-using-unsafe-characters-in-urls/

Question: the problem first showed up on a web service that takes a
"bodyless" POST operation, and I assume it also applies to GET
operations, and to the URL portion of a POST with a body.

But what about the body of a POST?

From an HTTP specification point of view, anything goes.

With respect, I believe that "anything goes" is a bit imprecise here.

Nope.

You can POST anything. You are talking specifically about form data.

Mmm. You are being a bit casuistic here. (Granted, so was I.)
In the real world, I would expect that 99% of what is ever POSTed /is/
form data. Wouldn't you?

In that case, as I said, the body has to conform to what the component
processing it expects.

And that component would be...?
I don't really know, but I would guess that in most web servers, the
component parsing the body of a POST with Content-Type
application/x-www-form-urlencoded is the same one that parses the query
string of a URI, no? Given how similar the two formats are, the temptation
to reuse it would seem hard to resist.
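To illustrate the guess above, here is a minimal Java sketch showing that a single routine can split and percent-decode both a URI query string and an application/x-www-form-urlencoded body, since both use the same key=value&key=value syntax. The class FormDecoder and its decode method are hypothetical names, not Tomcat's actual implementation; the sketch assumes Java 10+ (for URLDecoder.decode(String, Charset)) and assumes the data was encoded as UTF-8.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FormDecoder {
    // Hypothetical helper: splits "k1=v1&k2=v2" into decoded name/value pairs.
    // A query string and a urlencoded POST body share this syntax, so one
    // routine can parse either. URLDecoder also turns '+' into a space.
    public static List<Map.Entry<String, String>> decode(String data, Charset charset) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        if (data == null || data.isEmpty()) {
            return pairs;
        }
        for (String pair : data.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            pairs.add(new AbstractMap.SimpleImmutableEntry<>(
                    URLDecoder.decode(name, charset),
                    URLDecoder.decode(value, charset)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String queryString = "name=Andr%C3%A9&city=Paris"; // from the URI
        String postBody = "q=fish+%26+chips";              // from the request body
        System.out.println(decode(queryString, StandardCharsets.UTF_8));
        System.out.println(decode(postBody, StandardCharsets.UTF_8));
    }
}
```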


And yes, Unicode in form data is 'interesting'...

Mark


See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4

There are two ways for a user agent to send form data in the body of an
HTTP POST:
1) with Content-Type header = application/x-www-form-urlencoded
or
2) with Content-Type header = multipart/form-data

While it is true that in case (2) each submitted key=value pair is sent
separately, 'as is', this is not necessarily so in case (1): there, all
key=value pairs are concatenated into one long string, in which the
individual pairs are separated by (unescaped) "&" signs (apart from the
other required encodings; see the page above).
So if the client is not a browser, composes the POST body itself, and
sends it with Content-Type (1), it had better encode each parameter name
and value as described before concatenating the pairs, because that is
what the server will expect.
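A minimal Java sketch of what such a non-browser client could do. FormEncoder and encodeForm are hypothetical names, UTF-8 is an assumption about what the server expects, and Java 10+ is assumed for the Charset overload of URLEncoder.encode:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormEncoder {
    // Encode each name and value separately, THEN join the pairs with '&'.
    // Encoding the already-joined string would escape the separators too,
    // and skipping the encoding would let a '&' or '=' inside a value be
    // mistaken for a separator by the server.
    public static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "fish & chips");
        params.put("note", "a=b");
        // prints q=fish+%26+chips&note=a%3Db
        System.out.println(encodeForm(params));
    }
}
```

The resulting string would then be sent as the POST body with Content-Type: application/x-www-form-urlencoded.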

As an additional note: if the data in the client could contain Unicode
text, do not forget that Unicode is (still) not the standard in HTTP (nor
in URIs, and thus in query-string-like things), and make sure that you use
the proper method to encode any printable characters that are not pure
US-ASCII. Again, browsers generally do this correctly, but custom clients
do not necessarily. (And a "custom client" in this case could even be a
bit of JavaScript embedded in one of your own pages that makes its own
calls to the server on the side.)
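To show why "the proper method" matters, this hypothetical Java sketch (CharsetMismatch is an illustrative name; Java 10+ assumed) percent-encodes the same non-ASCII value with two different charsets, then shows that a server assuming UTF-8 cannot recover the original text from the ISO-8859-1 form. Which charset a particular server actually expects is an assumption to check against its own configuration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Percent-encode a form value using the given character set.
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        // "caf" + e-acute (U+00E9), a character outside US-ASCII
        String value = "caf\u00e9";

        // UTF-8: e-acute becomes two bytes, 0xC3 0xA9
        String utf8 = encode(value, StandardCharsets.UTF_8);        // caf%C3%A9
        // ISO-8859-1: e-acute becomes a single byte, 0xE9
        String latin1 = encode(value, StandardCharsets.ISO_8859_1); // caf%E9

        System.out.println(utf8);
        System.out.println(latin1);

        // A server decoding as UTF-8 cannot restore the original text from
        // the ISO-8859-1 form: a lone 0xE9 byte is not valid UTF-8.
        String misread = URLDecoder.decode(latin1, StandardCharsets.UTF_8);
        System.out.println(misread.equals(value)); // false
    }
}
```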

I got bitten by this just recently, even with a quite recent browser: a
JavaScript function was composing a POST to a server (using type (1)
above) and was NOT doing it correctly, even though the page containing
and calling this function was itself declared as Unicode/UTF-8.
(That was with, and I am too sorely tempted to add "of course" to resist
it, some revision of IE 11, although other revisions of the same browser
did not exhibit the same issue.)

[...]


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org