On 13.10.2017 18:17, Mark Thomas wrote:
On 13/10/2017 17:09, James H. H. Lampert wrote:
Thanks to all of you who responded.

I found a web page that explains it in ways that I can wrap my
55-year-old brain around, and has an easy-to-read reference chart.

https://perishablepress.com/stop-using-unsafe-characters-in-urls/

Question: the problem first showed up on a web service that takes a
"bodyless" POST operation, and I assume it also applies to GET
operations, and to the URL portion of a POST with a body.

But what about the body of a POST?

 From an HTTP specification point of view, anything goes.

With respect, I believe that "anything goes" is a bit imprecise here.

See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4

For HTML forms, there are 2 ways for a user agent to send the content of an HTTP POST:
1) with Content-Type header = application/x-www-form-urlencoded
or
2) with Content-Type header = multipart/form-data

While it is true that in case (2) each submitted key=value pair is sent 'as is' in its own part of the body, this is not so in case (1). There, all the key=value pairs are concatenated into one long string, in which the pairs are separated by (unescaped) "&" signs, so any "&" or "=" occurring inside a name or a value has to be escaped.
(This is apart from the other required encodings; see the page above.)
So if the client is not a browser and composes the POST body itself before sending it with Content-Type (1), it had better encode each parameter name and value as described before concatenating them, because that is what the server will expect.
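
To make the point about case (1) concrete, here is a minimal sketch in Python (my choice of language, any client language would do) of a custom client composing such a body. The helper name encode_form and the parameter names and values are made up for illustration; quote_plus and parse_qsl are from the standard urllib.parse module.

```python
from urllib.parse import quote_plus, parse_qsl

def encode_form(pairs):
    """Build an application/x-www-form-urlencoded body from (name, value) pairs."""
    # Encode every name and every value separately, and only then join them
    # with the unescaped "=" and "&" separators.
    return "&".join(
        f"{quote_plus(name)}={quote_plus(value)}" for name, value in pairs
    )

# Hypothetical parameters; the values deliberately contain "&", "=" and spaces.
params = [("company", "Smith & Sons"), ("filter", "a=b")]
body = encode_form(params)
print(body)

# What a server-side decoder sees: the original pairs come back intact.
print(parse_qsl(body))
```

Had the values been concatenated without encoding, the "&" inside "Smith & Sons" would have been read by the server as the start of a new parameter.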

As an additional note: if the data in the client can contain Unicode text, do not forget that Unicode is (still) not the default in HTTP (nor in URIs, and thus in query-string-like things), so make sure that you use the proper method to encode any printable characters which are not purely US-ASCII. Again, browsers generally do this correctly, but custom clients do not necessarily. (And a "custom client" in this case could even be a bit of JavaScript which is embedded in one of your own pages, but makes its own calls to the server on the side.)
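
As a rough illustration of what can go wrong with non-ASCII characters, the following Python sketch encodes the same (made-up) value once via UTF-8 and once via Latin-1, and then decodes both the way a server expecting UTF-8 would. It assumes the server decodes as UTF-8, which is a configuration matter and not a given.

```python
from urllib.parse import quote_plus, parse_qs

value = "caf\u00e9"  # hypothetical value: "cafe" with an acute accent on the e

# The same character yields different bytes, hence different escapes.
utf8_form = quote_plus(value, encoding="utf-8")
latin1_form = quote_plus(value, encoding="latin-1")
print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9

# A server that decodes as UTF-8 only recovers the value from the UTF-8 form;
# the Latin-1 form ends in a replacement character (U+FFFD).
ok = parse_qs("name=" + utf8_form, encoding="utf-8")
bad = parse_qs("name=" + latin1_form, encoding="utf-8", errors="replace")
print(ok["name"][0] == value)   # True
print(bad["name"][0] == value)  # False
```

The point is only that client and server have to agree on the character encoding used underneath the percent-escapes; which one it should be depends on your setup.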

I just recently got bitten by this, even in a quite recent browser: some JavaScript function was composing a POST to a server (using type (1) above) and was NOT doing it correctly, even though the page containing and calling this function was itself declared as Unicode/UTF-8. (That was with some revision of IE 11, and I am too sorely tempted to add "of course" to resist it, although other revisions of the same browser did not exhibit the same issue.)

[...]

