On 13/10/2017 18:15, André Warnier (tomcat) wrote:
> On 13.10.2017 18:17, Mark Thomas wrote:
>> On 13/10/2017 17:09, James H. H. Lampert wrote:
>>> Thanks to all of you who responded.
>>>
>>> I found a web page that explains it in ways that I can wrap my
>>> 55-year-old brain around, and has an easy-to-read reference chart.
>>>
>>> https://perishablepress.com/stop-using-unsafe-characters-in-urls/
>>>
>>> Question: the problem first showed up on a web service that takes a
>>> "bodyless" POST operation, and I assume it also applies to GET
>>> operations, and to the URL portion of a POST with a body.
>>>
>>> But what about the body of a POST?
>>
>> From an HTTP specification point of view, anything goes.
>
> With respect, I believe that "anything goes" is a bit imprecise here.

Nope. You can POST anything.

You are talking specifically about form data. In that case, as I said,
the body has to conform to what the component processing it expects.

And yes, unicode in form data is 'interesting'...

Mark

> See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
>
> There are 2 ways for a user agent to send the content of a HTTP POST :
> 1) with Content-type header = application/x-www-form-urlencoded
> or
> 2) with Content-type header = multipart/form-data
>
> and while it is true that in the case (2), any submitted key=value pair
> would be sent separately 'as is', this would not necessarily be so in
> case (1), because then all key=value pairs would be concatenated into
> one long string, in which the different key=value pairs would be
> separated by (unescaped) "&" signs.
> (Apart from other required encodings, see the page above)
> So if the client is not a browser, and "composes" itself the POST body
> before sending it, and sends it with a Content-type (1), it had better
> encode the individual parameter pairs as described, before concatenating
> them, because that is what the server would expect.
>
> As an additional note, if it so happened that the data in the client
> could contain Unicode text, do not forget that this is (still) not the
> standard in HTTP (and URI's, and thus query-string-like things), and
> make sure that you use the proper method to encode any printable
> characters which are not purely US-ASCII. Again, browsers generally do
> this correctly, but custom clients not necessarily. (And a "custom
> client" in this case, could even be a bit of javascript which is
> embedded in one of your own pages, but does its own calls to the server
> on the side).
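
For a custom (non-browser) client sending a type (1) body, the
"encode each pair, then concatenate with &" step described above could
look roughly like the Python sketch below. It is only an illustration:
build_form_body and the sample parameter names are made up here, and
UTF-8 is an assumption that only works if the server side is set up to
decode the body as UTF-8 as well.

```python
from urllib.parse import urlencode, parse_qsl


def build_form_body(params):
    """Build an application/x-www-form-urlencoded body.

    Hypothetical helper for illustration. Each key and value is
    percent-encoded from its UTF-8 bytes *before* the pairs are joined
    with '&', so a literal '&' or '=' inside a value cannot be mistaken
    for a separator.
    """
    encoded = urlencode(params, encoding="utf-8")
    # After encoding, the body is pure US-ASCII.
    return encoded.encode("ascii")


# Assumed sample data: one non-ASCII value, one value containing '&' and '='.
params = [("name", "André Warnier"), ("q", "a&b=c")]
body = build_form_body(params)

# The client would send these headers along with the body.
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),
}

print(body)  # b'name=Andr%C3%A9+Warnier&q=a%26b%3Dc'

# What a server decoding with the same charset would recover.
print(parse_qsl(body.decode("ascii"), encoding="utf-8"))
```

The unescaped "&" in the result only ever separates pairs; the one
inside the value travels as %26, and the "é" travels as the two UTF-8
bytes %C3%A9.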
>
> I just recently got bitten by this, even in a quite recent browser,
> where some javascript function was composing a POST to a server (using
> type (1) above), and was NOT doing it correctly, even though the page
> containing and calling this function was itself declared as Unicode/UTF-8.
> (that was with (and I am too sorely tempted to add "of course" to resist
> it) some revision of IE-11 - although other revisions of the same
> browser did not exhibit that same issue).
>
> [...]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>