Re: URL-encoding and "#"

Mark Thomas Fri, 13 Oct 2017 10:29:44 -0700

On 13/10/2017 18:15, André Warnier (tomcat) wrote:
> On 13.10.2017 18:17, Mark Thomas wrote:
>> On 13/10/2017 17:09, James H. H. Lampert wrote:
>>> Thanks to all of you who responded.
>>>
>>> I found a web page that explains it in ways that I can wrap my
>>> 55-year-old brain around, and has an easy-to-read reference chart.
>>>
>>> https://perishablepress.com/stop-using-unsafe-characters-in-urls/
>>>
>>> Question: the problem first showed up on a web service that takes a
>>> "bodyless" POST operation, and I assume it also applies to GET
>>> operations, and to the URL portion of a POST with a body.
>>>
>>> But what about the body of a POST?
>>
>> From an HTTP specification point of view, anything goes.
> 
> With respect, I believe that "anything goes" is a bit imprecise here.


Nope.

You can POST anything. You are talking specifically about form data. In
that case, as I said, the body has to conform to what the component
processing it expects.
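
For the form-data case, here is a minimal Java sketch of building an application/x-www-form-urlencoded body. It assumes Java 10+ (for the Charset overload of URLEncoder.encode) and assumes the receiving component expects UTF-8. The class FormBody, the method encodeForm and the parameter names are hypothetical. The point is that each name and each value is encoded on its own before the pairs are joined, so a value containing "&", "=" or "#" cannot break the key=value structure.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    // Builds an application/x-www-form-urlencoded body (assumed UTF-8):
    // every name and value is percent-encoded separately, then the pairs
    // are joined with '=' and '&', which are therefore never ambiguous.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8);
            body.add(name + "=" + value);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical parameters; LinkedHashMap keeps insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "a&b=c#d"); // '&', '=' and '#' become %26, %3D, %23
        params.put("name", "x y");  // form encoding turns a space into '+'
        System.out.println(encodeForm(params)); // q=a%26b%3Dc%23d&name=x+y
    }
}
```

Note that URLEncoder implements HTML form encoding (space as "+"), which is what this Content-type expects; it is not a general-purpose encoder for URI paths.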

And yes, Unicode in form data is 'interesting'...
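
To illustrate why: a small Java sketch (Java 10+; the class FormCharset and its field names are hypothetical) showing that one non-ASCII character percent-encodes differently under UTF-8 and ISO-8859-1, and that decoding with a charset other than the one the client used silently produces the wrong characters instead of failing.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormCharset {
    // Latin small letter e with acute (U+00E9), given as an escape so the
    // source file itself stays pure ASCII.
    static final String VALUE = "\u00e9";

    // Same character, different bytes, hence different percent-escapes,
    // depending on the charset the client picks.
    static final String AS_UTF8 = URLEncoder.encode(VALUE, StandardCharsets.UTF_8);
    static final String AS_LATIN1 = URLEncoder.encode(VALUE, StandardCharsets.ISO_8859_1);

    // A server decoding the UTF-8 escapes as ISO-8859-1 does not reject them;
    // it turns the two bytes into two unrelated characters.
    static final String DECODED_WRONG = URLDecoder.decode(AS_UTF8, StandardCharsets.ISO_8859_1);
    static final String DECODED_RIGHT = URLDecoder.decode(AS_UTF8, StandardCharsets.UTF_8);

    public static void main(String[] args) {
        System.out.println(AS_UTF8 + " " + AS_LATIN1);                    // %C3%A9 %E9
        System.out.println("wrong: " + DECODED_WRONG.length() + " chars"); // wrong: 2 chars
        System.out.println("right: " + DECODED_RIGHT.equals(VALUE));      // right: true
    }
}
```

In practice this means the client and the component parsing the body have to agree on the charset, for example by declaring it in the Content-Type header and by configuring the server side to decode with the same one.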

Mark


> See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
> 
> There are two ways for a user agent to send the content of an HTTP POST:
> 1) with Content-type header = application/x-www-form-urlencoded
> or
> 2) with Content-type header = multipart/form-data
> 
> and while it is true that in case (2) any submitted key=value pair
> would be sent separately 'as is', this would not necessarily be so in
> case (1), because then all key=value pairs would be concatenated into
> one long string, in which the different key=value pairs would be
> separated by (unescaped) "&" signs
> (apart from other required encodings, see the page above).
> So if the client is not a browser, composes the POST body itself
> before sending it, and sends it with Content-type (1), it had better
> encode the individual parameter pairs as described before
> concatenating them, because that is what the server will expect.
> 
> As an additional note, if the data in the client could contain
> Unicode text, do not forget that this is (still) not the standard in
> HTTP (nor in URIs, and thus in query-string-like things), and make
> sure that you use the proper method to encode any printable characters
> that are not purely US-ASCII.  Again, browsers generally do this
> correctly, but custom clients do not necessarily. (And a "custom
> client" in this case could even be a bit of JavaScript which is
> embedded in one of your own pages but makes its own calls to the
> server on the side.)
> 
> I recently got bitten by this, even in a fairly recent browser, where
> a JavaScript function was composing a POST to a server (using type (1)
> above) and was NOT doing it correctly, even though the page containing
> and calling this function was itself declared as Unicode/UTF-8.
> (That was with (and I am too sorely tempted to add "of course" to
> resist it) some revision of IE 11, although other revisions of the
> same browser did not exhibit the same issue.)
> 
> [...]
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
> 
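
On the original question of "#" in a URL (for example in the query string of a bodyless POST or a GET), a Java sketch using java.net.URI shows what happens to an unescaped "#": everything after it is parsed as the fragment, and an HTTP request-target has no fragment component, so that part never reaches the server. The endpoint http://localhost:8080/service and the value item#42 are hypothetical; Java 10+ is assumed for the Charset overload of URLEncoder.encode.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HashInQuery {
    // Hypothetical endpoint and parameter value, for illustration only.
    static final String BASE = "http://localhost:8080/service";
    static final String ID = "item#42";

    // Unencoded: '#' starts the fragment, so the query ends at "item".
    static final URI RAW = URI.create(BASE + "?id=" + ID);

    // Encoded: '#' becomes %23 and stays inside the query string.
    static final URI ENCODED =
            URI.create(BASE + "?id=" + URLEncoder.encode(ID, StandardCharsets.UTF_8));

    public static void main(String[] args) {
        System.out.println(RAW.getQuery() + " | fragment=" + RAW.getFragment());
        // id=item | fragment=42
        System.out.println(ENCODED.getRawQuery() + " | fragment=" + ENCODED.getFragment());
        // id=item%2342 | fragment=null
    }
}
```

getRawQuery() shows the query as it goes over the wire; getQuery() on the encoded URI decodes %23 back to "#", which is what a server-side parameter parser would hand to the application.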

