Re: URL-encoding and "#"

tomcat Fri, 13 Oct 2017 10:15:38 -0700

On 13.10.2017 18:17, Mark Thomas wrote:

On 13/10/2017 17:09, James H. H. Lampert wrote:

Thanks to all of you who responded.


I found a web page that explains it in ways that I can wrap my
55-year-old brain around, and has an easy-to-read reference chart.

https://perishablepress.com/stop-using-unsafe-characters-in-urls/

Question: the problem first showed up on a web service that takes a
"bodyless" POST operation, and I assume it also applies to GET
operations, and to the URL portion of a POST with a body.

But what about the body of a POST?


From an HTTP specification point of view, anything goes.


With respect, I believe that "anything goes" is a bit imprecise here.

See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4

There are 2 ways for a user agent to send the content of an HTTP POST:
1) with Content-type header = application/x-www-form-urlencoded
or
2) with Content-type header = multipart/form-data

and while it is true that in case (2), any submitted key=value pair would be sent separately 'as is', this would not necessarily be so in case (1), because then all key=value pairs would be concatenated into one long string, in which the different key=value pairs would be separated by (unescaped) "&" signs.

(Apart from other required encodings, see the page above)

So if the client is not a browser, and "composes" the POST body itself before sending it, and sends it with Content-type (1), it had better encode the individual parameter pairs as described, before concatenating them, because that is what the server would expect.
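To illustrate the point above, here is a minimal sketch of what such a custom client would have to do, written in Python purely for illustration (the helper name build_form_body and the sample parameters are my own assumptions, not anything from the original service). Each key and each value is encoded on its own first, and only then are the pairs joined with unescaped "&" signs:

```python
from urllib.parse import quote_plus, parse_qsl

def build_form_body(pairs):
    """Encode each key and value separately, then join the pairs with '&'.

    Hypothetical helper for illustration only.
    """
    return "&".join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in pairs)

# Sample data containing the characters that cause trouble: '&', '#', '=' and space.
params = [("title", "Q&A #1"), ("note", "a=b c")]
body = build_form_body(params)
print(body)  # title=Q%26A+%231&note=a%3Db+c

# What a server-side decoder recovers from that body.
print(parse_qsl(body))
```

The only unescaped "&" and "=" left in the body are the separators themselves, so the server can split it back into the original pairs.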

As an additional note, if it so happened that the data in the client could contain Unicode text, do not forget that this is (still) not the standard in HTTP (and URIs, and thus query-string-like things), and make sure that you use the proper method to encode any printable characters which are not purely US-ASCII. Again, browsers generally do this correctly, but custom clients not necessarily. (And a "custom client" in this case could even be a bit of javascript which is embedded in one of your own pages, but does its own calls to the server on the side.)

I just recently got bitten by this, even in a quite recent browser, where some javascript function was composing a POST to a server (using type (1) above), and was NOT doing it correctly, even though the page containing and calling this function was itself declared as Unicode/UTF-8. (That was with (and I am too sorely tempted to add "of course" to resist it) some revision of IE-11 - although other revisions of the same browser did not exhibit that same issue.)
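As a rough sketch of the non-ASCII issue (again in Python, and assuming the server decodes percent-escapes as UTF-8, which is an assumption about your setup and not something the HTTP specification guarantees), the following shows the same value encoded with two different character sets and what a UTF-8 decoder makes of each. The parameter name "name" and the sample value are made up for the example:

```python
from urllib.parse import quote_plus, parse_qsl

value = "caf\u00e9"  # "cafe" with an acute accent on the e, i.e. not pure US-ASCII

# Client that percent-encodes the UTF-8 bytes of the value.
utf8_body = "name=" + quote_plus(value, encoding="utf-8")

# Client that (wrongly, for a UTF-8 server) percent-encodes the ISO-8859-1 bytes.
latin1_body = "name=" + quote_plus(value, encoding="latin-1")

print(utf8_body)    # name=caf%C3%A9
print(latin1_body)  # name=caf%E9

# A decoder that assumes UTF-8 recovers the first value but not the second:
# the lone byte 0xE9 is not valid UTF-8 and becomes U+FFFD.
print(parse_qsl(utf8_body, encoding="utf-8"))
print(parse_qsl(latin1_body, encoding="utf-8"))
```

Both bodies are perfectly valid US-ASCII on the wire; the damage only shows up when the server decodes the escapes with a different character set than the client used.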


[...]

