Re: The GET request encounters 400 Bad Request from a URL with Chinese words on Tomcat 8.0.43

Bruce Huang Fri, 04 Aug 2017 01:59:30 -0700

Mark Thomas <ma...@apache.org> wrote on Tue, 1 Aug 2017 at 7:37 PM:


> On 01/08/17 03:26, Bruce Huang wrote:
> > Hi all,
> >
> > We have placed a file named 檔名.txt in
> > the \apache-tomcat-8.0.43\webapps\Apps folder. Our client app
> > retrieves the file with an HTTP GET request to a URL such as
> > http://192.168.1.1/Apps/檔名.txt (檔名 is two Chinese characters).
> >
> > On Tomcat v8.0.23 everything worked fine. However, after we
> > migrated to v8.0.43, the client app receives an HTTP 400 Bad
> > Request response. The code our client app uses is below. It looks
> > as if it does not encode the URL path and only translates
> > whitespace to %20.
> >
> > Is there any way to configure Tomcat 8.0.43 so that this case
> > works as it did on Tomcat v8.0.23, since many client apps are
> > already deployed?
>
> Sorry, no. This is part of the fix for CVE-2016-6816.
>
> Options have since been added to allow some illegal characters through
> but they will not be sufficient to allow the full range of UTF-8 bytes.
>
> The fix was added to 8.0.39 so any version up to 8.0.38 should work for
> you.
>
> You might be able to put a more lenient reverse proxy in front of Tomcat
> which will accept these characters and then pass the request (correctly
> encoded) to Tomcat. However that depends on finding a suitable reverse
> proxy.
>
> Mark
>

Hi Mark,

Thanks for the reply. We will try to stay on version 8.0.38 until we have
migrated all our client apps.

For those who search for this, the configuration property is
tomcat.util.http.parser.HttpParser.requestTargetAllow
<https://tomcat.apache.org/tomcat-8.5-doc/config/systemprops.html#Other>



André Warnier (tomcat) <a...@ice-sa.com> wrote on Tue, 1 Aug 2017 at 8:10 PM:

> On 01.08.2017 04:26, Bruce Huang wrote:
> > Hi all,
> >
> > We have placed a file named 檔名.txt in
> > the \apache-tomcat-8.0.43\webapps\Apps folder. Our client app
> > retrieves the file with an HTTP GET request to a URL such as
> > http://192.168.1.1/Apps/檔名.txt (檔名 is two Chinese characters)
>
> This is one of those cases where it can get very confusing very quickly,
> because of the multiple opportunities for things to get encoded/decoded
> or not, and to be /seen/ as encoded/decoded or not. (For example: are we
> really seeing the above URL as you meant to send it, or are we seeing
> some other form, as encoded by the email systems in between?)
>
> Strictly speaking, according to the relevant Internet HTTP RFCs (and
> which ones are relevant can be yet another confusing matter), you MAY NOT
> include the above Chinese characters directly in a URL string. The set of
> characters/bytes allowed in a URL string is very restrictive, and in any
> case does not include even the individual bytes which would result from
> encoding the above Unicode characters as UTF-8.
> (See: https://tools.ietf.org/html/rfc3986#section-2)
>
> Before you send out this URL from the client, you would have to:
> - encode the above Chinese characters as a UTF-8 byte sequence. This
>   would probably result in 3 bytes or more per character, so let's say
>   6 bytes in total.
> - then, for each of the 6 bytes, check whether it is within the range of
>   bytes allowed in a URL, and if not, encode/escape /that/ byte as a
>   "%xy" 3-character ASCII sequence. (There are many existing functions
>   to do that.)
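
For anyone who wants to see those two client-side steps spelled out, here is
a minimal Java sketch. The class and method names (PathEncoder,
encodeSegment) are made up for illustration and are not part of our client
code. It assumes the input is a single path segment and escapes every byte
outside the RFC 3986 "unreserved" set, which is stricter than the RFC
requires but always safe.

```java
import java.nio.charset.StandardCharsets;

final class PathEncoder {
    // RFC 3986 "unreserved" characters; these never need escaping.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    /** Hypothetical helper: percent-encodes one path segment. */
    static String encodeSegment(String segment) {
        StringBuilder out = new StringBuilder();
        // Step 1: turn the characters into a UTF-8 byte sequence.
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            // Step 2: keep allowed bytes, escape everything else as %XY.
            if (UNRESERVED.indexOf((char) unsigned) >= 0) {
                out.append((char) unsigned);
            } else {
                out.append('%').append(String.format("%02X", unsigned));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u6A94\u540D is the two-character file name from this thread.
        System.out.println("http://192.168.1.1/Apps/"
            + encodeSegment("\u6A94\u540D.txt"));
        // prints http://192.168.1.1/Apps/%E6%AA%94%E5%90%8D.txt
    }
}
```

Each of the two characters becomes three UTF-8 bytes, so the file name turns
into six "%XY" escapes followed by the untouched ".txt".
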
>
> Then on the server side receiving this URL, the opposite transformation
> should take place:
> - the first step would be to "%-decode" the URL string, to restore the
>   original bytes which the client wanted to send. To my knowledge, all
>   HTTP servers do that.
>
> - then, the server and the application would have to /assume/ that URLs
>   received from your clients are always Unicode, UTF-8 encoded. That is
>   (still) not the default in HTTP (the default is still ISO-8859-1). (And
>   there is no mechanism in the current RFCs that allows either client or
>   server to indicate, in the request itself, what character set the
>   request URL really is written in, or should be.)
>   But you can force Tomcat to assume this, see:
>   http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
>   --> URIEncoding
>   (and there is also "useBodyEncodingForURI", but that does not apply in
>   your particular case)
> - the next step would thus be for the application (e.g. the default
>   servlet) to /assume/ that this URL is Unicode/UTF-8, and decode it into
>   a corresponding internal Unicode string.
> - and then comes the step of looking for the corresponding file in the
>   filesystem, by the name you got from the previous step. Depending on the
>   OS and the filesystem, this may be character-set-agnostic or not, and
>   may be case-insensitive or not.
> (But your problem is currently not that it does not find the file; it is
> that the HTTP request itself gets rejected as invalid. So your request URI
> contains bytes which the server considers - rightly or not - as invalid
> in a URL.)
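
To make the server-side steps concrete, here is a small Java sketch of the
idea. It is not Tomcat's actual implementation; PathDecoder and its methods
are hypothetical names, and the input is assumed to be a well-formed,
ASCII-only, percent-encoded path. It shows why the assumed charset matters:
the same bytes give the right file name as UTF-8 and garbage as ISO-8859-1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PathDecoder {
    /** Step 1: turn "%XY" escapes back into the raw bytes the client sent. */
    static byte[] percentDecode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                // Two hex digits follow the percent sign.
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // plain ASCII character, one byte
            }
        }
        return out.toByteArray();
    }

    /** Step 2: interpret those bytes in the charset the server assumes. */
    static String decode(String encoded, Charset assumed) {
        return new String(percentDecode(encoded), assumed);
    }

    public static void main(String[] args) {
        String path = "/Apps/%E6%AA%94%E5%90%8D.txt";
        // Correct assumption: 12 characters, including the 2 Chinese ones.
        System.out.println(decode(path, StandardCharsets.UTF_8).length());
        // Wrong assumption: each of the 6 bytes becomes its own character.
        System.out.println(decode(path, StandardCharsets.ISO_8859_1).length());
    }
}
```

With the wrong assumption the lookup in the filesystem would fail even though
the request itself was valid, which is a different failure from the 400 we
are seeing.
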
>
> [rant]
> In other words and basically, no wonder that developers (of servers as
> well as of applications) get confused from time to time, and maybe
> unwittingly introduce bugs when trying to handle URLs and/or content that
> is anything other than English.
> In that respect, the HTTP protocols are still hopelessly outdated and
> obnoxious when handling the vast number of languages which are in use in
> today's real-life Internet.
>
> And it is a never-ending wonder to me why whoever is in charge of these
> things has apparently not yet made a serious attempt at publishing a new
> set of coordinated HTTP (and HTML, and CGI, and JavaScript etc.) versions
> which would make Unicode/UTF-8 the default charset/encoding (for URLs as
> well as for text content), instead of the long-obsolete ASCII and
> ISO-8859-1 character sets. I would bet that millions of wasted
> work-hours would be saved worldwide every year by such a change.
> [end of rant]
>
>
Hi André,

Thanks for the clear and helpful explanation. It seems that our
application developers have unwittingly introduced the bug this time, as
you said. :(


>
> >
> > On Tomcat v8.0.23 everything worked fine. However, after we
> > migrated to v8.0.43, the client app receives an
> > HTTP 400 Bad Request response.
>
> Most probably, that was a correction in Tomcat, which previously did not
> properly reject some URLs which are invalid according to the existing
> (deficient) RFCs.
>
> > The code that our client app uses is below. It looks
> > as if it does not encode the URL path and only translates whitespace
> > to %20.
>
> Exactly. Your app has to encode that URL properly before issuing the
> request.
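
One way a Java client could do this with the standard library alone is
sketched below. This is only an illustration, not our actual client code:
UriBuilderExample and toRequestUrl are made-up names. It relies on the
multi-argument java.net.URI constructor, which quotes illegal characters
such as spaces, and on toASCIIString(), which percent-encodes non-ASCII
characters as UTF-8.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriBuilderExample {
    /** Hypothetical helper: builds an ASCII-only request URL from a raw path. */
    static String toRequestUrl(String host, String rawPath) {
        try {
            // scheme, userInfo, host, port (-1 = default), path, query, fragment
            URI uri = new URI("http", null, host, -1, rawPath, null, null);
            // Non-ASCII characters become UTF-8 percent escapes here.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for " + rawPath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toRequestUrl("192.168.1.1", "/Apps/\u6A94\u540D.txt"));
        // prints http://192.168.1.1/Apps/%E6%AA%94%E5%90%8D.txt
        System.out.println(toRequestUrl("192.168.1.1", "/Apps/my file.txt"));
        // prints http://192.168.1.1/Apps/my%20file.txt
    }
}
```

This covers both the whitespace case the client already handled and the
Chinese characters it did not.
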
>
> >
> > Is there any way to configure Tomcat 8.0.43 so that this
> > case works as it did on Tomcat v8.0.23, since many client
> > apps are already deployed?
> >
>
> If "as usual" was wrong and/or could cause security issues, your chances
> are slim, and you will have to update your app.
>
>
>
>
