Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

tomcat Thu, 20 Oct 2016 01:27:07 -0700

On 19.10.2016 20:42, Mark Juszczec wrote:

On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec <mark.juszc...@gmail.com>
wrote:



On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec <mark.juszc...@gmail.com>
wrote:



Some questions (if these are not relevant, please disregard):

I'm loading a whole bunch of modules.  Could some of them be incompatible?

DocumentRoot refers to a directory that does not exist.  Is that a
problem?

What does AddLanguage do?

Is AddDefaultCharset redundant?

Are +ForwardKeySize and -ForwardDirectories somehow disabling what
+ForwardURIEscaped does?

I have verified the data coming out of Shibboleth is what we expect.


I think I've found where the byte data is coming in.

AjpAprProcessor.java's method:

protected boolean read(byte[] buf, int pos, int n, boolean block) throws
IOException

This ultimately gives me a great big buffer of bytes. Spring Tool Suite
shows me the relevant ones:

74 79 -61 -117 76

I think I have found where these bytes are interpreted improperly and my
problems start.

In AbstractAjpProcessor.java there is a method named  protected void
prepareRequest()

         // Decode extra attributes
         boolean secret = false;
         byte attributeCode;
         while ((attributeCode = requestHeaderMessage.getByte())
                 != Constants.SC_A_ARE_DONE) {

             switch (attributeCode) {

             case Constants.SC_A_REQ_ATTRIBUTE :
                 requestHeaderMessage.getBytes(tmpMB);
                 String n = tmpMB.toString();
                 requestHeaderMessage.getBytes(tmpMB);
                 String v = tmpMB.toString();

I have debugged and gotten to the point where n="FirstName" - the bit of
data giving me fits

After  requestHeaderMessage.getBytes(tmpMB); (the one after String n =
....) tmpMB shows "JOÃ‹L"

tmpMB is a MessageBytes.  It contains a ByteChunk, which wraps the array of
bytes I posted yesterday.

The ByteChunk has start=1049 and end=1054.  Those bytes are:

1049:    5
1050:   74    J
1051:   79    O
1052:  -61    0xF....C3
1053: -117    0xF....8B
1054:   76    L

The ByteChunk has a charset and it is set to ISO-8859-1

So, that explains - at least to me - where things go wrong.
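
To make the difference concrete, here is a minimal sketch that decodes the same
five value bytes both ways. The class and method names are made up for
illustration; only the byte values come from the debugger output above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // The five value bytes seen in the debugger: J, O, 0xC3, 0x8B, L
    static final byte[] VALUE = {74, 79, -61, -117, 76};

    // Hypothetical helper: decode a byte array with the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String asLatin1 = decode(VALUE, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(VALUE, StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte to one char, so 0xC3 and 0x8B
        // become two separate characters (U+00C3 and U+008B).
        System.out.println(asLatin1.length());

        // UTF-8 reads 0xC3 0x8B as one two-byte sequence: U+00CB (E with diaeresis).
        System.out.println(asUtf8.length());
        System.out.println(asUtf8.equals("JO\u00CBL"));
    }
}
```

Decoded as ISO-8859-1 the value is five characters long; decoded as UTF-8 it is
the four-character name that was expected.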

Now, the question is why.

Looking at ByteChunk.java, I see it has the following:

     /** Default encoding used to convert to strings. It should be UTF8,
         as most standards seem to converge, but the servlet API requires
         8859_1, and this object is used mostly for servlets.
     */
     public static final Charset DEFAULT_CHARSET =
             StandardCharsets.ISO_8859_1;

     private Charset charset;

     public void setCharset(Charset charset) {
         this.charset = charset;
     }

     public Charset getCharset() {
         if (charset == null) {
             charset = DEFAULT_CHARSET;
         }
         return charset;
     }

I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
executed.

ByteChunk.getCharset() is called from MessageBytes.toBytes() which is
called from AjpMessage.appendBytes(MessageBytes)

So, I think this explains why my data is being interpreted incorrectly.

Now, the question becomes why isn't this line in server.xml:

  <Connector port="XXXX"
                   emptySessionPath="true"
                   enableLookups="false"
                   redirectPort="YYYY"
                   protocol="AJP/1.3"
                   maxThreads="300"
                   URIEncoding="UTF-8"
                   connectionTimeout="600000" />

enough to cause ByteChunk.charset to be set to "UTF-8"

Does anyone have any thoughts as to how to proceed?
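
One possible stopgap on the application side, not a fix for the connector
itself, is to re-decode the attribute value after the fact. This is only a
sketch: the helper name is hypothetical, and it assumes the string really was
produced by decoding UTF-8 bytes as ISO-8859-1, as in the debugger session
above.

```java
import java.nio.charset.StandardCharsets;

class Latin1Repair {
    // Hypothetical helper: undo an ISO-8859-1 decode of UTF-8 bytes.
    // ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF,
    // so getBytes() recovers the original bytes exactly.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate what tmpMB.toString() produced in the debugger.
        String broken = new String(
                new byte[] {74, 79, -61, -117, 76}, StandardCharsets.ISO_8859_1);
        String repaired = redecodeAsUtf8(broken);
        System.out.println(repaired.equals("JO\u00CBL"));
    }
}
```

This must only be applied to values known to be mis-decoded. A string that is
already correct and contains characters above U+00FF would be damaged, because
those characters cannot be encoded in ISO-8859-1.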

Can you tell us (or remind us) exactly how the browser is sending this request for the parameter "JOEL" (with a diaeresis on the E) to the server?

Is it part of the query string of the URL, or is it in the body of a POST
request?

The following online documentation describes precisely how this should work:
http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes

(See "URIEncoding", but also "useBodyEncodingForURI", and follow the link provided to the same attributes in the HTTP Connector: http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes)


So check exactly what you are doing, and if that matches these rules somehow.
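
As an illustration of what "URIEncoding" governs: per the documentation it
selects the charset used to decode the URI bytes after %xx decoding. The sketch
below mimics that step with the JDK's URLDecoder. It is not Tomcat's own code,
and the class and helper names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodeDemo {
    // Hypothetical helper: percent-decode a query value with a named charset,
    // turning the checked exception into an unchecked one.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "JO%C3%8BL"; // the name as a browser would percent-encode it in UTF-8

        // Decoded as UTF-8: the intended four-character name.
        System.out.println(decode(raw, "UTF-8").equals("JO\u00CBL"));

        // Decoded as ISO-8859-1: each byte becomes its own character.
        System.out.println(decode(raw, "ISO-8859-1").length());
    }
}
```

If the value does not travel in the URI or the request body at all, these
settings would not be expected to touch it, which is why the question of how
the value reaches the server matters.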

Personal rant:
Unfortunately, this is still a big mess in the HTTP protocol.

And the people in charge of the design of the protocol missed a golden opportunity to clean this up in HTTP 2.x and make Unicode/UTF-8 the default, instead of clinging to iso-8859-1. Thus condemning all web programmers worldwide to another 20 years of obscure bugs and clunky work-arounds.


(s) Andr%C3%A9




