Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-20 Thread tomcat

On 20.10.2016 15:55, Mark Juszczec wrote:

On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) 
wrote:



Can you tell us (or remind us) exactly how the browser is sending this
request for the parameter "JOEL" (with diaeresis on the E) to the server ?
Is it a part of the query-string of the URL, or is it in the body of a
POST request ?

The following on-line documentation describes precisely how this should
work :
http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
(See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
provided to the same attributes in the HTTP Connector :
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
)

So check exactly what you are doing, and if that matches these rules
somehow.

Personal rant :
Unfortunately, this is still a big mess in the HTTP protocol.
And the people in charge of the design of the protocol missed a golden
opportunity of cleaning this up in HTTP 2.x and making Unicode/UTF-8 the
default, instead of clinging to iso-8859-1. Thus condemning all web
programmers worldwide to another 20 years of obscure bugs and clunky
work-arounds.

(s) Andr%C3%A9



The data is being returned by Shibboleth and passed to Tomcat in the body
of an HTTP GET request.


Nitpick : that is a contradiction in terms. Per the RFC, a body sent with a GET request 
has no defined semantics, so in practice a GET request has no "body".
See : https://tools.ietf.org/html/rfc7231#section-4  (4.3.1 GET)

I don't know Shibboleth, and I do not know how it works exactly, but based on what you 
seem to imply here, I will assume that the "joel" in question is being passed as part of 
the GET request URL (like "..?givenName=joel=xxx..").

(Technically, that part is the "query-string" part of the URI).

Based on what else you indicate below about Shibboleth, I would also assume that the "e 
with diaeresis" (sorry, can't type it on my German keyboard), is passed in that 
query-string, as iso-8859-1, perhaps percent-encoded as %CB or %EB.


Receiving this, recent Tomcats would decode this either as iso-8859-1 (latin-1) (if 
STRICT_SERVLET_COMPLIANCE is enforced), or as UTF-8 (by default), or according to what you 
set as "URIEncoding" and/or "useBodyEncodingForURI".
If it tries UTF-8, that may or may not generate a valid Java Unicode character, but it 
would in any case not be the character that you expect.
If you set it to decode the URIs using iso-8859-1, then it would decode this correctly 
(and generate the correct java Unicode character in your application), but it would decode 
*all* further request URIs using iso-8859-1, which would most probably have adverse 
effects on the rest of your application.
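
To make the difference concrete, here is a minimal sketch. It uses the JDK's own
java.net.URLDecoder, not Tomcat's connector code, so it only illustrates the principle;
Tomcat's internal decoder may treat malformed input differently. The value "JO%CBL" and
the class name are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of Tomcat or Shibboleth.
class QueryDecodingDemo {

    // Percent-decodes a query-string value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // 0xCB is "E with diaeresis" in ISO-8859-1 (assumed input for this sketch).
        String raw = "JO%CBL";

        String latin1 = decode(raw, "ISO-8859-1");
        String utf8 = decode(raw, "UTF-8");

        // Print code points so the result does not depend on the console encoding.
        latin1.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
        utf8.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

Decoded as iso-8859-1, the third character comes out as U+00CB. Decoded as UTF-8, the
lone byte 0xCB is not a valid sequence and the JDK substitutes the replacement character
U+FFFD, so the original character is lost.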


So it would seem that you are stuck somewhere in-between.
But it is not a Tomcat issue, it is a Shibboleth issue.
(Or rather, a Shibboleth-and-HTTP-defaulting-to-iso-8859-1 issue).



This is by design of the application and there's nothing I can do about it.



Neither can we.


As such, my only options for enforcing UTF-8 are by using "URIEncoding"
and/or "useBodyEncodingForURI" as described in the links.

I've done this and it has not had any impact on the problem.

Last night, I found these bits of information:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

My interpretation (and PLEASE tell me if I'm wrong) is, since at least
2007, headers have been locked in to the ISO-8859-1 charset due to specs
that govern how the world wide web is going to work.



Well yes, see my previous rant.
See : https://tools.ietf.org/html/rfc7230#section-3.2
3.2.4.  Field Parsing (at the end)


This:

https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeAccess


I am sorry, but I do not really have the time right now (nor the setup) to investigate 
further into what Shibboleth is doing, or what they are really explaining in that article.
But from skimming it, I have a suspicion that the following may help 
you, in the case of a mod_jk Connector to Tomcat :


http://tomcat.apache.org/connectors-doc/reference/apache.html

JkEnvVar

"Adds a name and an optional default value of environment variable that should be sent to 
servlet-engine as a request attribute. If the default value is not given explicitly, the 
variable will only be send, if it is set during runtime.

The default is empty, so no additional variables will be sent.
This directive can be used multiple times per virtual server. The settings will be merged 
between the global server and any virtual server.
You can retrieve the variables on Tomcat as request attributes via 
request.getAttribute(attributeName). Note that the variables send via JkEnvVar will not be 
listed in request.getAttributeNames().
Empty default values are supported since version 1.2.20. Not sending variables with empty 
defaults and empty runtime value has been introduced in version 1.2.21. "
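
The point about names can be sketched as follows: since such variables are not listed by 
request.getAttributeNames(), the application has to ask for each one by its exact name. 
This is only an illustration; the lookup is passed in as a function so that the sketch 
compiles without the servlet API, and the class name and the map standing in for the 
request are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; the attribute name "FirstName" is the one used in this thread.
class ForwardedAttributeReader {

    // Looks up one forwarded variable by its exact name. In a servlet the
    // lookup function would be request::getAttribute.
    static String read(Function<String, Object> getAttribute, String name) {
        Object value = getAttribute.apply(name);
        return (value == null) ? null : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the request attributes that the connector would populate.
        Map<String, Object> fakeRequest = new HashMap<>();
        fakeRequest.put("FirstName", "JOEL");

        System.out.println(read(fakeRequest::get, "FirstName"));
        System.out.println(read(fakeRequest::get, "NotForwarded"));
    }
}
```

In a real servlet the call would look like read(request::getAttribute, "FirstName").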


In other words : if Shibboleth can send this value in the form of an HTTP header, and you 
can configure the Apache httpd front-end to pick up the 

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-20 Thread Mark Juszczec
On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) 
wrote:

>
> Can you tell us (or remind us) exactly how the browser is sending this
> request for the parameter "JOEL" (with diaeresis on the E) to the server ?
> Is it a part of the query-string of the URL, or is it in the body of a
> POST request ?
>
> The following on-line documentation describes precisely how this should
> work :
> http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
> (See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
> provided to the same attributes in the HTTP Connector :
> http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
> )
>
> So check exactly what you are doing, and if that matches these rules
> somehow.
>
> Personal rant :
> Unfortunately, this is still a big mess in the HTTP protocol.
> And the people in charge of the design of the protocol missed a golden
> opportunity of cleaning this up in HTTP 2.x and making Unicode/UTF-8 the
> default, instead of clinging to iso-8859-1. Thus condemning all web
> programmers worldwide to another 20 years of obscure bugs and clunky
> work-arounds.
>
> (s) Andr%C3%A9
>
>
The data is being returned by Shibboleth and passed to Tomcat in the body
of an HTTP GET request.

This is by design of the application and there's nothing I can do about it.

As such, my only options for enforcing UTF-8 are by using "URIEncoding"
and/or "useBodyEncodingForURI" as described in the links.

I've done this and it has not had any impact on the problem.

Last night, I found these bits of information:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

My interpretation (and PLEASE tell me if I'm wrong) is, since at least
2007, headers have been locked in to the ISO-8859-1 charset due to specs
that govern how the world wide web is going to work.

This:

https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeAccess

goes on to reiterate what the first link says and propose a workaround (see
the Java link at the end of the page)

"Shibboleth attributes are by default UTF-8 encoded. However, depending on
the servlet contaner configuration they are interpreted as ISO-8859-1
values. This causes problems with non-ASCII characters. The solution is to
re-encode attributes, e.g. with:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");"


Although MY data is delivered as attributes (so I have to use
request.getAttribute("FirstName") )  this works
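
A slightly more defensive variant of that re-encoding can be sketched like this. It is
not from the Shibboleth wiki: the class and method names are made up, and the check is
only a heuristic. The idea is to re-read the value as UTF-8 only when its ISO-8859-1
bytes actually form valid UTF-8, and to leave it alone otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; neither Shibboleth nor Tomcat ships this class.
class AttributeReencoder {

    // Returns the value re-read as UTF-8 if it looks like UTF-8 bytes that were
    // decoded as ISO-8859-1; otherwise returns the value unchanged.
    static String fixIfMisdecoded(String value) {
        if (value == null) {
            return null;
        }
        // A string produced by ISO-8859-1 decoding can only contain chars <= 0xFF.
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return value;
            }
        }
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // REPORT makes the decoder throw instead of inserting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so the value was probably decoded correctly already.
            return value;
        }
    }

    public static void main(String[] args) {
        // "JO" + U+00C3 + U+008B + "L": the UTF-8 bytes of the name read as ISO-8859-1.
        String misdecoded = "JO\u00C3\u008BL";
        String fixed = fixIfMisdecoded(misdecoded);
        // Print code points so the result does not depend on the console encoding.
        fixed.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

For an attribute, the call would be something like
fixIfMisdecoded((String) request.getAttribute("FirstName")). A value that really is
ISO-8859-1 text and happens to look like valid UTF-8 would still be converted, so this
remains a workaround and not a fix.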

ISO-8859-1 is the default used by ByteChunk and I've verified it is not
reset/changed to UTF-8 despite having specified it in server.xml per Tomcat
documentation.

I found this:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

which says this problem has been around since at least 2007

Then I found this:

https://wiki.shibboleth.net/confluence/plugins/servlet/mobile#content/view/4358180

which suggests the following solution:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");

I have to get my data via request.getAttribute("key")

Is the solution appropriate for data delivered as attributes?
I have read the information that says it's a dangerous hack, which is the main
reason I have not implemented it.

However, the Shibboleth forum posts and what I've discovered about
ByteChunk seem to cast this in a different light.

Any thoughts, comments would be greatly appreciated.


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-20 Thread tomcat

On 19.10.2016 20:42, Mark Juszczec wrote:

On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec 
wrote:




On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec 
wrote:



Some questions (if these are not relevant, please disregard):

I'm loading a whole bunch of modules.  Could some of them be incompatible?

DocumentRoot refers to a directory that does not exist.  Is that a
problem?

What does AddLanguage do?

Is AddDefaultCharset redundant?

Are +ForwardKeySize and -ForwardDirectories somehow disabling what
+ForwardURIEscaped does?

I have verified the data coming out of Shibboleth is what we expect.



I think I've found where the byte data is coming in.

AjpAprProcessor.java's method:

protected boolean read(byte[] buf, int pos, int n, boolean block) throws
IOException

This ultimately gives me a great big buffer of bytes. Spring Tool Suite
shows me the relevant ones:

74 79 -61 -117 76



I think I have found where these bytes are interpreted improperly and my
problems start.

In AbstractAjpProcessor.java there is a method named  protected void
prepareRequest()

 // Decode extra attributes
 boolean secret = false;
 byte attributeCode;
 while ((attributeCode = requestHeaderMessage.getByte())
 != Constants.SC_A_ARE_DONE) {

 switch (attributeCode) {

 case Constants.SC_A_REQ_ATTRIBUTE :
 requestHeaderMessage.getBytes(tmpMB);
 String n = tmpMB.toString();
 requestHeaderMessage.getBytes(tmpMB);
 String v = tmpMB.toString();

I have debugged and gotten to the point where n="FirstName" - the bit of
data giving me fits

After  requestHeaderMessage.getBytes(tmpMB); (the one after String n =
) tmpMB shows "JOËL"

tmpMB is a MessageBytes.  It contains a ByteChunk, which is the array of
bytes I posted yesterday.

The ByteChunk has a start=1049 and an end=1054.  That is bytes

1049:    5
1050:   74   J
1051:   79   O
1052:  -61   0xFC3
1053: -117   0xF8B
1054:   76   L

The ByteChunk has a charset and it is set to ISO-8859-1

So, that explains - at least to me - where things go wrong.

Now, the question is why.

Looking at ByteChunk.java, I see it has the following:

 /** Default encoding used to convert to strings. It should be UTF8,
 as most standards seem to converge, but the servlet API requires
 8859_1, and this object is used mostly for servlets.
 */
 public static final Charset DEFAULT_CHARSET =
StandardCharsets.ISO_8859_1;

 private Charset charset;

 public void setCharset(Charset charset) {
 this.charset = charset;
 }

 public Charset getCharset() {
 if (charset == null) {
 charset = DEFAULT_CHARSET;
 }
 return charset;
 }

I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
executed.

ByteChunk.getCharset() is called from MessageBytes.toBytes() which is
called from AjpMessage.appendBytes(MessageBytes)

So, I think this explains why my data is being interpreted incorrectly.

Now, the question becomes why isn't this line in server.xml:

  

enough to cause ByteChunk.charset to be set to "UTF-8"

Does anyone have any thoughts as to how to proceed?



Can you tell us (or remind us) exactly how the browser is sending this request for the 
parameter "JOEL" (with diaeresis on the E) to the server ?

Is it a part of the query-string of the URL, or is it in the body of a POST 
request ?

The following on-line documentation describes precisely how this should work :
http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
(See "URIEncoding", but also "useBodyEncodingForURI", and follow the link provided to the 
same attributes in the HTTP Connector : 
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes)


So check exactly what you are doing, and if that matches these rules somehow.

Personal rant :
Unfortunately, this is still a big mess in the HTTP protocol.
And the people in charge of the design of the protocol missed a golden opportunity of 
cleaning this up in HTTP 2.x and making Unicode/UTF-8 the default, instead of clinging to 
iso-8859-1. Thus condemning all web programmers worldwide to another 20 years of obscure 
bugs and clunky work-arounds.


(s) Andr%C3%A9




-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-19 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec 
wrote:

>
>
> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec 
> wrote:
>>
>>
>> Some questions (if these are not relevant, please disregard):
>>
>> I'm loading a whole bunch of modules.  Could some of them be incompatible?
>>
>> DocumentRoot refers to a directory that does not exist.  Is that a
>> problem?
>>
>> What does AddLanguage do?
>>
>> Is AddDefaultCharset redundant?
>>
>> Are +ForwardKeySize and -ForwardDirectories somehow disabling what
>> +ForwardURIEscaped does?
>>
>> I have verified the data coming out of Shibboleth is what we expect.
>>
>
> I think I've found where the byte data is coming in.
>
> AjpAprProcessor.java's method:
>
> protected boolean read(byte[] buf, int pos, int n, boolean block) throws
> IOException
>
> This ultimately gives me a great big buffer of bytes. Spring Tool Suite
> shows me the relevant ones:
>
> 74 79 -61 -117 76
>
>
I think I have found where these bytes are interpreted improperly and my
problems start.

In AbstractAjpProcessor.java there is a method named  protected void
prepareRequest()

    // Decode extra attributes
    boolean secret = false;
    byte attributeCode;
    while ((attributeCode = requestHeaderMessage.getByte())
            != Constants.SC_A_ARE_DONE) {

        switch (attributeCode) {

        case Constants.SC_A_REQ_ATTRIBUTE :
            requestHeaderMessage.getBytes(tmpMB);
            String n = tmpMB.toString();
            requestHeaderMessage.getBytes(tmpMB);
            String v = tmpMB.toString();

I have debugged and gotten to the point where n="FirstName" - the bit of
data giving me fits

After  requestHeaderMessage.getBytes(tmpMB); (the one after String n =
) tmpMB shows "JOËL"

tmpMB is a MessageBytes.  It contains a ByteChunk, which is the array of
bytes I posted yesterday.

The ByteChunk has a start=1049 and an end=1054.  That is bytes

1049:    5
1050:   74   J
1051:   79   O
1052:  -61   0xFC3
1053: -117   0xF8B
1054:   76   L

The ByteChunk has a charset and it is set to ISO-8859-1

So, that explains - at least to me - where things go wrong.

Now, the question is why.

Looking at ByteChunk.java, I see it has the following:

    /** Default encoding used to convert to strings. It should be UTF8,
        as most standards seem to converge, but the servlet API requires
        8859_1, and this object is used mostly for servlets.
    */
    public static final Charset DEFAULT_CHARSET =
            StandardCharsets.ISO_8859_1;

    private Charset charset;

    public void setCharset(Charset charset) {
        this.charset = charset;
    }

    public Charset getCharset() {
        if (charset == null) {
            charset = DEFAULT_CHARSET;
        }
        return charset;
    }

I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
executed.

ByteChunk.getCharset() is called from MessageBytes.toBytes() which is
called from AjpMessage.appendBytes(MessageBytes)

So, I think this explains why my data is being interpreted incorrectly.
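
The effect of the charset on those five bytes can be reproduced outside Tomcat with a
small sketch. It uses plain JDK calls only; the class name is made up, and the byte
values are the ones from the debugger above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use any Tomcat code.
class ByteDecodingDemo {

    // The byte values reported by the debugger in this thread.
    static final byte[] OBSERVED = {74, 79, -61, -117, 76};

    static String asLatin1() {
        return new String(OBSERVED, StandardCharsets.ISO_8859_1);
    }

    static String asUtf8() {
        return new String(OBSERVED, StandardCharsets.UTF_8);
    }

    // Prints the code points of a string, independent of the console encoding.
    static void dump(String label, String s) {
        System.out.print(label + " (length " + s.length() + "): ");
        s.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }

    public static void main(String[] args) {
        dump("ISO-8859-1", asLatin1());
        dump("UTF-8", asUtf8());
    }
}
```

Read as UTF-8 the bytes give four characters, with U+00CB in third position. Read as
ISO-8859-1 they give five characters, with U+00C3 and U+008B where the single character
should be.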

Now, the question becomes why isn't this line in server.xml:

 

enough to cause ByteChunk.charset to be set to "UTF-8"

Does anyone have any thoughts as to how to proceed?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread tomcat

On 19.10.2016 00:21, Mark Thomas wrote:

On 18/10/2016 23:10, Mark Juszczec wrote:

On Oct 18, 2016 5:37 PM, "Mark Thomas"  wrote:



Java handles bytes as signed (-128 to 127) but the data in the input
stream is unsigned. The additional Fs are an artefact of whatever those
bytes were cast to.

It looks normal to me.


That's what I thought but didn't think it would hurt to double check.

What's interesting is the next level up, in CoyoteAdapter (I'll have to
double check that) in HttpServletRequest the data appears as the String

JOÃ[CTL-CHAR]L

In that String, the Ã[CTL-CHAR] are bytes 0xc3 0x83 0xc2 0x8b and are a
corruption of  Ë (0xc3 0x8b)

I'm not sure how we go from the correct bytes to 0xc3 0x83 0xc2 0x8b.


Nor me. For the record I did test this and it worked as expected - no
corruption.

I wonder if it is worth a clean install of httpd, mod_jk and Tomcat and
then running a simple test.



I was going to suggest the same thing : a standard simple installation of httpd, mod_jk 
and Tomcat (without Shibboleth), and a simple test.

Justification :
a) For years I have run several international (French, German, Spanish, English) 
applications with httpd + mod_jk + Tomcat, more or less "out of the box", with mostly the 
default parameters for mod_jk (e.g. no special JkOptions), across multiple versions of all 
the above, and I have never seen such corruption.

b) I do not use Shibboleth
c) having had a look at the httpd configuration that the OP posted, I cannot see anything 
definitely wrong, but it is certainly not an out-of-the-box from-the-apache-website 
configuration, so it is a bit hard to figure out what is really going on there.

(It looks more like some pre-packaged setup for one particular application or 
framework)

d) some of the above character/byte sequences look very much as if a double UTF-8 encoding 
took place :
- an "Ë" would be encoded in UTF-8 as the 2 bytes 0xc3 0x8b (which, seen as ISO-8859-1 
bytes/characters, would look like "A tilde" followed by an unprintable control character)
- then if you considered these 2 bytes again as 2 ISO-8859-1 characters, and re-encoded 
them in UTF-8, you would get 0xc3 0x83 0xc2 0x8b.

(0xc3 0x83 for the "A tilde", and 0xc2 0x8b for the "control character").
(I have not really checked the exact bytes sequences, but at least they look 
plausible)
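
The exact byte sequences can be checked with a few lines of Java. This is only a sketch
of the double-encoding hypothesis, using plain JDK calls; the class and method names are
made up, and it does not show where in the httpd / mod_jk / Tomcat chain the second
encoding would happen.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the double-encoding hypothesis.
class DoubleEncodingDemo {

    // Formats bytes as unsigned hex, e.g. "0xc3 0x8b".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Correct, single UTF-8 encoding.
    static byte[] encodeOnce(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes misread as ISO-8859-1 characters, then encoded as UTF-8 again.
    static byte[] encodeTwice(String s) {
        String misread = new String(encodeOnce(s), StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String eDiaeresis = "\u00CB"; // capital E with diaeresis
        System.out.println("once : " + hex(encodeOnce(eDiaeresis)));
        System.out.println("twice: " + hex(encodeTwice(eDiaeresis)));
    }
}
```

The single encoding gives 0xc3 0x8b and the double encoding gives 0xc3 0x83 0xc2 0x8b,
which are the four bytes reported for the corrupted string.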

Here is a link to a great tool for that kind of thing :
http://unicode.scarfboy.com/?s=U%2b00cb





Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Oct 18, 2016 6:22 PM, "Mark Thomas"  wrote:
>
> I wonder if it is worth a clean install of httpd, mod_jk and Tomcat and
> then running a simple test.
>
> Mark
>

That would be difficult to justify without more evidence than I've got.

Do you know if Apache has a test suite I can run against an existing
installation?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Thomas
On 18/10/2016 23:10, Mark Juszczec wrote:
> On Oct 18, 2016 5:37 PM, "Mark Thomas"  wrote:
>>
>>
>> Java handles bytes as signed (-128 to 127) but the data in the input
>> stream is unsigned. The additional Fs are an artefact of whatever those
>> bytes were cast to.
>>
>> It looks normal to me.
> 
> That's what I thought but didn't think it would hurt to double check.
> 
> What's interesting is the next level up, in CoyoteAdapter (I'll have to
> double check that) in HttpServletRequest the data appears as the String
> 
> JOÃ[CTL-CHAR]L
> 
> In that String, the Ã[CTL-CHAR] are bytes 0xc3 0x83 0xc2 0x8b and are a
> corruption of  Ë (0xc3 0x8b)
> 
> I'm not sure how we go from the correct bytes to 0xc3 0x83 0xc2 0x8b.

Nor me. For the record I did test this and it worked as expected - no
corruption.

I wonder if it is worth a clean install of httpd, mod_jk and Tomcat and
then running a simple test.

Mark




Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Oct 18, 2016 5:37 PM, "Mark Thomas"  wrote:
>
>
> Java handles bytes as signed (-128 to 127) but the data in the input
> stream is unsigned. The additional Fs are an artefact of whatever those
> bytes were cast to.
>
> It looks normal to me.

That's what I thought but didn't think it would hurt to double check.

What's interesting is the next level up, in CoyoteAdapter (I'll have to
double check that) in HttpServletRequest the data appears as the String

JOÃ[CTL-CHAR]L

In that String, the Ã[CTL-CHAR] are bytes 0xc3 0x83 0xc2 0x8b and are a
corruption of  Ë (0xc3 0x8b)

I'm not sure how we go from the correct bytes to 0xc3 0x83 0xc2 0x8b.


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Thomas
On 18/10/2016 22:29, Mark Juszczec wrote:
> On Oct 18, 2016 4:45 PM, "Mark Juszczec"  wrote:
>>
>>
>>
>> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec 
> wrote:
>>
>> Converting them to hex I see
>>
>> -61 = FFC3
>>
>> -117 = FF8B
>>
>> I know
>>
>> Ë = 0xC3 0x8B
>>
>> so I'm assuming the Fs are extraneous.
>>
>> However, I'd be much more comfortable with all this if I got
>>
>> C3
>> 8B
>>
>> Is this a weird debugger anomaly or is it an indication something has
>> already screwed up my data?
> 
> I don't know if this is relevant, but when I saw the values I emailed I
> was in a remote debugging session.
> 
> I was running Spring Tool Suite on Windows (7, I think) and hooking up to
> my app running on CentOS Linux (7, I think).
> 
> Tomcat and Apache are also running on the Linux system.

Java handles bytes as signed (-128 to 127) but the data in the input
stream is unsigned. The additional Fs are an artefact of whatever those
bytes were cast to.

It looks normal to me.
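
A minimal sketch of the signed-byte effect, using only JDK calls (the class and method
names are made up). It does not try to reproduce how the debugger arrived at "FFC3"; it
only shows that sign extension adds leading Fs and that masking with 0xFF removes them.

```java
// Hypothetical demo class showing signed bytes versus their unsigned hex form.
class SignedByteDemo {

    // Converts a signed Java byte to its unsigned value (0..255).
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Two-digit upper-case hex of the unsigned value.
    static String hex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        byte first = -61;
        byte second = -117;

        // Widening a negative byte to int sign-extends it, hence the leading Fs.
        System.out.println(Integer.toHexString(first));

        // Masking keeps only the low 8 bits.
        System.out.println(hex(first) + " " + hex(second));
    }
}
```

With the mask applied, -61 and -117 print as C3 and 8B, i.e. the expected UTF-8 bytes.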

Mark





Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Oct 18, 2016 4:45 PM, "Mark Juszczec"  wrote:
>
>
>
> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec 
wrote:
>
> Converting them to hex I see
>
> -61 = FFC3
>
> -117 = FF8B
>
> I know
>
> Ë = 0xC3 0x8B
>
> so I'm assuming the Fs are extraneous.
>
> However, I'd be much more comfortable with all this if I got
>
> C3
> 8B
>
> Is this a weird debugger anomaly or is it an indication something has
already screwed up my data?

I don't know if this is relevant, but when I saw the values I emailed I
was in a remote debugging session.

I was running Spring Tool Suite on Windows (7, I think) and hooking up to
my app running on CentOS Linux (7, I think).

Tomcat and Apache are also running on the Linux system.


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec 
wrote:
>
>
> Some questions (if these are not relevant, please disregard):
>
> I'm loading a whole bunch of modules.  Could some of them be incompatible?
>
> DocumentRoot refers to a directory that does not exist.  Is that a problem?
>
> What does AddLanguage do?
>
> Is AddDefaultCharset redundant?
>
> Are +ForwardKeySize and -ForwardDirectories somehow disabling what
> +ForwardURIEscaped does?
>
> I have verified the data coming out of Shibboleth is what we expect.
>

I think I've found where the byte data is coming in.

AjpAprProcessor.java's method:

protected boolean read(byte[] buf, int pos, int n, boolean block) throws
IOException

This ultimately gives me a great big buffer of bytes. Spring Tool Suite
shows me the relevant ones:

74 79 -61 -117 76

and I'm assuming they are the integer representations.

I am looking for the string JOËL

The negative numbers are a bit perplexing.

Converting them to hex I see

-61 = FFC3

-117 = FF8B

I know

Ë = 0xC3 0x8B

so I'm assuming the Fs are extraneous.

However, I'd be much more comfortable with all this if I got

C3
8B

Is this a weird debugger anomaly or is it an indication something has
already screwed up my data?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 10:28 AM, Mark Juszczec 
wrote:

>
>
> On Tue, Oct 18, 2016 at 10:23 AM, André Warnier (tomcat) 
> wrote:
>>
>>
>> Good. That's our goal here. We live to help :-)
>>
>>
> You all have been helpful beyond description.
>
>
>> I don't think that there is a need for a formal "petition". This being a
>> Tomcat list, and the mod_jk Connector being part of the Tomcat project
>> (despite being an Apache httpd add-on), I believe that this is in scope of
>> the list, as long as we do not stray too far in httpd-specific things.
>> One problem is that this list strips most attachments, and that posting
>> your whole configuration setup may be a bit much for pasting it into your
>> next message.
>> Is there a way by which you can post your configuration to some
>> publicly-accessible place, and provide a link ?
>>
>>
>
httpd.conf:

ServerRoot "/apache2.4.6/install"
DocumentRoot "/apache2.4.6/install/proj"
PidFile bin/someFile.pid
ServerTokens Prod
Timeout 60
KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 15
ExtendedStatus On
UseCanonicalName On
HostnameLookups Off
ServerSignature Off

ServerName my.server.name:9001

ServerAdmin root@localhost
Listen 9001
TraceEnable off
AddDefaultCharset On


#  parms deleted for space



#  parms deleted for space


LoadModule unixd_module modules/mod_unixd.so
LoadModule access_compat_module modules/mod_access_compat.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule status_module modules/mod_status.so
LoadModule info_module modules/mod_info.so
LoadModule mpm_worker_module modules/mod_mpm_worker.so
LoadModule dir_module modules/mod_dir.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule log_forensic_module modules/mod_log_forensic.so

LoadModule deflate_module modules/mod_deflate.so
LoadModule filter_module modules/mod_filter.so
LoadModule headers_module modules/mod_headers.so
LoadModule mime_module modules/mod_mime.so
LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule authn_core_module modules/mod_authn_core.so
LoadModule authn_file_module modules/mod_authn_file.so
LoadModule auth_basic_module  modules/mod_auth_basic.so

User myUser
Group myUser


Options FollowSymLinks
AllowOverride None
Require all denied



 ForensicLog logs/forensic_log



UserDir disabled



AllowOverride None
Require all granted



Options FollowSymLinks
AllowOverride None
Require all granted



DirectoryIndex index.html



Require all denied


ErrorLog "logs/error_log"


LogLevel mod_rewrite.c:trace8



CustomLog logs/mj_debug "%{canonical}p %{local}p %{remote}p %r %q
%{FirstName}e %{FirstName}n %{FirstName}o"


IndexOptions FancyIndexing VersionSort NameWidth=* HTMLTable

# removed icon stuff

ReadmeName README.html
HeaderName HEADER.html

IndexIgnore .??* *~ *# HEADER* README* RCS CVS *,v *,t

AddLanguage ca .ca
AddLanguage cs .cz .cs
AddLanguage da .dk
AddLanguage de .de
AddLanguage el .el
AddLanguage en .en
AddLanguage eo .eo
AddLanguage es .es
AddLanguage et .et
AddLanguage fr .fr
AddLanguage he .he
AddLanguage hr .hr
AddLanguage it .it
AddLanguage ja .ja
AddLanguage ko .ko
AddLanguage ltz .ltz
AddLanguage nl .nl
AddLanguage nn .nn
AddLanguage no .no
AddLanguage pl .po
AddLanguage pt .pt
AddLanguage pt-BR .pt-br
AddLanguage ru .ru
AddLanguage sv .sv
AddLanguage zh-CN .zh-cn
AddLanguage zh-TW .zh-tw

LanguagePriority en ca cs da de el eo es et fr he hr it ja ko ltz nl nn no
pl pt pt-BR ru sv zh-CN zh-TW

ForceLanguagePriority Prefer Fallback


TypesConfig /etc/mime.types
DefaultType None
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml


AddDefaultCharset UTF-8


MIMEMagicFile conf/magic


# removed a bunch of BrowserMatch directives for space

IncludeOptional conf.d/*.conf

AddOutputFilterByType DEFLATE text/html text/xml text/javascript text/css
image/bmp application/x-amf application/pdf

DeflateCompressionLevel 6


SetHandler server-status
#  parms deleted for space



 SetHandler server-info
#  parms deleted for space
 


JkMount jkstatus
#  parms deleted for space


--


located in conf.d/*.conf


LoadModule jk_module /apache2.4.6/install/modules/mod_jk.so

JkWorkersFile /apache2.4.6/install/conf.d/workers.properties

JkLogFile /apache2.4.6/install/logs/mod_jk.log

JkLogLevel trace

JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "

JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

JkRequestLogFormat "%w %V %T"


DocumentRoot 

Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 10:23 AM, André Warnier (tomcat) 
wrote:
>
>
> Good. That's our goal here. We live to help :-)
>
>
You all have been helpful beyond description.


> I don't think that there is a need for a formal "petition". This being a
> Tomcat list, and the mod_jk Connector being part of the Tomcat project
> (despite being an Apache httpd add-on), I believe that this is in scope of
> the list, as long as we do not stray too far in httpd-specific things.
> One problem is that this list strips most attachments, and that posting
> your whole configuration setup may be a bit much for pasting it into your
> next message.
> Is there a way by which you can post your configuration to some
> publicly-accessible place, and provide a link ?
>
>
I was going to post skeletons of the conf files in the text of the emails.
I didn't plan to send them as attachments.

I recognize I'll have to edit them and strip out the parts not significant
to the conversation and I'm completely happy to do so.


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread tomcat

On 18.10.2016 16:16, Mark Juszczec wrote:

On Tue, Oct 18, 2016 at 10:10 AM, André Warnier (tomcat) 
wrote:



This being a list dedicated to Tomcat, maybe we are going a bit deep in
the Apache httpd configuration and precedence rules here.
It is anyway difficult to answer your questions, without seeing the whole
of the Apache httpd configuration files.



I certainly don't want to run afoul of the forum rules.

This is the most progress I've made in 3 weeks.



Good. That's our goal here. We live to help :-)


Is there a way I can petition to be allowed to post my Apache config files
here OR is there a more suitable forum and can I invite the people
contributing to this thread to continue commenting in the more suitable forum?



I don't think that there is a need for a formal "petition". This being a Tomcat list, and 
the mod_jk Connector being part of the Tomcat project (despite being an Apache httpd 
add-on), I believe that this is in scope of the list, as long as we do not stray too far 
in httpd-specific things.
One problem is that this list strips most attachments, and that posting your whole 
configuration setup may be a bit much for pasting it into your next message.
Is there a way by which you can post your configuration to some publicly-accessible place, 
and provide a link ?






Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 10:10 AM, André Warnier (tomcat) 
wrote:

>
> This being a list dedicated to Tomcat, maybe we are going a bit deep in
> the Apache httpd configuration and precedence rules here.
> It is anyway difficult to answer your questions, without seeing the whole
> of the Apache httpd configuration files.
>
>
I certainly don't want to run afoul of the forum rules.

This is the most progress I've made in 3 weeks.

Is there a way I can petition to be allowed to post my Apache config files
here OR is there a more suitable forum and can I invite the people
contributing to this thread to continue commenting in the more suitable forum?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread tomcat

On 18.10.2016 15:22, Mark Juszczec wrote:

On Tue, Oct 18, 2016 at 9:13 AM, Mark Juszczec 
wrote:




   
 DocumentRoot /some/dir/thatDoesNotExist/
 JkEnvVar nameWithIntlChar
 JkMount /myService/* lbAjpWorker
 JkMount /myService lbAjpWorker

   



I forgot to ask something.

The above DocumentRoot does not exist.  There is another DocumentRoot
defined outside of the <VirtualHost> I posted but it does not exist
either.

Could this have anything to do with my problem?

What should these values be set to?



This being a list dedicated to Tomcat, maybe we are going a bit deep in the Apache httpd 
configuration and precedence rules here.
It is anyway difficult to answer your questions, without seeing the whole of the Apache 
httpd configuration files.

Generally speaking :
- whatever configuration directive is outside a <VirtualHost> section, acts as a 
"default", which is inherited by all <VirtualHost> sections.
- if a <VirtualHost> section contains a similar directive to one of these default values, 
then for requests to this <VirtualHost>, the directive that is inside the <VirtualHost> 
section overrides the one that is outside.


But there are some total or partial exceptions to the above rules, some of which apply to 
mod_jk (see for example JkMountCopy).


About the VirtualHost section which you list above : it looks strange to me, 
because :
- it would apply only to HTTP requests directed to port 9001, which is a bit 
unusual
- it does not seem to have a ServerName, which for name-based VirtualHosts is 
quite essential
- and without a valid DocumentRoot, any request for something other than the "JkMounted" 
URIs "/myService*" would have nowhere to be served from, and would thus return a "Not Found"


But again, without seeing the whole of the Apache httpd configuration, and without knowing 
exactly how browsers access this server, it is difficult to make a final call.

On which platform (OS) is this running ?

Separate note : if you are more familiar or at ease with Apache httpd configuration 
sections than with the JkMount/JkUnmount directives, you may want to have a look at an 
alternative way of configuring the httpd -> Tomcat forwarding.

See this page : http://tomcat.apache.org/connectors-doc/reference/apache.html
and scroll down to the section :
Using SetHandler and Environment Variables

This method *replaces* the usage of JkMount/JkUnmount directives, by directives enclosed 
in Apache httpd <Location> sections.  Personally, I find that (for someone familiar with 
Apache httpd) this configuration method is clearer than JkMount/JkUnmount, because with 
JkMount/JkUnmount, it is sometimes unclear what the precedence rules are with respect to 
Alias, rewrite, proxy etc..


With respect to your (later) question about JkOptions : the same above page, in the 
initial "Configuration Directives" section, clearly specifies where each "Jk*" directive 
can be used and to what scope it applies. (The "global" term means : in the part of the 
configuration which is /not/ inside a <VirtualHost> section).







Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 9:13 AM, Mark Juszczec 
wrote:

>
>
>   
> DocumentRoot /some/dir/thatDoesNotExist/
> JkEnvVar nameWithIntlChar
> JkMount /myService/* lbAjpWorker
> JkMount /myService lbAjpWorker
>
>   
>
>
I forgot to ask something.

The above DocumentRoot does not exist.  There is another DocumentRoot
defined outside of the <VirtualHost> I posted but it does not exist
either.

Could this have anything to do with my problem?

What should these values be set to?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 8:36 AM, André Warnier (tomcat) 
wrote:

> On 18.10.2016 13:03, Mark Juszczec wrote:
>>
>>
>> No, the following line:
>>
>> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>>
>> is in an Apache conf file, but not in a VirtualHost entry.
>>
>> Can this directive go in multiple places?
>>
>>
> See here : http://tomcat.apache.org/connectors-doc/reference/apache.html
> -->Forwarding
>
> JkOptions are generally inherited by the <VirtualHost> sections, from the
> "main" (or "default") Apache configuration (aka outside <VirtualHost>
> entries).
> But there are exceptions and additional rules, so check carefully.
>
>
Ok.  My JkOptions directive appears outside the <VirtualHost> entry so I'm
going to assume it's being inherited.

I will experiment with placing it in the <VirtualHost> section to see what
happens.

One quick question, do you or anyone reading know if something special
needs to be done to force % encoding when you have a workers.properties
file?

My workers.properties contains the following:

  worker.template.type=ajp13
  worker.template.ping_mode=A
  worker.template.socket_timeout=30
  worker.template.retries=2
  worker.template.host=localhost

  worker.list= someWorker, lbAjpWorker,

  worker.lbAjpWorker.reference=worker.template
  worker.lbAjpWorker.port=8045

In my conf file, someone has tied them together as follows:

  LoadModule jk_module /somepath/modules/mod_jk.so

  JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

  
DocumentRoot /some/dir/thatDoesNotExist/
JkEnvVar nameWithIntlChar
JkMount /myService/* lbAjpWorker
JkMount /myService lbAjpWorker

  

I don't see anything that says "make sure everything going to lbAjpWorker
is % encoded"

Is the JkOptions appearing outside <VirtualHost> supposed to take
care of that?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread tomcat

On 18.10.2016 13:03, Mark Juszczec wrote:

On Tue, Oct 18, 2016 at 1:14 AM, Rainer Jung 
wrote:


Am 17.10.2016 um 22:38 schrieb Mark Juszczec:




I've tried adding +ForwardURIEscaped in my conf file as follows:

# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

I would have expected mod_jk log to show the data % encoded, but it does
not:

text: J O Ë ‹ L
hex: 0x4a 0x4f 0xc3 0x8b 0x4c

I had expected to see something like:

JO%C3%8BL

Is that reasonable?  Does it make sense?



Yes.

Could something be turning off the encoding?  Do the header values need to
be set to something specific?



Did you put the directive into the correct VirtualHost?


Regards,

Rainer



No, the following line:

JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

is in an Apache conf file, but not in a VirtualHost entry.

Can this directive go in multiple places?



See here : http://tomcat.apache.org/connectors-doc/reference/apache.html
-->Forwarding

JkOptions are generally inherited by the <VirtualHost> sections, from the "main" (or 
"default") Apache configuration (aka outside <VirtualHost> entries).

But there are exceptions and additional rules, so check carefully.





Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-18 Thread Mark Juszczec
On Tue, Oct 18, 2016 at 1:14 AM, Rainer Jung 
wrote:

> Am 17.10.2016 um 22:38 schrieb Mark Juszczec:
>
>>
>>
>> I've tried adding +ForwardURIEscaped in my conf file as follows:
>>
>> # JkOptions indicate to send SSL KEY SIZE,
>> JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories
>>
>> I would have expected mod_jk log to show the data % encoded, but it does
>> not:
>>
>> text: J O Ë ‹ L
>> hex: 0x4a 0x4f 0xc3 0x8b 0x4c
>>
>> I had expected to see something like:
>>
>> JO%C3%8BL
>>
>> Is that reasonable?  Does it make sense?
>>
>
> Yes.
>
> Could something be turning off the encoding?  Do the headers values need to
>> be set to something specific?
>>
>
> Did you put the directive into the correct VirtualHost?
>
>
> Regards,
>
> Rainer
>

No, the following line:

JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

is in an Apache conf file, but not in a VirtualHost entry.

Can this directive go in multiple places?


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-17 Thread Rainer Jung

Am 17.10.2016 um 22:38 schrieb Mark Juszczec:

On Mon, Oct 17, 2016 at 8:20 AM, Rainer Jung 
wrote:


Am 17.10.2016 um 12:35 schrieb Mark Juszczec:


On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas  wrote:



A small hint. I'd expect those to be % encoded.



Thank you very much for your reply.

I've been thinking the problem is lack of % encoding after reading:

*"Default encoding for GET*
The character set for HTTP query strings (that's the technical term for
'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
specification. The character set is defined to be US-ASCII. Any character
that does not map to US-ASCII must be encoded in some way. Section 2.1 of
the URI Syntax specification says that characters outside of US-ASCII must
be encoded using % escape sequences: each character is encoded as a literal
% followed by the two hexadecimal codes which indicate its character code.
Thus, a (US-ASCII character code 97 = 0x61) is equivalent to %61. There *is
no default encoding for URIs* specified anywhere, which is why there is a
lot of confusion when it comes to decoding these values. "

from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

Do you know if there's a way to force something (mod_jk, mod_rewrite or
something else) to % encode the data being fed into the AJP port?



You can force mod_jk to %-encode the URI before forwarding:

JkOptions +ForwardURIEscaped



I've tried adding +ForwardURIEscaped in my conf file as follows:

# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

I would have expected mod_jk log to show the data % encoded, but it does
not:

text: J O Ë ‹ L
hex: 0x4a 0x4f 0xc3 0x8b 0x4c

I had expected to see something like:

JO%C3%8BL

Is that reasonable?  Does it make sense?


Yes.


Could something be turning off the encoding?  Do the header values need to
be set to something specific?


Did you put the directive into the correct VirtualHost?

Regards,

Rainer




Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-17 Thread Mark Juszczec
On Mon, Oct 17, 2016 at 8:20 AM, Rainer Jung 
wrote:

> Am 17.10.2016 um 12:35 schrieb Mark Juszczec:
>
>> On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas  wrote:
>>
>>
>>> A small hint. I'd expect those to be % encoded.
>>>
>>>
>> Thank you very much for your reply.
>>
>> I've been thinking the problem is lack of % encoding after reading:
>>
>> *"Default encoding for GET*
>> The character set for HTTP query strings (that's the technical term for
>> 'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
>> specification. The character set is defined to be US-ASCII. Any character
>> that does not map to US-ASCII must be encoded in some way. Section 2.1 of
>> the URI Syntax specification says that characters outside of US-ASCII must
>> be encoded using % escape sequences: each character is encoded as a literal
>> % followed by the two hexadecimal codes which indicate its character code.
>> Thus, a (US-ASCII character code 97 = 0x61) is equivalent to %61. There *is
>> no default encoding for URIs* specified anywhere, which is why there is a
>> lot of confusion when it comes to decoding these values. "
>>
>> from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
>>
>> Do you know if there's a way to force something (mod_jk, mod_rewrite or
>> something else) to % encode the data being fed into the AJP port?
>>
>
> You can force mod_jk to %-encode the URI before forwarding:
>
> JkOptions +ForwardURIEscaped
>
>
I've tried adding +ForwardURIEscaped in my conf file as follows:

# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURIEscaped -ForwardDirectories

I would have expected mod_jk log to show the data % encoded, but it does
not:

text: J O Ë ‹ L
hex: 0x4a 0x4f 0xc3 0x8b 0x4c

I had expected to see something like:

JO%C3%8BL

Is that reasonable?  Does it make sense?

Could something be turning off the encoding?  Do the header values need to
be set to something specific?
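
For reference, the hex dump above can be checked outside of httpd. The following is only a minimal Python sketch (Python is used purely for illustration, the variable names are made up, and nothing here is part of mod_jk). It decodes the logged bytes as UTF-8 and shows what a %-encoded form of the same bytes would look like:

```python
from urllib.parse import quote_from_bytes

# Hex dump as it appears in the mod_jk log excerpt above.
logged = "0x4a 0x4f 0xc3 0x8b 0x4c"

# Turn the "0x.." tokens into raw bytes.
raw = bytes(int(token, 16) for token in logged.split())

# The bytes are valid UTF-8: 0xc3 0x8b is the two-byte sequence for U+00CB.
text = raw.decode("utf-8")

# The same bytes with every non-ASCII byte written as a % escape.
escaped = quote_from_bytes(raw)

print(text)     # JOËL
print(escaped)  # JO%C3%8BL
```

So the bytes in the log are already correct UTF-8; what is missing is only the % escaping.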


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-17 Thread Rainer Jung

Am 17.10.2016 um 12:35 schrieb Mark Juszczec:

On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas  wrote:


On 17/10/2016 08:30, Mark Thomas wrote:

On 16/10/2016 19:09, Mark Juszczec wrote:

Hello

I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6

I'm using AJP 1.3 for communication between Apache and Tomcat

It's all powered by Java 1.8

I'm having a problem with international characters when I send them as the
request *URI* (which is used by GET requests and this is a GET request).

Let's say I get the string AOËL

mod_jk log  logs the bytes with the message

 "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
pos=4 len=1411 max=8192" (at
ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:

  41 4f c3 8b 4c

AFAIK this means the correct bytes are being sent to AJP.  Is that correct?


That is the correct UTF-8 byte encoding for the characters AOËL.


A small hint. I'd expect those to be % encoded.



Thank you very much for your reply.

I've been thinking the problem is lack of % encoding after reading:

*"Default encoding for GET*
The character set for HTTP query strings (that's the technical term for
'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
specification. The character set is defined to be US-ASCII. Any character
that does not map to US-ASCII must be encoded in some way. Section 2.1 of
the URI Syntax specification says that characters outside of US-ASCII must
be encoded using % escape sequences: each character is encoded as a literal
% followed by the two hexadecimal codes which indicate its character code.
Thus, a (US-ASCII character code 97 = 0x61) is equivalent to %61. There *is
no default encoding for URIs* specified anywhere, which is why there is a
lot of confusion when it comes to decoding these values. "

from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

Do you know if there's a way to force something (mod_jk, mod_rewrite or
something else) to % encode the data being fed into the AJP port?


You can force mod_jk to %-encode the URI before forwarding:

JkOptions +ForwardURIEscaped

(see http://tomcat.apache.org/connectors-doc/webserver_howto/apache.html)

You might need to experiment whether that really fixes your issues, e.g. 
when parts of the URI are already %-encoded etc.
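
To illustrate the caveat about already-encoded URIs, here is a minimal Python sketch. It uses the Python standard library rather than mod_jk, and the path is hypothetical, so it only shows the general effect of escaping twice, not what mod_jk actually does with such input:

```python
from urllib.parse import quote, unquote

# Hypothetical request path containing a raw non-ASCII character.
raw_path = "/myService/AOËL"

once = quote(raw_path)   # the UTF-8 bytes of "Ë" become %C3%8B
twice = quote(once)      # the "%" itself is now escaped as %25

print(once)    # /myService/AO%C3%8BL
print(twice)   # /myService/AO%25C3%258BL

# One round of decoding only undoes one round of escaping.
print(unquote(twice) == once)     # True
print(unquote(once) == raw_path)  # True
```

If the receiving side decodes only once, a doubly escaped value arrives as the literal text "%C3%8B" instead of the character.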


Regards,

Rainer




Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-17 Thread Mark Juszczec
On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas  wrote:

> On 17/10/2016 08:30, Mark Thomas wrote:
> > On 16/10/2016 19:09, Mark Juszczec wrote:
> >> Hello
> >>
> >> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache
> 2.4.6
> >>
> >> I'm using AJP 1.3 for communication between Apache and Tomcat
> >>
> >> It's all powered by Java 1.8
> >>
> >> I'm having a problem with international characters when I send them as
> the
> >> request *URI* (which is used by GET requests and this is a GET request).
> >>
> >> Let's say I get the string AOËL
> >>
> >> mod_jk log  logs the bytes with the message
> >>
> >>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to
> ajp13
> >> pos=4 len=1411 max=8192" (at
> >> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
> >>
> >>   41 4f c3 8b 4c
> >>
> >> AFAIK this means the correct bytes are being sent to AJP.  Is that
> correct?
> >
> > That is the correct UTF-8 byte encoding for the characters AOËL.
>
> A small hint. I'd expect those to be % encoded.
>

Thank you very much for your reply.

I've been thinking the problem is lack of % encoding after reading:

*"Default encoding for GET*
The character set for HTTP query strings (that's the technical term for
'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax"
specification. The character set is defined to be US-ASCII. Any character
that does not map to US-ASCII must be encoded in some way. Section 2.1 of
the URI Syntax specification says that characters outside of US-ASCII must
be encoded using % escape sequences: each character is encoded as a literal
% followed by the two hexadecimal codes which indicate its character code.
Thus, a (US-ASCII character code 97 = 0x61) is equivalent to %61. There *is
no default encoding for URIs* specified anywhere, which is why there is a
lot of confusion when it comes to decoding these values. "

from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
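
The "no default encoding" point from the FAQ can be seen in a small Python sketch (an illustration only, using Python's standard library, not anything Tomcat does internally). The same % escapes produce different text depending on which charset the receiver assumes:

```python
from urllib.parse import unquote

escaped = "AO%C3%8BL"

# Receiver assumes the escaped bytes are UTF-8.
as_utf8 = unquote(escaped, encoding="utf-8")

# Receiver assumes the escaped bytes are ISO-8859-1.
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(as_utf8)          # AOËL
print(repr(as_latin1))  # 'AOÃ\x8bL'

# Plain US-ASCII escapes are unambiguous: %61 is "a" either way.
print(unquote("%61"))   # a
```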

Do you know if there's a way to force something (mod_jk, mod_rewrite or
something else) to % encode the data being fed into the AJP port?

Mark


Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-17 Thread Mark Thomas
On 17/10/2016 08:30, Mark Thomas wrote:
> On 16/10/2016 19:09, Mark Juszczec wrote:
>> Hello
>>
>> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
>>
>> I'm using AJP 1.3 for communication between Apache and Tomcat
>>
>> It's all powered by Java 1.8
>>
>> I'm having a problem with international characters when I send them as the
>> request *URI* (which is used by GET requests and this is a GET request).
>>
>> Let's say I get the string AOËL
>>
>> mod_jk log  logs the bytes with the message
>>
>>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
>> pos=4 len=1411 max=8192" (at
>> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
>>
>>   41 4f c3 8b 4c
>>
>> AFAIK this means the correct bytes are being sent to AJP.  Is that correct?
> 
> That is the correct UTF-8 byte encoding for the characters AOËL.

A small hint. I'd expect those to be % encoded.

Mark


> 
> 
>> Running remote debugging via Spring Tool Suite to hook up to my code shows
>> me I receive:
>>
>> 41 4f c3 c3 83 c2 c2 8b 4c
> 
> That is not valid UTF-8. If the UTF-8 bytes had been treated as
> ISO-8859-1 and then re-encoded as UTF-8 I'd expect to see:
> 
> 41 4f c3 83 c2 8b 4c
> 
>> I have verified the incorrect bytes appear as early in the call stack as
>> when CoyoteAdapter.process() is invoked
> 
> I think you need to go a little further up the stack to track this down.
> 
>> I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
>> effect.
> 
> That is the change I would have expected was required.
> 
>> I've also specified  useBodyEncodingForURI as true with no effect.
> 
> That won't help for a GET request.
> 
>> Conventional wisdom says the data is getting inadvertently treated as ISO-8859-1
>> somewhere along the line. Since the data is correct (per mod_jk.log)
>> heading into AJP and incorrect once CoyoteAdapter.java starts handling it
>> somehow, something is going wrong when the data is interpreted after being
>> read from the AJP port.
>>
>> Is that correct?
> 
> It looks to be something like that.
> 
>> I am at a loss as to how to correct this.  The only 2 things the docs say
>> are to use URIEncoding="UTF-8" and  useBodyEncodingForURI="true".  I'm
>> doing that and it's not working.
>>
>> I am at a loss about what else to try or where to look.
>>
>> If you were faced with this, what would you try?  Any advice or suggestions
>> will be greatly appreciated.
> 
> I'd dig into the connector code. You need to figure out where those
> bytes are being transformed and why.
> 
> Mark
> 
> 
> 





Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-17 Thread Mark Thomas
On 16/10/2016 19:09, Mark Juszczec wrote:
> Hello
> 
> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
> 
> I'm using AJP 1.3 for communication between Apache and Tomcat
> 
> It's all powered by Java 1.8
> 
> I'm having a problem with international characters when I send them as the
> request *URI* (which is used by GET requests and this is a GET request).
> 
> Let's say I get the string AOËL
> 
> mod_jk log  logs the bytes with the message
> 
>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
> pos=4 len=1411 max=8192" (at
> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
> 
>   41 4f c3 8b 4c
> 
> AFAIK this means the correct bytes are being sent to AJP.  Is that correct?

That is the correct UTF-8 byte encoding for the characters AOËL.


> Running remote debugging via Spring Tool Suite to hook up to my code shows
> me I receive:
> 
> 41 4f c3 c3 83 c2 c2 8b 4c

That is not valid UTF-8. If the UTF-8 bytes had been treated as
ISO-8859-1 and then re-encoded as UTF-8 I'd expect to see:

41 4f c3 83 c2 8b 4c
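
The byte sequences in this exchange can be reproduced with a minimal Python sketch (Python 3.8 or later for `bytes.hex(" ")`; this is an illustration only and says nothing about where in the connector the conversion happens):

```python
original = "AOËL"

# Correct UTF-8 encoding, as seen in the mod_jk log.
utf8_bytes = original.encode("utf-8")

# Misread those UTF-8 bytes as ISO-8859-1, then encode the result as UTF-8 again.
double_encoded = utf8_bytes.decode("iso-8859-1").encode("utf-8")

print(utf8_bytes.hex(" "))      # 41 4f c3 8b 4c
print(double_encoded.hex(" "))  # 41 4f c3 83 c2 8b 4c

# The sequence reported from the debugger cannot be decoded as UTF-8 at all.
reported = bytes.fromhex("41 4f c3 c3 83 c2 c2 8b 4c")
try:
    reported.decode("utf-8")
    reported_is_valid = True
except UnicodeDecodeError:
    reported_is_valid = False

print(reported_is_valid)        # False
```

A single ISO-8859-1 misinterpretation therefore explains the seven-byte sequence, but not the nine-byte one that was reported.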

> I have verified the incorrect bytes appear as early in the call stack as
> when CoyoteAdapter.process() is invoked

I think you need to go a little further up the stack to track this down.

> I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
> effect.

That is the change I would have expected was required.

> I've also specified  useBodyEncodingForURI as true with no effect.

That won't help for a GET request.

> Conventional wisdom says the data is getting inadvertently treated as ISO-8859-1
> somewhere along the line. Since the data is correct (per mod_jk.log)
> heading into AJP and incorrect once CoyoteAdapter.java starts handling it
> somehow, something is going wrong when the data is interpreted after being
> read from the AJP port.
> 
> Is that correct?

It looks to be something like that.

> I am at a loss as to how to correct this.  The only 2 things the docs say
> are to use URIEncoding="UTF-8" and  useBodyEncodingForURI="true".  I'm
> doing that and it's not working.
> 
> I am at a loss about what else to try or where to look.
> 
> If you were faced with this, what would you try?  Any advice or suggestions
> will be greatly appreciated.

I'd dig into the connector code. You need to figure out where those
bytes are being transformed and why.

Mark





Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

2016-10-16 Thread Mark Juszczec
Hello

I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6

I'm using AJP 1.3 for communication between Apache and Tomcat

It's all powered by Java 1.8

I'm having a problem with international characters when I send them as the
request *URI* (which is used by GET requests and this is a GET request).

Let's say I get the string AOËL

mod_jk log  logs the bytes with the message

 "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
pos=4 len=1411 max=8192" (at
ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:

  41 4f c3 8b 4c

AFAIK this means the correct bytes are being sent to AJP.  Is that correct?

Running remote debugging via Spring Tool Suite to hook up to my code shows
me I receive:

41 4f c3 c3 83 c2 c2 8b 4c

I have verified the incorrect bytes appear as early in the call stack as
when CoyoteAdapter.process() is invoked

I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
effect.

I've also specified  useBodyEncodingForURI as true with no effect.

Conventional wisdom says the data is getting inadvertently treated as ISO-8859-1
somewhere along the line. Since the data is correct (per mod_jk.log)
heading into AJP and incorrect once CoyoteAdapter.java starts handling it
somehow, something is going wrong when the data is interpreted after being
read from the AJP port.

Is that correct?

I am at a loss as to how to correct this.  The only 2 things the docs say
are to use URIEncoding="UTF-8" and  useBodyEncodingForURI="true".  I'm
doing that and it's not working.

I am at a loss about what else to try or where to look.

If you were faced with this, what would you try?  Any advice or suggestions
will be greatly appreciated.


Mark