form encoding issues

2010-09-29 Thread Ron Van den Branden

 Hi,

I've run into a character encoding issue (cocoon-2.1.10) and really
can't see why. Apparently, text input in a form is passed on in the wrong
encoding. I've set Cocoon's default encoding to UTF-8 in every place I
can think of:


web.xml:

<servlet>
  <servlet-name>Cocoon</servlet-name>
  <!-- ... -->
  <init-param>
    <param-name>container-encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>form-encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <!-- ... -->
</servlet>

sitemap.xmap:

<map:serializer logger="sitemap.serializer.xhtml" mime-type="text/html"
    name="xhtml"
    pool-max="${xhtml-serializer.pool-max}"
    src="org.apache.cocoon.serialization.XMLSerializer">
  <doctype-public>-//W3C//DTD XHTML 1.0 Transitional//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd</doctype-system>
  <encoding>UTF-8</encoding>
</map:serializer>

Yet, when I execute the following pipeline:

<map:match pattern="test">
  <map:generate src="test.xml"/>
  <map:transform src="test.xsl">
    <map:parameter name="use-request-parameters" value="true"/>
  </map:transform>
  <map:serialize type="xhtml"/>
</map:match>

...with the following minimal source files:

test.xml
========
<?xml version="1.0" encoding="UTF-8"?>
<test/>

test.xsl (which mainly echoes the previous input)
========
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:param name="input"/>
  <xsl:template match="/">
    <html>
      <head>
        <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
      </head>
      <body>
        <form action="test" accept-charset="UTF-8" method="get">
          <input type="text" value="{$input}" name="input"/>
          <input type="submit"/>
        </form>
        <p>current input: <xsl:value-of select="$input"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Yet, when I enter a string with accented characters, e.g. 'très
annoying', it comes out garbled, as 'trÃ¨s annoying'...
On the other hand, when I enter the corresponding URL
(http://localhost:/test?input=tr%C3%A8s+annoying) directly, the
characters are passed on correctly. Does anyone know how this can be fixed?
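
For reference, here is a minimal sketch of what a browser puts on the wire when it submits the form value in a given charset. It uses the JDK's URLEncoder (the Charset overload needs Java 10 or later); the class and method names are made up for illustration. With UTF-8 the result matches the query string in the URL above, while ISO-8859-1 would send a single %E8 byte for the accented character.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncodingSketch {
    // Percent-encode a form value with the given charset, following the
    // application/x-www-form-urlencoded rules (space becomes '+').
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String typed = "tr\u00e8s annoying"; // e-grave written as an escape

        System.out.println(encode(typed, StandardCharsets.UTF_8));      // tr%C3%A8s+annoying
        System.out.println(encode(typed, StandardCharsets.ISO_8859_1)); // tr%E8s+annoying
    }
}
```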


Any hints much appreciated!

Ron Van den Branden

-
To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org
For additional commands, e-mail: users-h...@cocoon.apache.org



RE: form encoding issues

2010-09-29 Thread Robby Pelssers

I'm not sure how to do this with Cocoon 2.1.x, but with Cocoon 2.2 you need to
set the following properties in META-INF/cocoon.properties:

org.apache.cocoon.containerencoding=utf-8
org.apache.cocoon.formencoding=utf-8


Hope this gets you looking in the right direction.

Cheers,
Robby Pelssers



Re: form encoding issues

2010-09-29 Thread Thomas Markus

 Hi,

Check the request character encoding. For Tomcat, look at
http://confluence.atlassian.com/display/DOC/Configuring+Tomcat%27s+URI+encoding
and, in your Tomcat installation, at
webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

That worked for me.

regards
Thomas





Re: form encoding issues

2010-09-29 Thread Ron Van den Branden

 Hi again,

Thank you very much for the quick help; meanwhile I think I found an
answer in a post on cocoon-dev:
http://markmail.org/message/nm6bnvqztbee4s5o. It states that
apparently (and counter-intuitively, IMO) 'request parameters are
always decoded using ISO-8859-1', and that consequently
'container_encoding should always be ISO-8859-1 (unless you have a
broken servlet container), and form_encoding should be the same one as
on your serializer.'


And lo: changing the (over-eager?) container-encoding parameter in
web.xml back to the default:

<init-param>
  <param-name>container-encoding</param-name>
  <param-value>ISO-8859-1</param-value>
</init-param>

...seems to do the trick!
(phew!)

(note: I found this info also at 
http://wiki.apache.org/cocoon/RequestParameterEncoding#A3._Decoding_incoming_requests:_Servlet_Container) 
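
To illustrate why this setting matters, here is a rough Java sketch of the re-decoding step the quoted post implies: the string handed over by the container is turned back into bytes using container-encoding and then decoded again using form-encoding. The method names are hypothetical stand-ins rather than Cocoon's actual code, and the sketch assumes the container decoded the UTF-8 request bytes as ISO-8859-1 (so 'très' arrives as 'trÃ¨s').

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RecodeSketch {
    // Hypothetical stand-in for the re-decoding step: turn the string the container
    // handed over back into bytes (containerEncoding), then decode them as formEncoding.
    static String recode(String fromContainer, Charset containerEncoding, Charset formEncoding) {
        return new String(fromContainer.getBytes(containerEncoding), formEncoding);
    }

    // What a container that decodes with ISO-8859-1 makes of UTF-8 bytes on the wire.
    static String asSeenByContainer(String typed) {
        return new String(typed.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String typed = "tr\u00e8s annoying"; // e-grave written as an escape
        String fromContainer = asSeenByContainer(typed);

        String withLatin1 = recode(fromContainer, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        String withUtf8 = recode(fromContainer, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(withLatin1.equals(typed));       // true: original text recovered
        System.out.println(withUtf8.equals(fromContainer)); // true: garbled text left as it was
    }
}
```

With container-encoding set to UTF-8 the two conversions cancel out, so the text garbled by the container is passed through unchanged; with ISO-8859-1 the original bytes are recovered and then decoded correctly.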



Thanks anyway,

Ron




Re: form encoding issues

2010-09-29 Thread Andre Juffer

On 09/29/2010 12:43 PM, Ron Van den Branden wrote:

> Hi again,
>
> Thank you very much for the quick help; meanwhile I think I found an
> answer in a post on cocoon-dev:
> http://markmail.org/message/nm6bnvqztbee4s5o. It states that
> apparently (and counter-intuitively, IMO) 'request parameters are
> always decoded using ISO-8859-1', and that consequently
> 'container_encoding should always be ISO-8859-1 (unless you have a
> broken servlet container), and form_encoding should be the same one as
> on your serializer.'

Actually, Tomcat does, but Jetty does not (its default is UTF-8). According
to the specification, servlet engines are supposed to decode using ISO-8859-1
by default.




> And lo: changing the (over-eager?) container-encoding parameter in
> web.xml back to the default:
>
> <init-param>
>   <param-name>container-encoding</param-name>
>   <param-value>ISO-8859-1</param-value>
> </init-param>


Do I understand this correctly: you have encoded everything in UTF-8, but
to be able to read your input fields (UTF-8) you need to decode their values
with ISO-8859-1 on the server?

I have had cases where the browser was encoding in ISO-8859-1 despite
Content-type being set to text/html; charset=UTF-8 (it simply ignored
the HTTP header value).




> ...seems to do the trick!
> (phew!)
>
> (note: I found this info also at
> http://wiki.apache.org/cocoon/RequestParameterEncoding#A3._Decoding_incoming_requests:_Servlet_Container)
>
> Thanks anyway,
>
> Ron





--
Andre H. Juffer  | Phone: +358-8-553 1161
Biocenter Oulu and   | Fax: +358-8-553-1141
Department of Biochemistry   | Email: andre.juf...@oulu.fi
University of Oulu, Finland  | WWW: www.biochem.oulu.fi/Biocomputing/
StruBioCat   | WWW: www.strubiocat.oulu.fi
NordProt | WWW: www.nordprot.org
Triacle Biocomputing | WWW: www.triacle-bc.com





Re: form encoding issues

2010-09-29 Thread Thomas Markus

That's right, but then you are bound to ISO-8859-1.

We use UTF-8 in all stages, following the suggestions in my earlier reply.

regards
Thomas




Re: form encoding issues

2010-09-29 Thread Ron Van den Branden

 Hi Thomas,

I'm not much of an expert in encoding matters, and could indeed be happy 
with ISO-8859-1 instead of UTF-8.


However, testing with ISO-8859-1 set as container-encoding, even Arabic 
input is passed through correctly: ص (Arabic letter 'sad' - 
http://www.fileformat.info/info/unicode/char/0635/index.htm) comes out 
as it has been entered.


Does this mean that this (default) ISO-8859-1 container encoding does 
cater for UTF-8 correctly? Otherwise, would you mind expanding on your 
webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java 
suggestion (I'm not much of a Java expert, either ;-))?
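
A small Java sketch may help explain the Arabic observation. It assumes the container only misreads the UTF-8 bytes as ISO-8859-1 and that the same bytes are later recovered and decoded as UTF-8. Because ISO-8859-1 assigns a character to every byte value, that round trip loses nothing, whatever the original script; actually encoding the text as ISO-8859-1 is a different operation and does fail. Class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // UTF-8 bytes misread as ISO-8859-1, then converted back: every byte value
    // 0x00-0xFF maps to exactly one ISO-8859-1 character, so nothing is lost.
    static String throughLatin1(String text) {
        String misread = new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Encoding the text itself as ISO-8859-1 is another matter: characters outside
    // Latin-1 have no byte there, and String.getBytes substitutes '?'.
    static String encodedAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sad = "\u0635"; // Arabic letter sad, U+0635

        System.out.println(throughLatin1(sad).equals(sad)); // true
        System.out.println(encodedAsLatin1(sad));           // ?
    }
}
```

So the ISO-8859-1 container encoding does not really "cater for" UTF-8; it just preserves the bytes long enough for them to be decoded properly afterwards.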


OTOH, I don't see any difference between cocoon running in either Tomcat 
or the shipped Jetty.


Kind regards,

Ron




Re: form encoding issues

2010-09-29 Thread Ron Van den Branden

 Hi Andre,

On 29/09/2010 12:01, Andre Juffer wrote:
> Actually, Tomcat does, but Jetty does not (by default, UTF8).
> According to specification, servlet engine are suppose to decode using
> ISO-8859-1 by default.


I don't see any difference between both.




>> And lo: changing the (over-eager?) container-encoding parameter in
>> web.xml back to the default:
>>
>> <init-param>
>>   <param-name>container-encoding</param-name>
>>   <param-value>ISO-8859-1</param-value>
>> </init-param>
>
> Do I understand this correctly: you have encoded everything in UTF8,
> but to able to read your input fields (UTF8) you need to decode their
> value with ISO-8859-1 on the server?


Apparently: even Arabic text comes out fine with ISO-8859-1, not with 
UTF-8 (as I've mentioned in another reply on the ML).


> I have had cases where the browser was encoding in ISO-8859-1 despite
> the presence of Content-type set to text/html; charset=UTF-8 (it
> simply ignored the HTTP header value).


All my browsers interpret my test case as UTF-8 (with container-encoding 
set to ISO-8859-1)...


Kind regards,

Ron




Re: form encoding issues

2010-09-29 Thread Thomas Markus

 hi,

That Arabic character should fail with Latin-1.

We see a difference between Jetty and Tomcat (6.0). Tomcat follows the spec
(see Andre's mail) and uses ISO-8859-1 by default. You can switch completely
to UTF-8 as follows:

- send HTML content in UTF-8
- set container-encoding to UTF-8
- set form-encoding to UTF-8
- set URIEncoding to UTF-8
- include a class like SetCharacterEncodingFilter to set the request
  character encoding


regards
Thomas




Re: form encoding issues

2010-09-29 Thread Christopher Schultz

Ron,

On 9/29/2010 5:43 AM, Ron Van den Branden wrote:
> There is stated that
> apparently (and counter-intuitively, IMO), 'request parameters are
> always decoded using ISO-8859-1 ',  and that consequently
> 'container_encoding should always be ISO-8859-1 (unless you have a
> broken servlet container), and form_encoding should be the same one as
> on your serializer.'.

Note that it's not /all/ parameters that are decoded using ISO-8859-1:
it's only GET parameters. If you use POST, you will likely have better
results.

Note that this means you can't send anything with non-ISO-8859-1
characters in GET parameters safely. There are three solutions:

1. Always use POST (not really a bad idea, but not always practical)
2. Force your container to use UTF-8 to decode GET parameters
   (in Tomcat, this can be accomplished using the URIEncoding
   attribute of the Connector element; see your own container's
   documentation for similar capabilities)
3. Never send strings as GET parameters (similar to #1, but somewhat
   different: perhaps use HttpSession or other strategies to avoid
   passing strings through the URL)
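
The effect of the charset used for GET parameters can be reproduced outside any container. The sketch below is only an illustration using the JDK's URLDecoder (the Charset overload needs Java 10 or later), not container code; the class and method names are made up. It decodes the query value from the original question once per charset.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodeSketch {
    // Percent-decode a raw query-string value using the given charset.
    static String decode(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) {
        String raw = "tr%C3%A8s+annoying";
        String expected = "tr\u00e8s annoying"; // e-grave written as an escape

        // Decoded as UTF-8, the two bytes C3 A8 become one character.
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals(expected));      // true
        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).equals(expected)); // false
    }
}
```

Solution 2 in the list above amounts to making the container take the first path for the query string.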

Good luck,
-chris



Re: form encoding issues

2010-09-29 Thread Christopher Schultz

Thomas,

On 9/29/2010 7:05 AM, Thomas Markus wrote:
> hi,
>
> that arabic character should fail with latin1.
>
> we see a difference between jetty and tomcat (6.0). tomcat follows specs
> (see Andre's mail) and uses iso per default. you can switch completely
> to UTF-8 with:
> - send html content in utf-8
> - set container-encoding to utf-8
> - set form-encoding to utf-8
> - set URIEncoding to utf-8
> - and include a class like SetCharacterEncodingFilter to set request
>   character encoding

Note that this item sets the character encoding for reading request
/bodies/ and not GET parameters from the URL. It also only sets the
request character encoding if the client has not set it.
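
The "only if the client has not set it" behaviour can be sketched in plain Java. This is not the Tomcat filter itself, which works against the servlet API; the helper names are hypothetical, and the sketch assumes the body value is form-urlencoded. The configured default is used only when the client declared no charset.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyCharsetSketch {
    // Hypothetical helper: prefer the charset the client declared (for example in
    // its Content-Type header); fall back to the configured default otherwise.
    static Charset effectiveCharset(String declaredByClient, Charset configuredDefault) {
        return declaredByClient != null ? Charset.forName(declaredByClient) : configuredDefault;
    }

    // Decode one form-urlencoded value from a request body.
    static String decodeBodyValue(String rawValue, String declaredByClient, Charset configuredDefault) {
        return URLDecoder.decode(rawValue, effectiveCharset(declaredByClient, configuredDefault));
    }

    public static void main(String[] args) {
        String expected = "tr\u00e8s"; // e-grave written as an escape

        // No charset declared: the configured default (UTF-8 here) applies.
        System.out.println(decodeBodyValue("tr%C3%A8s", null, StandardCharsets.UTF_8).equals(expected)); // true
        // Charset declared by the client: it wins over the default.
        System.out.println(decodeBodyValue("tr%E8s", "ISO-8859-1", StandardCharsets.UTF_8).equals(expected)); // true
    }
}
```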

All these issues are covered in this Tomcat document, though the content
is generally applicable to all containers:

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

-chris



Re: form encoding issues

2010-09-29 Thread Barbara Slupik

Hello

I followed the instructions at http://cocoon.apache.org/2.2/1366_1_1.html.
For cocoon-2.1.11 I set

<init-param>
  <param-name>container-encoding</param-name>
  <param-value>UTF-8</param-value>
</init-param>

<init-param>
  <param-name>form-encoding</param-name>
  <param-value>UTF-8</param-value>
</init-param>

in my web.xml instead of org.apache.cocoon.containerencoding=utf-8 and
org.apache.cocoon.formencoding=utf-8. I also had to create a
SetCharacterEncodingFilter. Everything works fine in UTF-8.


Barbara

