form encoding issues
Hi,

I'm stumbling on a character encoding issue (cocoon-2.1.10) and really can't see why. Apparently, text input in a form is passed on in the wrong encoding. I've set Cocoon's default encoding to UTF-8 in every place I can think of:

web.xml:

  <servlet>
    <servlet-name>Cocoon</servlet-name>
    <!-- .. -->
    <init-param>
      <param-name>container-encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
      <param-name>form-encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
    <!-- ... -->
  </servlet>

sitemap.xmap:

  <map:serializer logger="sitemap.serializer.xhtml" mime-type="text/html" name="xhtml" pool-max="${xhtml-serializer.pool-max}" src="org.apache.cocoon.serialization.XMLSerializer">
    <doctype-public>-//W3C//DTD XHTML 1.0 Transitional//EN</doctype-public>
    <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd</doctype-system>
    <encoding>UTF-8</encoding>
  </map:serializer>

Yet, when I execute the following pipeline:

  <map:match pattern="test">
    <map:generate src="test.xml"/>
    <map:transform src="test.xsl">
      <map:parameter name="use-request-parameters" value="true"/>
    </map:transform>
    <map:serialize type="xhtml"/>
  </map:match>

...with the following minimal source files:

test.xml:

  <?xml version="1.0" encoding="UTF-8"?>
  <test/>

test.xsl (which mainly echoes the previous input):

  <?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:param name="input"/>
    <xsl:template match="/">
      <html>
        <head>
          <meta http-equiv="Content-type" content="text/html; charset=UTF-8"/>
        </head>
        <body>
          <form action="test" accept-charset="UTF-8" method="get">
            <input type="text" value="{$input}" name="input"/>
            <input type="submit"/>
          </form>
          <p>current input: <xsl:value-of select="$input"/></p>
        </body>
      </html>
    </xsl:template>
  </xsl:stylesheet>

...entering a string with accented characters, e.g. 'très annoying', comes out as: 'très annoying'...

On the other hand, when I enter the corresponding URL (http://localhost:/test?input=tr%C3%A8s+annoying) directly, the characters are passed on correctly.

Does anyone know how this can be fixed? Any hints much appreciated!
Ron Van den Branden - To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org For additional commands, e-mail: users-h...@cocoon.apache.org
RE: form encoding issues
I'm not sure how to do this with Cocoon 2.1.x, but with Cocoon 2.2 you need to set the following properties in META-INF/cocoon.properties:

  org.apache.cocoon.containerencoding=utf-8
  org.apache.cocoon.formencoding=utf-8

Hope this gets you looking in the right direction.

Cheers,
Robby Pelssers
Re: form encoding issues
Hi,

Check the request character encoding. For Tomcat, look at
http://confluence.atlassian.com/display/DOC/Configuring+Tomcat%27s+URI+encoding
and, in your Tomcat installation, at
webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

That worked for me.

Regards,
Thomas
Re: form encoding issues
Hi again,

Thank you very much for the quick help; meanwhile I think I found an answer in a post on cocoon-dev: http://markmail.org/message/nm6bnvqztbee4s5o. It states that apparently (and counter-intuitively, IMO), 'request parameters are always decoded using ISO-8859-1', and that consequently 'container_encoding should always be ISO-8859-1 (unless you have a broken servlet container), and form_encoding should be the same one as on your serializer.'

And lo: changing the (over-eager?) container-encoding parameter in web.xml back to the default:

  <init-param>
    <param-name>container-encoding</param-name>
    <param-value>ISO-8859-1</param-value>
  </init-param>

...seems to do the trick! (phew!)

(Note: I also found this info at http://wiki.apache.org/cocoon/RequestParameterEncoding#A3._Decoding_incoming_requests:_Servlet_Container)

Thanks anyway,
Ron
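The effect Ron describes can be reproduced outside Cocoon. The following is a minimal sketch, not Cocoon's actual code: it assumes the container has decoded the UTF-8 bytes sent by the browser as ISO-8859-1, and that the value is then re-encoded with container-encoding and decoded with form-encoding, as the wiki page describes. The class and method names (RecodeDemo, containerDecode, recode) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RecodeDemo {
    // What a spec-following container does with the percent-decoded bytes
    // of a GET parameter: it reads them as ISO-8859-1.
    static String containerDecode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical stand-in for the re-decoding step: turn the string back
    // into bytes using container-encoding, then decode with form-encoding.
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        String typed = "tr\u00E8s";                           // "très"
        byte[] sent = typed.getBytes(StandardCharsets.UTF_8); // bytes 74 72 C3 A8 73
        String seen = containerDecode(sent);                  // 'è' became two characters

        // container-encoding=UTF-8, form-encoding=UTF-8: the value stays garbled.
        System.out.println(recode(seen, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(typed));      // false
        // container-encoding=ISO-8859-1, form-encoding=UTF-8: the value is recovered.
        System.out.println(recode(seen, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(typed)); // true
    }
}
```

With both parameters set to UTF-8 the re-decoding step changes nothing, so the container's misreading survives; with container-encoding set to ISO-8859-1 the original bytes are restored before they are read as UTF-8.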
Re: form encoding issues
On 09/29/2010 12:43 PM, Ron Van den Branden wrote:
> It states that apparently (and counter-intuitively, IMO), 'request
> parameters are always decoded using ISO-8859-1', and that consequently
> 'container_encoding should always be ISO-8859-1 (unless you have a
> broken servlet container), and form_encoding should be the same one as
> on your serializer.'

Actually, Tomcat does, but Jetty does not (by default it uses UTF-8). According to the specification, servlet engines are supposed to decode using ISO-8859-1 by default.

> And lo: changing the (over-eager?) container-encoding parameter in
> web.xml back to the default:
>
>   <init-param>
>     <param-name>container-encoding</param-name>
>     <param-value>ISO-8859-1</param-value>
>   </init-param>

Do I understand this correctly: you have encoded everything in UTF-8, but to be able to read your input fields (UTF-8) you need to decode their values with ISO-8859-1 on the server?

I have had cases where the browser was encoding in ISO-8859-1 despite the Content-type being set to text/html; charset=UTF-8 (it simply ignored the HTTP header value).

> ...seems to do the trick! (phew!)

-- 
Andre H. Juffer              | Phone: +358-8-553 1161
Biocenter Oulu and           | Fax: +358-8-553-1141
Department of Biochemistry   | Email: andre.juf...@oulu.fi
University of Oulu, Finland  | WWW: www.biochem.oulu.fi/Biocomputing/
StruBioCat                   | WWW: www.strubiocat.oulu.fi
NordProt                     | WWW: www.nordprot.org
Triacle Biocomputing         | WWW: www.triacle-bc.com
Re: form encoding issues
That's right, but then you are bound to ISO-8859-1. We use UTF-8 in all stages, following the suggestions in my earlier message.

Regards,
Thomas
Re: form encoding issues
Hi Thomas,

I'm not much of an expert in encoding matters, and could indeed be happy with ISO-8859-1 instead of UTF-8. However, when I test with container-encoding set to ISO-8859-1, even Arabic input is passed through correctly: ص (Arabic letter 'sad' - http://www.fileformat.info/info/unicode/char/0635/index.htm) comes out as it was entered. Does this mean that the (default) ISO-8859-1 container encoding does cater for UTF-8 correctly?

Otherwise, would you mind expanding on your webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java suggestion (I'm not much of a Java expert, either ;-))?

OTOH, I don't see any difference between Cocoon running in Tomcat and in the shipped Jetty.

Kind regards,
Ron
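On the question of why Arabic input survives with container-encoding set to ISO-8859-1, here is a short sketch under one assumption: the value is first decoded as ISO-8859-1 by the container, then turned back into bytes with the same charset and finally read as UTF-8 (form-encoding). In Java, ISO-8859-1 maps each of the 256 byte values to exactly one character and back, so it can carry UTF-8 bytes without loss even though it cannot represent Arabic itself. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Decode bytes as ISO-8859-1 and encode the result straight back.
    static byte[] roundTrip(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return asLatin1.getBytes(StandardCharsets.ISO_8859_1);
    }

    // True if every possible byte value survives the round trip unchanged.
    static boolean isByteTransparent() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return Arrays.equals(all, roundTrip(all));
    }

    public static void main(String[] args) {
        String sad = "\u0635";                              // Arabic letter 'sad'
        byte[] utf8 = sad.getBytes(StandardCharsets.UTF_8); // bytes D8 B5
        String recovered = new String(roundTrip(utf8), StandardCharsets.UTF_8);

        System.out.println(isByteTransparent());   // true
        System.out.println(recovered.equals(sad)); // true
    }
}
```

Read this way, ISO-8859-1 is only a lossless carrier for the bytes; the actual interpretation of the text happens in the final UTF-8 decoding step.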
Re: form encoding issues
Hi Andre,

On 29/09/2010 12:01, Andre Juffer wrote:
> Actually, Tomcat does, but Jetty does not (by default, UTF8). According
> to specification, servlet engines are supposed to decode using
> ISO-8859-1 by default.

I don't see any difference between the two.

> Do I understand this correctly: you have encoded everything in UTF8,
> but to be able to read your input fields (UTF8) you need to decode
> their value with ISO-8859-1 on the server?

Apparently: even Arabic text comes out fine with ISO-8859-1, but not with UTF-8 (as I mentioned in another reply on the ML).

> I have had cases where the browser was encoding in ISO-8859-1 despite
> the presence of Content-type set to text/html; charset=UTF-8 (it simply
> ignored the HTTP header value).

All my browsers interpret my test case as UTF-8 (with container-encoding set to ISO-8859-1)...

Kind regards,
Ron
Re: form encoding issues
Hi,

That Arabic character should fail with Latin-1. We do see a difference between Jetty and Tomcat (6.0): Tomcat follows the spec (see Andre's mail) and uses ISO-8859-1 by default.

You can switch completely to UTF-8 as follows:

- send HTML content in UTF-8
- set container-encoding to UTF-8
- set form-encoding to UTF-8
- set URIEncoding to UTF-8
- include a class like SetCharacterEncodingFilter to set the request character encoding

Regards,
Thomas
Re: form encoding issues
Ron,

On 9/29/2010 5:43 AM, Ron Van den Branden wrote:
> It states that apparently (and counter-intuitively, IMO), 'request
> parameters are always decoded using ISO-8859-1', and that consequently
> 'container_encoding should always be ISO-8859-1 (unless you have a
> broken servlet container), and form_encoding should be the same one as
> on your serializer.'

Note that it's not /all/ parameters that are decoded using ISO-8859-1: it's only GET parameters. If you use POST, you will likely have better results.

Note that this means you can't safely send anything with non-ISO-8859-1 characters in GET parameters. There are three solutions:

1. Always use POST (not really a bad idea, but not always practical).
2. Force your container to use UTF-8 to decode GET parameters (in Tomcat, this can be accomplished using the URIEncoding attribute of the Connector element; see your own container's documentation for similar capabilities).
3. Never send strings as GET parameters (similar to #1, but somewhat different: perhaps use HttpSession or other strategies to avoid passing strings through the URL).

Good luck,
-chris
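To make the GET-parameter point concrete, here is a small sketch that decodes the query value from the test URL with two different charsets. It uses java.net.URLDecoder as a stand-in for the container's own query-string decoding, which is an assumption: real containers use their own code, configured for example through Tomcat's URIEncoding. The class name QueryDecodeDemo and its decode helper are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodeDemo {
    // Percent-decode a raw query value, interpreting the resulting bytes
    // in the named charset.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "tr%C3%A8s+annoying"; // value from the test URL in this thread

        String asUtf8 = decode(raw, "UTF-8");        // what solution 2 aims for
        String asLatin1 = decode(raw, "ISO-8859-1"); // the spec default

        System.out.println(asUtf8.equals("tr\u00E8s annoying"));         // true
        System.out.println(asLatin1.equals("tr\u00C3\u00A8s annoying")); // true
    }
}
```

The same percent-encoded bytes yield the intended text only when the decoder is told they are UTF-8; under the ISO-8859-1 default each byte becomes its own character.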
Re: form encoding issues
Thomas,

On 9/29/2010 7:05 AM, Thomas Markus wrote:
> you can switch completely to UTF-8 with:
> - send html content in utf-8
> - set container-encoding to utf-8
> - set form-encoding to utf-8
> - set URIEncoding to utf-8
> - and include a class like SetCharacterEncodingFilter to set request
>   character encoding

Note that this last item sets the character encoding for reading request /bodies/ and not GET parameters from the URL. It also only sets the request character encoding if the client has not set it.

All these issues are covered in this Tomcat document, though the content is generally applicable to all containers:

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

-chris
Re: form encoding issues
Hello,

I followed the instructions at http://cocoon.apache.org/2.2/1366_1_1.html. For cocoon-2.1.11 I set

  <init-param>
    <param-name>container-encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>form-encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>

in my web.xml instead of org.apache.cocoon.containerencoding=utf-8 and org.apache.cocoon.formencoding=utf-8. I had to create a SetCharacterEncodingFilter as well. Everything works fine in UTF-8.

Barbara