In my experience the biggest problem with encoding is the browsers, especially the older ones. For example, different browsers handle URI encoding differently. Some of the differences are historical and some are just plain wrong (for example, some browsers mix encodings: they encode the path part of the URL in Latin-1 and the query part in UTF-8). The only way I usually get everything working is to *force* everything to UTF-8 wherever possible. This includes setting the URIEncoding in Tomcat/Jetty, adding the Spring encoding filter, setting the JSP page encoding in the web.xml, setting the meta tags in the HTML, specifying the encoding on forms, etc.
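To make the "mixed encodings" point concrete: the same character yields different percent-escapes depending on which charset the browser used, and the server only gets the right text back if it decodes with that same charset. Below is a minimal sketch in plain Java (it assumes Java 10+ for the Charset overloads of URLEncoder/URLDecoder); the class and method names are made up for illustration. Note that URLEncoder does form-style encoding (spaces become '+'), so it stands in for query-string/form encoding rather than path encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MixedUriEncodingDemo {

    // Percent-encode a value form-style (as in a query string or form body),
    // using the given charset to turn characters into bytes first.
    public static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // Decode percent-escapes with the charset the server assumes,
    // e.g. a connector's URI encoding or the servlet default ISO-8859-1.
    public static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" ending in an e-acute

        String asLatin1 = encode(word, StandardCharsets.ISO_8859_1);
        String asUtf8 = encode(word, StandardCharsets.UTF_8);
        System.out.println(asLatin1); // caf%E9
        System.out.println(asUtf8);   // caf%C3%A9

        // Same charset on both sides: the text round-trips.
        System.out.println(decode(asUtf8, StandardCharsets.UTF_8).equals(word)); // true

        // UTF-8 bytes decoded as Latin-1: each byte becomes its own character,
        // so the single e-acute turns into two characters.
        String garbled = decode(asUtf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(word)); // false
        System.out.println(garbled.length());     // 5
    }
}
```

This is exactly why forcing one charset everywhere helps: as long as every hop encodes and decodes with UTF-8, the mismatch in the last case cannot happen.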
On a side note: don't enable mod_php when you're using mod_jk. It will cause your URLs to get double-encoded :-(

Regards,
Bart

On Fri, Oct 9, 2009 at 9:14 AM, Dennis Dam <[email protected]> wrote:

>>
>> According to the Javadoc of the Spring Framework, "current
>> browsers typically do not set a character encoding even if specified
>> in the HTML page or form." [1]
>> So I think we need to assume that the request uses one
>> specific encoding. Currently we have a good candidate: UTF-8.
>> Before UTF-8 became popular, I used to determine the encoding
>> based on the user's language (e.g., "ko": KSC5601 or EUC-KR, "en":
>> ISO-8859-1, "ja": Shift_JIS, ...).
>> However, I think assuming UTF-8 causes no problems in most cases today.
>>
>> [1]
>> http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/web/filter/CharacterEncodingFilter.html
>>
>
> Yes, I agree you have to assume something, but I guess the assumptions
> differ according to how requests are submitted. The browser sometimes sends
> parameters in UTF-8 (GET method), and sometimes in Latin-1 (POST method). In
> the case of POST parameters the browser didn't set an encoding (encoding is
> null); in the case of GET parameters the correct character set is set by
> the browser (encoding is UTF-8). A third case is POST submits executed from
> JavaScript ("Ajax"), which also set the correct character set.
>
> I did a little experiment with a test filter which converts parameters from
> Latin-1 to UTF-8 when the encoding is null, and then prints those
> parameters with some debug code. POST parameters submitted from regular
> forms (no Ajax) are shown correctly once they are converted to UTF-8.
>
>> >
>> > Of course I can fix it with a workaround: implement a filter that converts
>> > only POST parameters if the incoming encoding is null or anything other
>> > than UTF-8.
>> > But I'd rather solve the cause of the problem :)
>>
>> You don't have to implement a new filter. As Ard mentioned, you can use
>> the "CharacterEncodingFilter" of the Spring Framework. If
>> request.setCharacterEncoding() has been invoked, then
>> request.getParameter() returns a string converted from the container
>> encoding to the target encoding. The "CharacterEncodingFilter" does
>> this.
>>
>
> I'm not sure if the filter will work in my case... it only sets the
> encoding if the request encoding is *not* null. In the case of form POSTs,
> the encoding is null. I will give it a try though!
>
>>
>> Regards,
>>
>> Woonsan
>>
>> >
>> > regards
>> > Dennis
>> >
>> >
>> > On Wed, Oct 7, 2009 at 2:24 PM, Dennis Dam <[email protected]> wrote:
>> >
>> >> On Wed, Oct 7, 2009 at 2:09 PM, Bartosz Oudekerk <[email protected]> wrote:
>> >>
>> >>> Ard Schrijvers wrote:
>> >>>
>> >>>>> 1. added system property -Dfile.encoding=UTF-8 to catalina.sh
>> >>>>> 2. added URIEncoding=utf-8 to the 8080 connector in conf/server.xml
>> >>>>> 3. the container-encoding init parameter for the Cocoon servlet is set to "ISO-8859-1".
>> >>>>> 4. the form-encoding init parameter for the Cocoon servlet is set to "utf-8"
>> >>>>
>> >>>> why is (3) not utf-8??
>> >>>
>> >>> Because if it's set to UTF-8, then things tend to get doubly encoded.
>> >>
>> >> Not exactly... it depends on the value of "form-encoding": if
>> >> container-encoding and form-encoding are identical (e.g. both set to utf-8),
>> >> then Cocoon does not perform encoding conversions.
>> >>
>> >> Why is (3) not set to UTF-8?? Because then the form POST encoding is broken
>> >> :)) With ISO the form GETs are broken... :)
>> >>
>> >>> Regards,
>> >>> --
>> >>> Bartosz Oudekerk
--
Hippo B.V. - Amsterdam
Oosteinde 11, 1017 WT, Amsterdam, +31(0)20-5224466

Hippo USA Inc. - San Francisco
101 H Street, Suite Q, Petaluma CA, 94952-3329, +1 (707) 773-4646
-----------------------------------------------------------------
http://www.onehippo.com - [email protected]
-----------------------------------------------------------------
********************************************
Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
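Two points from the thread above can be sketched in plain Java. First, Dennis's worry that CharacterEncodingFilter only acts when the request encoding is *not* null: if I read the Spring 2.5 source correctly, the condition is the other way round (the configured encoding is applied when forceEncoding is true or request.getCharacterEncoding() returns null), so it should cover the form POST case; please check this against the Javadoc linked in the thread. Second, Dennis's test filter: the likely mechanism is that the browser sends UTF-8 bytes without a charset, the container falls back to ISO-8859-1 (the servlet default), and because Latin-1 maps every byte to exactly one character, the original bytes can be recovered and re-decoded as UTF-8. The class and method names below are hypothetical, not Spring or servlet API.

```java
import java.nio.charset.StandardCharsets;

public class FormEncodingSketch {

    // Assumption: mirrors the decision in Spring 2.5's CharacterEncodingFilter.
    // The configured encoding is applied when the request declares none,
    // or always when forceEncoding is true.
    public static boolean shouldSetEncoding(String configuredEncoding,
                                            boolean forceEncoding,
                                            String requestEncoding) {
        return configuredEncoding != null
                && (forceEncoding || requestEncoding == null);
    }

    // Repairs a parameter the container decoded as ISO-8859-1 although the
    // browser sent UTF-8 bytes. Latin-1 maps each byte 0x00-0xFF to exactly
    // one char, so getBytes(ISO_8859_1) recovers the bytes sent on the wire.
    public static String latin1MisreadToUtf8(String misread) {
        byte[] originalBytes = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the form POST case: UTF-8 bytes decoded with the default.
        String sent = "caf\u00e9";
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8);
        String misread = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(misread.equals(sent));                      // false
        System.out.println(latin1MisreadToUtf8(misread).equals(sent)); // true

        // Form POST without a charset: the filter would set the encoding.
        System.out.println(shouldSetEncoding("UTF-8", false, null));    // true
        // Charset already declared and force is off: request left alone.
        System.out.println(shouldSetEncoding("UTF-8", false, "UTF-8")); // false
    }
}
```

The repair is only safe on text that really was misread: applied to a parameter that was already decoded correctly, it corrupts any non-ASCII characters (a double conversion, much like the Cocoon double encoding discussed above). That is why calling setCharacterEncoding() before anything reads a parameter, which is what the filter does when it runs early in the chain, is the cleaner fix.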
