Re: Migrating to tomcat 6 gives formatted currency amounts problem
Johnny Kewl wrote: If you do decide to look at this link... http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale The above link seems to be extremely informative, right on the spot for this thread. Thanks. Among other things, it points out that changing the default locale for the Tomcat JVM (as I am forced to do to make this servlet work properly) may be unsafe, see the What is the default encoding? paragraph. André - To start a new topic, e-mail: users@tomcat.apache.org To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Caldarale, Charles R wrote:

From: Christopher Schultz [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

so, Java is still 16-bit Unicode in its char primitive, but you can use ints to hold UTF-16 values using 21-bits?

The 21-bit values are represented by pairs of Java chars, the first from the UTF-16 high-surrogate range, the second from the low-surrogate range. The 21-bit code point can be accessed as an int by some of the java.lang.Character methods introduced in 1.5.

especially since java.lang.Character only takes a char as a constructor parameter :(

Yes, I think all the new Character methods related to code points are static; there are corresponding instance methods in java.lang.String though.

There is some information about this in the link that Johnny pointed out (excellent and very readable document in general (for a change)) :
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale
paragraph : How is text represented in the Java platform?

And there is more here :
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#unicode

It's amazing what one finds, when one knows what one is looking for..
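For anyone following along, here is a minimal sketch of the surrogate-pair point made above. The class name CodePointDemo, the helper toUtf16 and the choice of U+1D11E (a musical symbol outside the 16-bit range) are only illustrative assumptions; the Character and String methods used are the standard ones added in Java 1.5.

```java
public class CodePointDemo {
    /** Returns the UTF-16 code units (Java chars) that encode one code point. */
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // illustrative code point above U+FFFF
        char[] units = toUtf16(clef);
        String s = new String(units);

        // A supplementary code point needs two chars: a high and a low surrogate.
        System.out.println("chars used: " + units.length);
        System.out.println("high surrogate: " + Character.isHighSurrogate(units[0]));
        System.out.println("low surrogate: " + Character.isLowSurrogate(units[1]));

        // The static Character methods and the String instance methods
        // both give the full code point back as an int.
        System.out.println("code point: " + Integer.toHexString(s.codePointAt(0)));
        System.out.println("length() = " + s.length()
                + ", codePointCount = " + s.codePointCount(0, s.length()));
    }
}
```

Note that length() counts chars, not code points, which is why the two numbers on the last line differ.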
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message -
From: André Warnier [EMAIL PROTECTED]
To: Tomcat Users List users@tomcat.apache.org
Sent: Saturday, September 13, 2008 2:01 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

Johnny Kewl wrote: If you do decide to look at this link... http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale

The above link seems to be extremely informative, right on the spot for this thread. Thanks. Among other things, it points out that changing the default locale for the Tomcat JVM (as I am forced to do to make this servlet work properly) may be unsafe, see the What is the default encoding? paragraph. André

Chuck knows his stuff... I treat char sets as black boxes, the things work or they don't ;) pain in the butt... UTF-8 just seems to work... pretty safe bet. I'm not very scientific when it comes to this stuff... but I get the need for Unicode... America forgot about the rest of the world when they made ASCII... and we've been suffering ever since... ha ha

Different perspective ;) Chuck talks code points and all that good stuff... but I think that when they made double-byte codes, and Microsoft did their thing, and then Java tried to fix it for the rest of the world... they must have made the original ASCII a subset of the other codes. So I think... if a client is expecting ISO or UTF and it is in fact ASCII because of those locale issues... I don't think it will break down completely... it will still get the ASCII part... but German chars and all the rest will just be ?... the high chars, as you call them, will fail.

It's actually amazing how clever these guys are that moved the US into a system that included the forgotten world ;) without breaking the whole thing... Trouble is... us humans see the fonts and have to guess what's going on underneath...

It's not over... the Opera browser already talks... and just wait for it... musical mood and emphasis will become a thing, and then they're going to need more bytes... so we can have singing sentences in 150 languages... and some clever guy is going to map the past into the future again... when Gates' offspring have taken over the planet ;) Our kids will be saying... how come my budgie sounds like a dog barking... in IE 306. Someone will say... are you using EUTF256? It will be another long thread... it's going to get worse ;)

---
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth. The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
---
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Caldarale, Charles R wrote:

I'm not sure these days what the normal web character set really is. If you're referring to ASCII (aka Basic Latin), then no, the Pound Sterling symbol is not present. However, for any of the ISO-8859-x variants, it is present, using the 163 (0xA3) value you noted (same as the Unicode code point). It's also in UTF-8 of course, but requires two bytes (0xC2 0xA3) to represent the code point.

I love these discussions about character sets. They seem to confuse so many people; even I, who have been involved in them for 30 years...

Anyway, I have a related question, which I don't think constitutes a hijack of this thread, because the underlying cause is probably similar. Here it goes :

Tomcat (v 4.1, v 5.0, v5.5, have not tried yet in 6.x)

The above Tomcat's running under the same Linux or Solaris, essentially set up the same way. The JVM may vary, but I don't think that is the problem, because of the consistency of the problem as explained below.

I am running a webapp from an external supplier, always the same binary version. I don't have the code, can't see what's in it. The pages served by that webapp are the same html pages, all of them having a declaration <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">. The pages also *are* properly encoded as iso-8859-1 (100% positive, I know the difference). The browser receiving the pages is always the same one, same settings.

Now,

case a) in the Tomcat startup files, I do nothing, meaning I just take Tomcat out-of-the-box and run the webapp.
Result : in any such html page that contains characters with an ISO-8859 codepoint above \xA0 (meaning the displayable characters of the high part of the table, where one finds things like uppercase A with umlaut), these characters
- appear in the browser display as ? (minus the quotes)
- also if I save the page from the browser to disk, and look at them with an iso-8859-1 capable editor, they are effectively ?. (So it's not the browser misunderstanding them, it is Tomcat sending them that way).

case b) In one of the Tomcat startup files (e.g. tomcat_dir/bin/startup.sh or even in /etc/init.d/tomcat5.5), I add the following line
LC_CTYPE=en_us.iso88591
(or whatever is valid on that host to specify an iso-8859-1 LC_CTYPE) (before the actual start of Tomcat) and restart Tomcat
then the same page displays properly in the browser, and also is correct iso-8859-1 when saved to disk and examined with the editor. (In other words, what previously were ? characters, are now the correct iso-8859-1 character bytes).

Now my question is : How can it matter which LC_CTYPE Tomcat is started under, that would have the result above ?

The behaviour above is consistent across different hosts, across the same or different Tomcat versions, it is always the same webapp, always the same html pages, always the same browser, etc. Only that LC_CTYPE line changes the behaviour.

On the face of it, the only thing I can think of that would explain this, is that the webapp in question does something wrong, but what exactly could it be doing ? Any ideas ?

Thanks in advance,
André
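One plausible mechanism for the symptom described above, offered as a guess and not as a diagnosis of the supplier's webapp: if code reads the file through a Reader and writes it through a Writer without naming a charset, Java falls back on the platform default, and under a C/POSIX locale that default is typically US-ASCII. The sketch below imitates that with explicit charsets so it behaves the same on any host. The class name LostHighCharsDemo and the helper roundTrip are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LostHighCharsDemo {
    /**
     * Decodes bytes with one charset and re-encodes them with another,
     * roughly what happens when a file goes through a Reader and a Writer.
     */
    static byte[] roundTrip(byte[] in, Charset decodeAs, Charset encodeAs) {
        String text = new String(in, decodeAs); // bytes -> chars
        return text.getBytes(encodeAs);         // chars -> bytes
    }

    public static void main(String[] args) {
        byte[] page = { (byte) 0xC4 }; // uppercase A with umlaut in ISO-8859-1

        // Decoding as US-ASCII cannot represent 0xC4, so it becomes the
        // replacement character, which is then written out as '?' (0x3F).
        byte[] bad = roundTrip(page, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1);
        System.out.printf("decoded as US-ASCII:   0x%02X%n", bad[0] & 0xFF);

        // Decoding with the charset the file really uses keeps the byte intact.
        byte[] good = roundTrip(page, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.printf("decoded as ISO-8859-1: 0x%02X%n", good[0] & 0xFF);
    }
}
```

If something like this is happening, setting LC_CTYPE merely changes which default charset the JVM picks at startup, which would match the behaviour reported in cases a) and b).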
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message -
From: Caldarale, Charles R [EMAIL PROTECTED]
To: Tomcat Users List users@tomcat.apache.org
Sent: Friday, September 12, 2008 6:01 AM
Subject: RE: Migrating to tomcat 6 gives formatted currency amounts problem

From: Johnny Kewl [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

http://www.kewlstuff.co.za/test/test.htm What do you see in this test page?

Depends on which character encoding I choose to view the page in. For the declared UTF-8, FF3 shows the invalid hex value at that spot in your page. If I override that with say ISO-8859-15, the R in a circle appears. Note that no font is involved here, just the encoding declaration.

You need to get over this fixation with fonts - they have absolutely nothing to do with this issue. A font is just a graphical description of how to draw one or more code points on an output device, based on the font designer's take on what each code point should look like. It's the character encoding that tells the message recipient what code point to generate for a given bit pattern; only after the code point is determined does any font get involved to create the visible symbol.

This is a great site to get lost in for a few days: http://www.unicode.org/

- Chuck

Yes, I do that, mix terminology. But can I just get your opinion on this...

If this locale stuff is in fact defaulting to an ISO char set that can do these symbols... and say you were making a non-English page, say Japanese... do you think it's possible to use it? I've actually now seen examples on the web that are doing it Wil's way, they're using getCurrencyInstance to make the currency symbols. And it is the most natural thing in the world for a coder to want to do... the functions are synonymous with internationalization. It's probably in the Java manual... But I'm thinking it's a US/English-only methodology... when applied to a web page. Do you think using getCurrencyInstance is generalizable to other languages?

When you say "If I override that with say ISO-8859-15", is that the whole page you're talking about, or is it possible to have sections with different character encodings in one web page? That's another area that's confusing me now, because if I look at that test page in an MS tool... it displays correctly with mixed encodings?

You see... people are saying that in a well designed web page... it's a suggestion, I get that. But when you choose a font in a text editor like Swing or Word, you are also picking some character set... and that's what's being injected into the page as it's being formed... Or in an MS localization panel, if you choose Verdana as a default font... these systems don't throw character sets at users, they just pick one in the background... thus my analogy... and it's the crossover between these systems that's got me confused ;)

I screw up terminology... ok, we all know that, but does Wil need to worry about the way he is doing it?... that's all I'm asking... I think so... Thanks...
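To make the getCurrencyInstance question concrete, here is a small sketch that separates the two steps Chuck describes: the formatter produces characters (code points), and only the later encoding step decides which bytes go on the wire. It passes an explicit Locale.UK so nothing depends on the server's default locale. CurrencyBytesDemo, formatPounds and hex are hypothetical names for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

public class CurrencyBytesDemo {
    /** Formats an amount as pounds, independent of the JVM's default locale. */
    static String formatPounds(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.UK).format(amount);
    }

    /** Renders bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = formatPounds(1234.5);
        // The first char is U+00A3 whatever encoding is used later.
        System.out.println("first char: U+" + String.format("%04X", (int) text.charAt(0)));

        String pound = "\u00A3"; // written as an escape so the source file encoding does not matter
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));
        System.out.println("US-ASCII:   " + hex(pound.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The byte values line up with the ones quoted earlier in the thread (0xA3 in ISO-8859-1, 0xC2 0xA3 in UTF-8), and the US-ASCII line shows where the ? comes from. So the formatter itself is not tied to English; what matters is that the page is sent in an encoding that can carry the symbol.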
Re: Migrating to tomcat 6 gives formatted currency amounts problem
2008/9/12 André Warnier [EMAIL PROTECTED]:

[...] Now my question is : How can it matter which LC_CTYPE Tomcat is started under, that would have the result above ? [...] On the face of it, the only thing I can think of that would explain this, is that the webapp in question does something wrong, but what exactly could it be doing ? Any ideas ?

It is <%@page pageEncoding="..." %> that is missing from those pages. Thus the JSP compiler does not know what encoding they are using for their source and messes them up at compilation time.

AFAIK (but never tried) it can be configured without modifying the sources, using the jsp-config element in web.xml. It can be done in the default one in conf/web.xml. The configuration element is described in JSP.3.3.4 of the JSP 2.0 spec.

By the way: in my pages I usually declare <%@page contentType="text/html; charset=..." pageEncoding="..." %> and add <META http-equiv="Content-type" content="<%=response.getContentType() %>">. Thus both the HTTP Content-Type: header and the META tag are present in my response and are always in sync.

Best regards,
Konstantin Kolinko
Re: Migrating to tomcat 6 gives formatted currency amounts problem
OK, Wil, you made me do some homework... got it sorted for you.

You must not guess the charset... as we've been doing. Use this function

System.out.print("CharSet : " + Charset.defaultCharset().toString());

and that's what you HAVE TO set your page to.

On my system it tells me it's windows-1252. On Solaris, if you're running in a C locale... it will be US-ASCII. If you're running in a US locale it will be ISO-8859-1.

Now you're doing Ajax, so I imagine you may want to inject this stuff in DIV statements... I'll let someone else try to answer that... mission impossible... I think.

So... you have to convert character sets from what the locale is using... which from the looks of things is different on every single machine and OS... to what you're using in the web page proper... probably UTF-8 if you are internationalizing... it's a headache... rather refactor your code so the pages are all in the same charset of your choosing and work with pound, yen, dollar.

Anyway, use that function to get the encoding that is actually being used... it can be changed from outside Java... in Linux itself by the user... so you cannot guess... and then how you're going to get that Ajax into DIVs and tables using Javascript and DHTML or whatever, only you know ;)

Don't do it..
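A short sketch of the alternative suggested in the last part of the message above: report the default charset for diagnosis, but name the charset explicitly wherever characters are turned into bytes, so the output no longer depends on the machine. ExplicitCharsetDemo and its write helper are made-up names for illustration; the example assumes Java 8 or later because of UncheckedIOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    /** Encodes text through a Writer that is told which charset to use. */
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown to keep the helper simple.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Varies by OS and locale settings, so it is printed for information only.
        System.out.println("CharSet : " + Charset.defaultCharset());

        String price = "\u00A3 10"; // pound sign, space, "10"
        // These byte counts are the same on every machine because the charset is explicit.
        System.out.println("UTF-8 bytes:      " + write(price, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + write(price, StandardCharsets.ISO_8859_1).length);
    }
}
```

The page's declared charset then only has to match the charset passed to the writer, not whatever the host happens to default to.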
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Then one last thing before I put this in my little black book of things I'm never going to do... and forget about it forever ;)

This is what Windows does. If the machine is on US English... regardless of the locale I set in Java... German, English, Japanese... the charset is always windows-1252... which is basically ISO with differences... But if I switch the machine itself to Japanese... then it's windows-31j.

So that's what you're injecting into your web pages... when using Java locale functions... in a web page...

Maybe that's what a person wants, and in a company where every user is on Windows and these locale functions are used... it may just work... that's actually scary...

Nice question
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Konstantin Kolinko wrote:

2008/9/12 André Warnier [EMAIL PROTECTED]:

[...] On the face of it, the only thing I can think of that would explain this, is that the webapp in question does something wrong, but what exactly could it be doing ? Any ideas ?

It is <%@page pageEncoding="..." %> that is missing from those pages. Thus the JSP compiler does not know what encoding they are using for their source and messes them up at compilation time. [...]

But these pages, as far as Tomcat and the webapp are concerned, are not dynamic in any way. They are straight static html pages. So is the JSP stuff relevant ? (I'm genuinely asking, since I know nothing about JSP pages)
Re: Migrating to tomcat 6 gives formatted currency amounts problem
2008/9/12 André Warnier [EMAIL PROTECTED]

Konstantin Kolinko wrote:

[...] It is <%@page pageEncoding="..." %> that is missing from those pages. Thus the JSP compiler does not know what encoding they are using for their source and messes them up at compilation time. [...]

But these pages, as far as Tomcat and the webapp are concerned, are not dynamic in any way. They are straight static html pages. So is the JSP stuff relevant ? (I'm genuinely asking, since I know nothing about JSP pages)

The static HTML pages, as well as all the other static files, are served by the DefaultServlet. You should dig there. I think that the fileEncoding initialization parameter of the servlet, as well as the mime-mapping settings in web.xml, come into play. JSP settings are irrelevant for them, of course.
Best regards, Konstantin Kolinko
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message -
From: André Warnier [EMAIL PROTECTED]
To: Tomcat Users List users@tomcat.apache.org
Sent: Friday, September 12, 2008 10:08 AM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

[...] Now my question is : How can it matter which LC_CTYPE Tomcat is started under, that would have the result above ? [...] On the face of it, the only thing I can think of that would explain this, is that the webapp in question does something wrong, but what exactly could it be doing ? Any ideas ? Thanks in advance, André

Andre, see this link, about halfway down...
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale

They're talking about Solaris, which in the default C locale is ASCII... When they do what you're doing... more or less... it becomes ISO... So if there is a Java locale function in that web app... one minute it's working with ASCII, the next with ISO... The page encoding has been hardcoded by the coder to always be ISO...

It's the Java locale in a web app... I think... Look at the classes in an IDE, or search it... java.util.Locale is hiding in your web-app ;)... I think

Thanks... there's the gotcha I was worried about... and you're still talking English ;) Does it mean you can't run Linux headless?... I wonder...

For fun... make your Linux box Japanese... I think the web app will really start having fun... no foreign administrators for you ;)

I don't believe at all it's Tomcat... it's client-side Java sitting in servers... gotcha.. The coders broke their own application... all by themselves... admin guys have now got the headache...
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Konstantin Kolinko wrote:

2008/9/12 André Warnier [EMAIL PROTECTED]

[...] But these pages, as far as Tomcat and the webapp are concerned, are not dynamic in any way. They are straight static html pages. So is the JSP stuff relevant ? (I'm genuinely asking, since I know nothing about JSP pages)

The static HTML pages, as well as all the other static files, are served by the DefaultServlet. You should dig there. I think that the fileEncoding initialization parameter of the servlet, as well as the mime-mapping settings in web.xml, come into play. JSP settings are irrelevant for them, of course.

Hi.
Thanks for the intent and answer above. But I insist : these html pages are served by that webapp of which I am talking, not by the DefaultServlet. Those pages are being accessed via URLs like http://myhost.mycompany.com/myservlet?..(additional parameters indicating which static file to serve).. It is on the way through that servlet that they get corrupted, unless I start Tomcat with LC_CTYPE=iso-8859-1. That servlet, in its own web.xml config file in tomcat_dir/webapps/myservlet/WEB-INF/web.xml, has no fileEncoding nor mime-mapping section nor parameter. So my question remains, I think : what could be going on in that servlet so that : - if LC_CTYPE is not set in the environment *of Tomcat* when it starts, the upper iso-8859-1 characters in the pages are replaced by ? - if LC_CTYPE is set to iso-8859-1 in the Tomcat environment when it starts, then the pages delivered by the servlet are correct ? I am not very qualified in Java, but could it be something like : - the
RE: Migrating to tomcat 6 gives formatted currency amounts problem
Hi,

Have you checked the configuration for this catalina opts?:

-Duser.language=es -Duser.country=ES

Check that they are the same in both tomcats. (In this case, for instance, is configured for Spanish-Spain)

Good Luck

Best, Toni

-Original Message-
From: André Warnier [mailto:[EMAIL PROTECTED]
Sent: Friday, 12 September 2008 16:58
To: Tomcat Users List
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

[...]
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: André Warnier [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

> - the servlet reads those documents with some InputStream, without specifying a character set or encoding, and by default that means to use Tomcat's idea of its default LC_CTYPE for those InputStreams ?

Essentially correct, if you substitute "JVM" for "Tomcat" in the above. Input and output are done via byte streams, converted to and from Unicode based on the specified character encoding. When that's not specified (via Connector attribute or HTTP header), the JVM uses a default encoding. To determine the default, JVM initialization looks at various system properties if they exist, and then certain environment variables. (The exact ones are platform dependent.) Consequently, setting LC_CTYPE (or equivalent) prior to starting up Tomcat can have a dramatic effect on the interpretation of both input and output, as you have discovered.

Look at the API doc for java.io.InputStreamReader and java.io.OutputStreamWriter for examples of character set encoding usage.

- Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
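As a rough illustration of Chuck's pointer to InputStreamReader and OutputStreamWriter, here is a minimal sketch of passing the charset explicitly instead of relying on the JVM default. The class and method names (ExplicitCharsetDemo, decode, encode) are made up for this example, and the sample bytes stand in for whatever the servlet actually reads; nothing here is taken from the webapp under discussion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not part of Tomcat or of the webapp in the thread.
class ExplicitCharsetDemo {

    // Decode bytes to text with a charset named by the caller, not the JVM default.
    static String decode(byte[] raw, Charset cs) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(raw), cs)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encode text to bytes with a charset named by the caller.
    static byte[] encode(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The pound sign followed by "10", as stored in an ISO-8859-1 file.
        byte[] latin1 = { (byte) 0xA3, '1', '0' };
        String text = decode(latin1, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1 bytes: " + latin1.length
                + ", UTF-8 bytes: " + utf8.length);
    }
}
```

With the charset spelled out on both sides, the result no longer depends on LC_CTYPE or on any other startup setting of the JVM.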
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Caldarale, Charles R
Subject: RE: Migrating to tomcat 6 gives formatted currency amounts problem

> Consequently, setting LC_CTYPE (or equivalent) prior to starting up Tomcat can have a dramatic effect on the interpretation of both input and output, as you have discovered.

Also, as Johnny K stated, this should not be left up to the sys admin. It really is the app writers' job to explicitly specify the encoding for both input and output, rather than leaving them up to the whims of the platform and browser. Unfortunately, many developers design with blinders on, and never think about where the app might be deployed or accessed from.

- Chuck
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Johnny Kewl [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

> If this locale stuff is in fact defaulting to an ISO char set that can do these symbols...

There's the basic problem - anytime you allow defaults to come into play you put yourself at risk.

> and say you where making a non english page, say Japanese... do you think that its possible to use it?

Certainly, and you should use it - but with the desired Locale specified, not using whatever the default happens to be at that instant.

> they using the getCurrencyInstance to make the currency symbols.

But, if you want a specific currency symbol (e.g., Yen, Pound Sterling), the Locale should be explicitly provided on the API call; only if you want to use the platform's default should the getCurrencyInstance() without an argument be used.

> But I'm thinking its a US/Eng only methodology...

Nope, it's universal. Java supports a seemingly infinite number of locales.

> When you say If I override that with say ISO-8859-15, is that the whole page you talking about

Yes, I was setting the browser to use a fixed encoding rather than the one in the HTTP header or the browser default.

> it possible to have different character encoding sections in a web page

I don't know HTML well enough to completely answer that question, but I believe HTTP uses the last character set header specified, and all HTTP headers must precede the HTML. You should be able to achieve the desired effect with frames. However, if you just use UTF-8, you don't need to worry about it, since that includes every code point in the known universe.

> if I do look at that test page in a MS tool... it displays correctly with mixed encodings?

MS cheats at every opportunity, seemingly avoiding standards whenever they can. IE likes to guess at the intent of the web page, sometimes getting it right, often getting it horribly wrong.

> But when you choose a font in a text editor like Swing or Word, you are also picking some character set...

Nope - most editors do not let you choose the character encoding, they just use the platform default. Some do let you choose a UTF-x flavor in lieu of the platform default, which is quite desirable. Some fonts (e.g., Wingdings) redefine the glyphs for given code points in order to display oddball symbols within a non-Unicode encoding; these were pretty much all developed before Unicode came into widespread use, but are still around for compatibility.

- Chuck
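To make the point about passing the Locale explicitly a little more concrete, here is a small sketch using java.text.NumberFormat. The class name CurrencyLocaleDemo and the amount are invented for the example; the exact strings printed for locales other than the UK may differ between JDK versions, so only the UK case is commented.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class, only meant to contrast explicit and default locales.
class CurrencyLocaleDemo {

    // Format an amount for a locale chosen by the caller.
    static String format(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        // Explicit locale: the pound sign is used regardless of where the JVM runs.
        System.out.println(format(1234.5, Locale.UK));

        // Another explicit locale, to show the same call is not English-only.
        System.out.println(format(1234.5, Locale.JAPAN));

        // No argument: the result depends on the platform default locale,
        // which is exactly what changes between differently configured servers.
        System.out.println(NumberFormat.getCurrencyInstance().format(1234.5));
    }
}
```

Getting the right symbol out of format() is only half of the job; the response still has to be written with an encoding that can represent it.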
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Johnny,

Johnny Kewl wrote:
> If this locale stuff is in fact defaulting to an ISO char set that can do these symbols... and say you where making a non english page, say Japanese... do you think that its possible to use it?

It is up to your browser to choose a font that is appropriate for all glyphs (that is, a graphical representation of a code point) that need to be displayed. Some fonts do not support all codepoints because they don't have all the glyphs. For instance, if you have a string in English and also Sanskrit, your browser is likely to display one string in one font (maybe Arial) and the other in another font (say, Sanskrit).

Let's say that the browser comes across the &pound; entity. &pound; maps directly to 8-bit hex character code 0xa3 (http://htmlhelp.com/reference/html40/entities/latin1.html). Whether you put &pound; or £ in your HTML, the browser should render it properly -- possibly switching fonts to one that supports that code point for that character only.

The problem with your page is not that the £ symbol is not available in the font the browser chose. Your problem is that you illegally encoded it into the page in the first place (or, equivalently, you advertise the wrong encoding for the page, which is really the same thing). If you re-write your page to declare some font around that symbol, you will never be able to get it to work, unless you use the browser to override the server-declared encoding (as Chuck did, when things render properly when using ISO-8859-1).

> I've actually now seen examples on the web that are doing it Wil's way, they using the getCurrencyInstance to make the currency symbols.

Use of Java's built-in currency-symbol-generating methods is likely to produce a proper £ symbol. If you have your encoding chain set up properly, it should go from NumberFormat.format() straight to your web page without a hint of difficulty.

> But I'm thinking its a US/Eng only methodology... when applied to a web page. Do you think using getCurrencyInstance is generalizable in other languages?

Absolutely. The only reason $ is a magic symbol is because it's part of US-ASCII and low enough in the symbol table so that it never gets screwed up by incorrect encodings. Symbols like £ or € do not share that luxury and are therefore error-prone when administrators poorly configure their servers. It's further compounded by the fact that many English-speaking coders forget that there are other people in the world. :(

> When you say If I override that with say ISO-8859-15, is that the whole page you talking about, or it possible to have different character encoding sections in a web page thats another area thats confusing me now, because if I do look at that test page in a MS tool... it displays correctly with mixed encodings?

The encoding is for the entire document, not just a single character. Basically, you sent an illegal character code. It would be like sending 6 bits of an 8-bit byte. In fact, that's /exactly/ what you did because, to a UTF-8 renderer, your set of 8 bits looks like there should be something else /before/ it in order to make it legal. Your server said "hey, client... I'm gonna send you a bunch of oranges" and then went right ahead and sent apples mixed-in with those oranges.

> But when you choose a font in a text editor like Swing or Word, you are also picking some character set... and thats whats been injected into the page as its been formed...

Yes and no. Many encodings are limited by a particular character set (for instance, US-ASCII is never going to have Sanskrit letters in it). But that's why Unicode was invented: to make sure that anything we'd ever possibly want to show on the screen is possible because we have enough bits to display it. (My understanding is that Unicode (16-bit) is actually not big enough for everything, but hey, they tried).

The beauty of UTF-8 is that every character you'd want to display has its own code that nobody can steal -- regardless of the font being used. The lesson is to always use UTF-8 and make sure you actually have everything working properly. If your server is saying "utf-8" but the character encoding on your servlet Writer is actually ISO-8859-1 then you haven't done your job and your web pages are going to look broken when non-latin characters are thrown in there. The same is true if you are serving static content (as I suspect you are in your example) and advertising that it is utf-8 but the file was written with ISO-8859-1 (or something else). (In your case, the problem is that text files contain no explicit encoding information in them, so the server has to guess -- or, more likely, there's no guessing going on, and the server just blindly uses whatever its default has been configured to be.)

> I screw up terminology... ok we all know that but Does Wil need to worry about the way he is doing it?... thats all I'm asking...

I think so... The short
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Chuck,

Caldarale, Charles R wrote:
> From: Johnny Kewl [mailto:[EMAIL PROTECTED]
> Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem
>
>> if I do look at that test page in a MS tool... it displays correctly with mixed encodings?
>
> MS cheats at every opportunity, seemingly avoiding standards whenever they can. IE likes to guess at the intent of the web page, sometimes getting it right, often getting it horribly wrong.

Yes, they do. MS, contrary to W3 specifications, sniffs the content of a page and chooses the encoding and ignores any server-specified encoding. It also does this with MIME types. (Sorry, can't find the reference right now). Real web browsers do not behave in this way, so you shouldn't base your conclusions on the behavior of MSIE.

-chris
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Christopher Schultz [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

> (My understanding is that Unicode (16-bit) is actually not big enough for everything, but hey, they tried).

Point of clarification: Unicode is NOT limited to 16 bits (not even in Java, these days). There are defined code points that use 32 bits, and I don't think there's a limit, if you use the defined extension mechanisms. Again, browsing the Unicode web site is extremely enlightening.

> Unless the browser sucks. ;)

Let me guess which browser that is; does it start with an I?

- Chuck
Re: Migrating to tomcat 6 gives formatted currency amounts problem
André,

André Warnier wrote:
> The pages served by that webapp are the same html pages, all of them having a declaration <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">.

Note that using META tags to set character sets is a bit dangerous. You're telling the client to ignore the character set indicated by the server which was (probably) responsible for encoding the document in the first place. For static documents, where the server doesn't know any better, and is probably sending binary data and doing no interpretation or encoding of any kind, it's probably okay.

> The pages also *are* properly encoded as iso-8859-1 (100% positive, I know the difference).

So, for instance, the British pound symbol in your source documents (read using an ISO-8859-1-configured viewer) looks correct?

> The browser receiving the pages is always the same one, same settings.

Did you check the md5sum of that page on both the client and the server? I suspect they are actually different.

-chris
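The md5sum comparison suggested above can also be done from Java if the command-line tool is not at hand. The following is only a sketch: Md5CompareDemo and md5Hex are invented names, and the two byte arrays are made-up stand-ins for "the file on the server" and "the page as saved by the browser".

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing what the server has with what the client received.
class Md5CompareDemo {

    // Return the MD5 digest of the given bytes as lowercase hex.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in data: an A-umlaut in ISO-8859-1 on the server ...
        byte[] onServer = { (byte) 0xC4, 'b', 'c' };
        // ... and the same page after the high character was replaced by '?'.
        byte[] atClient = { '?', 'b', 'c' };

        System.out.println("server: " + md5Hex(onServer));
        System.out.println("client: " + md5Hex(atClient));
        System.out.println("identical: " + md5Hex(onServer).equals(md5Hex(atClient)));
    }
}
```

If the digests differ, the bytes were changed somewhere between the disk and the browser, which rules out a pure display problem.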
Re: Migrating to tomcat 6 gives formatted currency amounts problem
André,

André Warnier wrote:
> It is on the way through that servlet that they get corrupted, unless I start Tomcat with LC_CTYPE=iso-8859-1.

What do the HTTP headers say when the file is served correctly versus when it is not? I suspect that the encoding is either set incorrectly or not set at all unless you specify LC_CTYPE.

> So my question remains, I think : what could be going on in that servlet so that :
> - if LC_CTYPE is not set in the environment *of Tomcat* when it starts, the upper iso-8859-1 characters in the pages are replaced by "?"
> - if LC_CTYPE is set to iso-8859-1 in the Tomcat environment when it starts, then the pages delivered by the servlet are correct ?

My guess is that the magic servlet here is using the platform's default encoding in the HTTP headers, which may be incorrect for the static file in question.

> I am not very qualified in Java, but could it be something like :
> - the servlet reads those documents with some InputStream, without specifying a character set or encoding

Note that InputStreams are encoding-less. Sounds like semantics, but encodings only come into play when you are dealing with character-oriented streams which, in Java, are called Readers and Writers. Note that neither InputStream nor OutputStream has any methods that deal with the char data type.

> and by default that means to use Tomcat's idea of its default LC_CTYPE for those InputStreams ?
> - or the servlet outputs the document via an OutputStream without specifying an encoding etc..

I'll bet a binary stream of data is being sent (that is, with no interpretation or encoding) and that the JVM's default encoding is being advertised by the server in the HTTP headers. That would certainly cause the problem.

I've found that the default encoding on my Linux box is something I've never heard of before: file.encoding=ANSI_X3.4-1968.

Since I have my server configured properly (and don't really serve much in the way of static content), the platform's default encoding doesn't matter: my preferred encoding (UTF-8) is always used.

-chris
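For what it is worth, ANSI_X3.4-1968 is a registered alias for US-ASCII, which Java resolves through Charset.forName. The sketch below uses that to reproduce the "?" symptom from earlier in the thread, under the assumption (not confirmed by anyone here) that the servlet decodes the file to characters and encodes it again with the default charset. DefaultCharsetSymptomDemo and throughCharset are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the "?" symptom; not the actual servlet code.
class DefaultCharsetSymptomDemo {

    // Simulate a component that decodes bytes to chars and re-encodes them,
    // using the same charset in both directions.
    static byte[] throughCharset(byte[] raw, Charset cs) {
        String decoded = new String(raw, cs); // bytes the charset cannot decode become U+FFFD
        return decoded.getBytes(cs);          // chars the charset cannot encode become '?'
    }

    public static void main(String[] args) {
        Charset posixDefault = Charset.forName("ANSI_X3.4-1968");
        System.out.println("canonical name: " + posixDefault.name());

        byte[] page = { (byte) 0xC4 }; // uppercase A with umlaut in ISO-8859-1

        byte[] viaAscii = throughCharset(page, posixDefault);
        byte[] viaLatin1 = throughCharset(page, StandardCharsets.ISO_8859_1);

        System.out.printf("via US-ASCII:   0x%02X%n", viaAscii[0] & 0xFF);
        System.out.printf("via ISO-8859-1: 0x%02X%n", viaLatin1[0] & 0xFF);
    }
}
```

A pure byte copy from InputStream to OutputStream would leave 0xC4 untouched, so a literal "?" in the saved page hints that a character conversion happens somewhere along the way.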
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Johnny Kewl [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

> Does it mean you cant run linux headless?...

Of course you can (think about blade servers). Now you're confusing graphical display with encoding. The term "headless" is concerned with the ability to display graphical information, not render it. JVMs running in headless mode can render glyphs, graphs, or what have you, but must send the resulting bit maps to some graphics server to have it displayed (it can also be saved in files if needed).

- Chuck
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Chuck,

Caldarale, Charles R wrote:
> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem
>
>> (My understanding is that Unicode (16-bit) is actually not big enough for everything, but hey, they tried).
>
> Point of clarification: Unicode is NOT limited to 16 bits (not even in Java, these days).

Sorry, I was trying to say "16-bit Unicode" without saying "UTF-16" (which is not the same).

And regarding Java... the 'char' data type is /defined/ to be 16-bits wide (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1). Has this changed? When? (And how!?)

I always thought it was weird for Java to use 16-bit Unicode internally, but then use UTF-8 for all serialized strings. I guess that's what you get when you try to minimize file sizes and download times.

> There are defined code points that use 32 bits, and I don't think there's a limit, if you use the defined extension mechanisms. Again, browsing the Unicode web site is extremely enlightening.
>
>> Unless the browser sucks. ;)
>
> Let me guess which browser that is; does it start with an I?

I usually spell it with an 'M'. ;)

-chris
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Johnny,

Johnny Kewl wrote:
> Use this function System.out.print("CharSet : " + Charset.defaultCharset().toString()); and thats what you HAVE TO set your page at. On my system it tells me its windows-1252

I think you're still missing something: the file on the disk has an implicit file encoding that is not advertised in any way. This is the core of the problem.

If all text files said "hey, I'm encoded in UTF-8" or "I'm in ISO-8859-1" or "This file is WINDOWS-1252", then there would be no problem: all code would use the native encoding of the file as the encoding of the HTTP response, and the file would be streamed as binary without changing a single bit in the stream. Unfortunately, this is better known as explicit encoding and basically doesn't exist (except in some UTF-encoded files).

Since the server doesn't know the file's original encoding, it /can never make a sensible decision about the output encoding/. It's simply not possible. It has nothing to do with your OS, or your filesystem, or your per-user locale preferences, installed fonts, etc. It has to do with the fact that the file has no explicit encoding that the server can use. (This is what gives rise to the MSIE practice of sniffing the document content regardless of the server's assertion as to the character encoding).

> ... it a headache... rather refactor your code so the pages are all the same charset of your choosing and work with pound, yen dollar

This is always a sensible way to go. If you stick to pages that always use US-ASCII or anything compatible with it (generally ISO-8859-*, I think), you'll be good to go.

A much better way to go is to always use properties files for text that will be displayed on web pages. It's the right thing to do from a localization perspective (yes, you can have separate pages for each language, but that's no fun), AND the encoding for Java properties files is DEFINED TO BE ISO-8859-1, no matter what you want to put in there. In this case, there /is/ an explicit character encoding, and it's predictable. Of course, Java coders can always bone the creation of these files...

-chris
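The remark about properties files can be checked with java.util.Properties, whose load(InputStream) method is documented to read ISO-8859-1 and to accept backslash-u escapes for everything else. The sketch below is only an illustration: PropertiesEncodingDemo, the key names and the in-memory "file" are all invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo of the fixed encoding of .properties files read from a byte stream.
class PropertiesEncodingDemo {

    // Load properties from raw bytes, as Properties.load(InputStream) would from a file.
    static Properties load(byte[] fileBytes) {
        Properties p = new Properties();
        try {
            p.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // First line: the pound sign written as a six-character escape sequence.
        // Second line: the pound sign stored as the single ISO-8859-1 byte 0xA3.
        String content = "escaped=\\u00A3\nraw=\u00A3\n";
        byte[] fileBytes = content.getBytes(StandardCharsets.ISO_8859_1);

        Properties p = load(fileBytes);
        System.out.println("same value: "
                + p.getProperty("escaped").equals(p.getProperty("raw")));
    }
}
```

Both keys come back as the same one-character string, whatever LC_CTYPE or file.encoding the JVM was started with.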
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Christopher Schultz [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

> the 'char' data type is /defined/ to be 16-bits wide (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1). Has this changed? When? (And how!?)

A char is still 16 bits, but you can now have 21-bit code points:
http://java.sun.com/javase/6/docs/api/java/lang/Character.html#unicode
These are manipulated via the int type, rather than char.

> I always thought it was weird for Java to use 16-bit Unicode internally

Back when Java was being defined, Unicode still was 16-bit, but not in widespread use.

> but then use UTF-8 for all serialized strings

Mostly for easy interoperation with existing editors, comm handlers, browsers, etc., which were all byte oriented and, at the time, still largely ASCII. The day-one existence of character encoders in Java permitted use in non-ASCII environments.

- Chuck
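A short sketch of what "manipulated via the int type" looks like in practice, using the static java.lang.Character methods and the String code point methods added in Java 1.5. The class name CodePointDemo and the choice of U+1D11E (a musical symbol outside the 16-bit range) are just for illustration.

```java
// Hypothetical demo: a code point above U+FFFF stored as two 16-bit chars.
class CodePointDemo {

    // Build a String from a single code point given as an int.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // needs more than 16 bits
        String s = fromCodePoint(clef);

        // Two chars (a surrogate pair), but only one code point.
        System.out.println("chars:       " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));

        // The first char comes from the high-surrogate range, the second from the low one.
        System.out.println("high surrogate: " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate:  " + Character.isLowSurrogate(s.charAt(1)));

        // Reading it back as an int recovers the original value.
        System.out.println("round trip: " + (s.codePointAt(0) == clef));
    }
}
```

So the char type itself did not change; code that walks a String char by char simply has to be aware that one code point may span two positions.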
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Christopher Schultz wrote: [...] Yes, they do. MS, contrary to W3 specifications, sniffs the content of a page and chooses the encoding and ignores any server-specified encoding. It also does this with MIME types. (Sorry, can't find the reference right now). [...] Here is a start, sympathetic to Microsoft : http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx And here is another relevant MS technical document (not for the faint of heart) : http://msdn.microsoft.com/en-us/library/ms775147.aspx On the other hand, the HTTP 1.1 RFC section 7.2.1 http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1 says : quote Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type application/octet-stream. unquote (notice the *if and only if* the media type is not given..) In other words, IE's content sniffing is in clear violation of the HTTP 1.1 RFC, 99% of the time. On the other hand, I once read a justification by one of the Microsoft developers (as I recall that one was related to their implementation of DAV, or Web Folders), which essentiually said this : there are hundreds of millions of Windows (and IE) users, and most of them are *not* developers. So, although we are ourselves developers and we would very much like to adhere to the standards, our marketing people just won't let us, if it risks inconveniencing several hundred million average Windows users (and Microsoft customers), just to please the tiny minority of several hundred thousand developers. I think it's an argument, even a relatively democratic one ... 
I also personally believe that if the Microsoft developers had not started down the path a long time ago to believe that they could be smarter than everyone else and could outguess webservers, and instead had respected the HTTP RFC and just been more careful about which documents IE opens (or worse, runs), they would have saved Microsoft and the world countless bugs, countless viruses and countless unproductive hours of web developers' forced work-arounds. What I do not however understand is, considering the flak that each IE bug or security advisory generates, why MS have never decided to create and market another parallel browser (or maybe just one checkbox in the regular IE), that would make it RFC-compliant. This way users could just choose to either use a browser that is RFC-compliant and boring and safe(r), or else enjoy all the gimmicks but risk the consequences. But hey, I also do not know in how many virus-scanning companies MS owns shares..
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Nope - most editors do not let you choose the character encoding, they just use the platform default. Some do let you choose a UTF-x flavor in lieu of the platform default, which is quite desirable. Some fonts (e.g., Wingdings) redefine the glyphs for given code points in order to display oddball symbols within a non-Unicode encoding; these were pretty much all developed before Unicode came into widespread use, but are still around for compatibility. You know your stuff Chuck ;) Wonder if Wil knew he asked such a damn big question... ha ha Ok... some more homework on this thing... Servlet Response does in fact have a setLocale(Locale loc) function... Which seems to indicate that if headers or something like response.setContentType("text/html;charset=UTF-8"); is *not* used... TC will take on the encoding (ha ha did it again) charset of that locale... I find thinking outside of HTTP headers difficult... and it seems that the servlet spec has recognized the conflict inherent in locale and HTTP header. It seems that prior to Servlet spec 2.4, if a coder used locale-dependent JSTL to access resource bundles... that would in fact override setContentType; this apparently is no longer the case... the header takes preference... So André, that's what you could well be seeing in your application, because the charset would follow the locale and that would be whatever the JRE wants to give you... i.e. the coder didn't even have to explicitly use a locale function; a JSTL call using a resource bundle will do it... It seems they are trying to bring the locale technology that one applies in Swing without too much thought, and web technology, a little closer... Still lots of places to get caught, it seems... I think you just got to put on a different hat when doing Swing and Web internationalization... different animals, with just enough commonality to cause pain ;) --- HARBOR : http://www.kewlstuff.co.za/index.htm The most powerful application server on earth. The only real POJO Application Server. See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm ---
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Johnny, Johnny Kewl wrote: Servlet Response does in fact have a setLocale(Locale loc) function... Which seems to indicate that if headers or something like response.setContentType("text/html;charset=UTF-8"); is *not* used... TC will take on the encoding(ha ha did it again) charset of that locale... Nope! Locale != charset. A Locale does not even hint at a /preferred/ charset. I find thinking outside of HTTP headers difficult... and it seems that servlet spec has recognized the conflict inherent in locale and http header. It seems that prior to Servlet spec 2.4 if a coder used locale dependent JSTL to access resource bundles... that would in fact override setContentType this apparently is no longer the case... the header takes pref... Well, the header comes from the encoding set on the response, so it should all be the same. I think you just got to put on a different hat when doing Swing and Web internationalization... You shouldn't have to. The only difference is the character encoding for the requests and responses. The use of the Java API should be identical. -chris
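To make the "Locale != charset" point concrete, here is a minimal sketch, not taken from the application discussed in this thread. It assumes Java 7 or later (for StandardCharsets); the class and method names are made up for illustration. The locale only decides which characters NumberFormat puts into the String; the bytes depend on whichever charset is used when that String is written out, and a client decoding them with a different charset sees the familiar garbage in front of the currency symbol.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class, not part of the application discussed in the thread.
final class CurrencyEncodingSketch {

    // The locale picks the currency symbol and separators; the result is a
    // String of chars and carries no information about a byte encoding.
    static String formatAmount(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Encodes text with an explicit charset and renders the bytes as hex.
    static String hexOf(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Simulates a client that decodes UTF-8 bytes as if they were ISO-8859-1.
    static String misreadUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // POUND SIGN, escaped to avoid source-file encoding issues
        System.out.println(formatAmount(1234.5, Locale.UK));
        System.out.println(hexOf(pound, StandardCharsets.ISO_8859_1)); // A3
        System.out.println(hexOf(pound, StandardCharsets.UTF_8));      // C2 A3
        System.out.println(misreadUtf8AsLatin1(pound).length());       // 2
    }
}
```

The last line shows one pound sign turning into two characters (U+00C2 followed by U+00A3) when the decoder and encoder disagree, which is one plausible way to end up with a stray character in front of a formatted amount.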
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Fri, Sep 12, 2008 at 9:26 PM, Johnny Kewl [EMAIL PROTECTED] wrote: Wonder if Wil knew he asked such a damn big question... ha ha I'm really amazed at the volume of mails my question has raised. I can only see one solution to this complexity: let's all (everybody in the whole world) speak the same language, use the same currency and move into one and the same timezone (the latter because of past fun with timezones)! Willem
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Chuck, Caldarale, Charles R wrote: From: Christopher Schultz [mailto:[EMAIL PROTECTED] Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem the 'char' data type is /defined/ to be 16-bits wide (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1). Has this changed? When? (And how!?) A char is still 16 bits, but you can now have 21-bit code points: http://java.sun.com/javase/6/docs/api/java/lang/Character.html#unicode These are manipulated via the int type, rather than char. Interesting... so, Java is still 16-bit Unicode in its char primitive, but you can use ints to hold UTF-16 values using 21-bits? Wo, that's confusing... especially since java.lang.Character only takes a char as a constructor parameter :( -chris
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Willem, Willem Moors wrote: I can only see one solution to this complexity: let's all (everybody in the whole world) speak the same language, use the same currency and move into one and the same timezone (the latter because of past fun with timezones)! You're not far off, except that you probably mean we should all speak one human language (like English or Farsi or whatever). I agree, but only if you mean we should all speak the same character language. It should be UTF-8. All hail UTF-8! Seriously, switch to UTF-8. -chris
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Caldarale, Charles R wrote: From: Christopher Schultz [mailto:[EMAIL PROTECTED] Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem (My understanding is that Unicode (16-bit) is actually not big enough for everything, but hey, they tried). Point of clarification: Unicode is NOT limited to 16 bits (not even in Java, these days). There are defined code points that use 32 bits, and I don't think there's a limit, if you use the defined extension mechanisms. Again, browsing the Unicode web site is extremely enlightening. Further clarification : Unicode is not limited to anything. Unicode is (aims to be) a list which attributes to any distinct character known to man, a number, from 0 to infinity. The particular position number given to a particular character in this Unicode list is known as its Unicode codepoint. The Unicode group (consortium ?) also tries to do this with some order, such as trying to keep together (with consecutive codepoints) various groups of characters that are logically related in some way. For example (but probably because they had to start somewhere), the first 128 codepoints match the original 7-bit US-ASCII alphabet; so for instance the capital letter A, which has code \x41 in US-ASCII, happens to have Unicode codepoint \x0041 (both 65 in decimal terms). For example also, the same first 128 codepoints, plus the next 128 codepoints, match the iso-8859-1 alphabet (also known as iso-latin-1); thus the character known as capital letter A with umlaut (an A with a double-dot on top) has the codepoint \x00C4 in Unicode, and the code \xC4 in iso-8859-1 (both 196 in decimal). New Unicode characters (and codepoints) are being added all the time (I think there's even Klingon in there), but there are also holes in the list (presumably left for whenever some forgotten related character shows up). A quite different issue is encoding. 
Because it would be quite impractical to specify a series of characters just by writing their codepoints one after the other (using whatever number of bits each codepoint needs), a series of clever schemes have been devised in order to pass Unicode strings around, while being able to separate them into characters, and keep each one with its proper codepoint. Such schemes are known as Unicode encodings with names such as UTF-2, UTF-7, UTF-8, UTF-16, UTF-32, etc. Each one of them specifies an algorithm whereby one can take any Unicode character (or rather, its codepoint), and encode it into a series of bits, in such a way that at the receiving end, an opposite algorithm can be used to decode that series of bits and retrieve once again the same series of Unicode codepoints (or characters). UTF-16, for example, is an encoding of Unicode which uses always 16 bits for each Unicode codepoint; but it is to my knowledge incomplete, because since it uses a fixed number of 16 bits per character, it can thus only ever represent no more than the first 65,536 Unicode characters. (But we're not there yet, and there is still some leeway). UTF-8 on the other hand is a variable-length scheme, using 1, 2, 3, or more 8-bit groups to represent each Unicode codepoint. And it is in principle not limited, as there are extension mechanisms foreseen for whenever the need arises (imagine that some aliens suddenly show up, and that they happen to write in 167 different languages and alphabets). One frequent misconception is that in UTF-8, the first 256 character encoding bit sequences match the iso-8859-1 codepoints. Only the first 128 characters of iso-8859-1 (which happen to match the 128 characters of US-ASCII and the first 128 Unicode codepoints), have a single-byte representation in UTF-8 which happens to match their Unicode codepoint. The next 128 iso-8859-1 characters (which contain the capital A with umlaut) require 2 bytes each in the UTF-8 encoding.
Thus for instance, the capital letter A with umlaut has the Unicode codepoint \x00C4 (196 decimal), because it is the 197th character in the Unicode list (and the first one is \x0000). It also happens to have the code \xC4 (196 decimal) in the iso-8859-1 table. But in UTF-8, it is encoded as the two bytes \xC3\x84, which is not the decimal number 196 in any way. All of that to say that when some people on this list say things like "you should always decode your URLs as if they were Unicode (or UTF-8), because it is the same as ASCII or iso-latin-1 anyway", they are talking nonsense. The only time you can do that is when the server and all the clients have agreed in advance that this is how they were going to encode and decode URLs. (That we developers wish it were so, and that ultimately we may get there, is another matter.) It is also talking nonsense to say that you should by default consider html pages as UTF-8 encoded. The default character set (and encoding, because in that case both are the same) for html is iso-8859-1, and anything
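The numbers in the message above can be checked with a few lines of Java. This is only a sketch under stated assumptions: Java 7 or later (for StandardCharsets), with a hypothetical class name and helper. It prints the code point of capital A with umlaut and the byte values it encodes to in iso-8859-1 and in UTF-8, next to plain capital A for comparison.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the helper name is made up for illustration.
final class UmlautEncodingSketch {

    // Encodes text with the given charset and returns the bytes as
    // unsigned values (0..255), since Java's byte type is signed.
    static int[] encode(String text, Charset charset) {
        byte[] raw = text.getBytes(charset);
        int[] unsigned = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            unsigned[i] = raw[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00C4"; // LATIN CAPITAL LETTER A WITH DIAERESIS
        System.out.println(aUmlaut.codePointAt(0));                                       // 196
        System.out.println(Arrays.toString(encode(aUmlaut, StandardCharsets.ISO_8859_1))); // [196]
        System.out.println(Arrays.toString(encode(aUmlaut, StandardCharsets.UTF_8)));      // [195, 132]

        // Plain ASCII 'A' is the single byte 65 in all three encodings.
        System.out.println(Arrays.toString(encode("A", StandardCharsets.US_ASCII)));       // [65]
        System.out.println(Arrays.toString(encode("A", StandardCharsets.ISO_8859_1)));     // [65]
        System.out.println(Arrays.toString(encode("A", StandardCharsets.UTF_8)));          // [65]
    }
}
```

195 and 132 are \xC3 and \x84 in decimal, so only the 128 ASCII characters keep the same single byte across the three encodings.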
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Rectification to the clarification : what I said in my previous message about UTF-16 being always 16-bit and limited is also nonsense. UTF-16 is variable-length, and it can cover the entire Unicode character set. It just uses a variable number of 16-bit words per character, as compared to UTF-8 which uses a variable number of 8-bit bytes. I should have checked my sources. Shame on me. About Java's internal char type being 16-bit wide though, I have heard that too, and I'm also curious. André Warnier wrote: [...]
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Christopher Schultz wrote: Willem, Willem Moors wrote: I can only see one solution to this complexity: let's all (everybody in the whole world) speak the same language, use the same currency and move into one and the same timezone (the latter because of past fun with timezones)! You're not far off, except that you probably mean we should all speak one human language (like English or Farsi or whatever). I agree, but only if you mean we should all speak the same character language. It should be UTF-8. All hail UTF-8! Seriously, switch to UTF-8. That reminds me of the old joke, about England deciding to switch from driving on the (wrong) left side of the road to the (correct) right side. To minimise disruptions, they were going to do it in stages; the trucks first, the cars a week later. Anyway, there is a flaw in the above suggestions, if taken together : if we all spoke and wrote the same language, there would be no need for Unicode nor for multi-byte character encodings. Unless the language was Chinese of course.
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Just for the sake of completeness : Christopher Schultz wrote: André, André Warnier wrote: It is on the way through that servlet that they get corrupted, unless I start Tomcat with LC_CTYPE=iso-8859-1. What do the HTTP headers say when the file is served correctly versus when it is not? I suspect that the encoding is either set incorrectly or not set at all unless you specify LC_CTYPE. So my question remains, I think : what could be going on in that servlet so that : - if LC_CTYPE is not set in the environment *of Tomcat* when it starts, the upper iso-8859-1 characters in the pages are replaced by ? - if LC_CTYPE is set to iso-8859-1 in the Tomcat environment when it starts, then the pages delivered by the servlet are correct ? My guess is that the magic servlet here is using the platform's default encoding in the HTTP headers, which may be incorrect for the static file in question. I am not very qualified in Java, but could it be something like : - the servlet reads those documents with some InputStream, without specifying a character set or encoding Note that InputStreams are encoding-less. Sounds like semantics, but encodings only come into play when you are dealing with character-oriented streams which, in Java, are called Readers and Writers. Note that neither InputStream nor OutputStream has any methods that deal with the char data type. and by default that means to use Tomcat's idea of its default LC_CTYPE for those InputStreams ? - or the servlet outputs the document via an OutputStream without specifying an encoding etc.. I'll bet a binary stream of data is being sent (that is, with no interpretation or encoding) and that the JVM's default encoding is being advertised by the server in the HTTP headers. That would certainly cause the problem. The last time I looked, the HTTP headers sent along with the documents were the same in both cases.
It is physically (if that's the appropriate expression in this case) the high iso-8859-1 characters (bytes) in the html document that are being replaced by ? (single-byte low-ascii question mark), on the way from the disk file to the browser, via the servlet. And if the LC_CTYPE of java (and Tomcat) is set to iso-8859-1 in the Tomcat startup script, it is no longer the case. So I (now) believe that Chuck's earlier explanation is the correct one : the servlet reads the disk document with a Reader (thanks Chris), without specifying an encoding when it opens this Reader. The effect is thus as follows : - if the LC_CTYPE environment variable is not set for Java and Tomcat, this Reader is opened using whichever encoding happens to be the JVM's default at that time. Obviously, in this case it is not iso-8859-1. The servlet thus reads the iso-8859-1 data, but with the wrong decoder. I guess then that this decoder replaces anything that does not fit into that default encoding, by a ?. (Would it do that, or would it trigger an exception ?) So that is what the servlet reads, and it passes it unchanged to its Writer and to the browser. (Alternatively, it is at the level of the Writer of the servlet that the wrong encoding is used, or both). - if the LC_CTYPE variable is set to iso-8859-1, then this Reader and Writer default to that as an encoding, and everything works fine. Fortunately setting the LC_CTYPE in the Tomcat startup script does not seem to affect other applications on the server; that is probably because this particular servlet is the only sloppy one, which does not explicitly specify an encoding when reading or writing stuff. (It's also because in this case, there are not many other servlets apart from the sloppy one). Now I'm writing the above without a solid knowledge of Java or Tomcat behind me, so it's mostly guessing. If someone has a good reason for shooting this down as an explanation, I'm still open.
I'll post another question under another title, I think this thread is long enough by now. Thanks to all though. André
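The mechanism guessed at above can be reproduced in isolation. The following is a sketch only: it does not use the real servlet's code (which nobody in this thread has seen), the class and method names are hypothetical, and it assumes that the JVM default without LC_CTYPE is something like US-ASCII. It relays bytes through a Reader and a Writer, once with the right decoder and once with the wrong one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a servlet that relays a file through a Reader/Writer pair.
final class ReaderCharsetSketch {

    // Decodes the input bytes with one charset and re-encodes them with another.
    // A Reader opened without a charset argument would use the JVM default instead.
    static byte[] relay(byte[] input, Charset readAs, Charset writeAs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), readAs);
             Writer writer = new OutputStreamWriter(sink, writeAs)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "A" followed by A-umlaut, as stored in an iso-8859-1 file.
        byte[] latin1File = {0x41, (byte) 0xC4};

        byte[] good = relay(latin1File, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println((good[1] & 0xFF) == 0xC4); // true: the byte survives

        byte[] bad = relay(latin1File, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1);
        System.out.println(bad[1] == 0x3F);           // true: it became '?'
    }
}
```

In the second case the decoder turns the byte it cannot interpret into the replacement character U+FFFD, and the encoder then turns that unmappable character into a single-byte question mark, which matches the symptom described. Passing an explicit charset to both InputStreamReader and OutputStreamWriter removes the dependency on LC_CTYPE.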
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message - From: André Warnier [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Friday, September 12, 2008 10:56 PM Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem [...] By George... I think you have it... the locale encoding is taking preference over the header. In theory... in newer servlets that will no longer happen... the header now overrules the locale encoding. If you do decide to look at this link... http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale What's happening to you is described at the very bottom ;)
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: André Warnier [mailto:[EMAIL PROTECTED] Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem The servlet thus reads the iso-8859-1 data, but with the wrong decoder. I guess then that this decoder replaces anything that does not fit into that default encoding, by a ?. (Would it do that, or would it trigger an exception ?) I believe (but have not verified) that the substitution occurs for any decoding errors. At least, I can't find any exceptions defined for the APIs that perform decoding. I'll post another question under another title, I think this thread is long enough by now. Nah, let's go for the record. - Chuck
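On the question of substitution versus exception: the convenience APIs (the String constructors, InputStreamReader) do substitute silently, but the lower-level java.nio.charset.CharsetDecoder can be asked to report errors instead. A minimal sketch, with a hypothetical class and helper name, assuming Java 7 or later for StandardCharsets:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class contrasting lenient and strict decoding.
final class StrictDecodeSketch {

    // Returns the decoded text, or null when the bytes are not valid
    // in the given charset (the decoder throws instead of substituting).
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Umlaut = {(byte) 0xC4}; // A-umlaut in iso-8859-1, not valid UTF-8 on its own

        // Lenient: the String constructor substitutes U+FFFD and carries on.
        String lenient = new String(latin1Umlaut, StandardCharsets.UTF_8);
        System.out.println(lenient.equals("\uFFFD"));                                      // true

        // Strict: the same bytes are rejected.
        System.out.println(decodeStrict(latin1Umlaut, StandardCharsets.UTF_8) == null);    // true

        // With the right charset, strict decoding succeeds.
        System.out.println("\u00C4".equals(decodeStrict(latin1Umlaut, StandardCharsets.ISO_8859_1))); // true
    }
}
```

A strict decoder like this is one way to detect a wrong charset assumption early, rather than discovering question marks in the browser.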
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Christopher Schultz [mailto:[EMAIL PROTECTED] Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem so, Java is still 16-bit Unicode in its char primitive, but you can use ints to hold UTF-16 values using 21-bits? The 21-bit values are represented by pairs of Java chars, the first from the UTF-16 high-surrogate range, the second from the low-surrogate range. The 21-bit code point can be accessed as an int by some of the java.lang.Character methods introduced in 1.5. especially since java.lang.Character only takes a char as a constructor parameter :( Yes, I think all the new Character methods related to code points are static; there are corresponding instance methods in java.lang.String though. - Chuck
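A short sketch of the surrogate-pair mechanics described above, using only java.lang.Character and java.lang.String methods that exist since Java 1.5. The class name and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) as a sample character are assumptions made for the illustration.

```java
// Hypothetical demo class for code points outside the 16-bit range.
final class CodePointSketch {

    // Builds a String from a single code point; code points above U+FFFF
    // come back from Character.toChars as a pair of surrogate chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E); // MUSICAL SYMBOL G CLEF

        System.out.println(clef.length());                               // 2 chars...
        System.out.println(clef.codePointCount(0, clef.length()));       // ...but 1 code point

        System.out.println(Character.isHighSurrogate(clef.charAt(0)));   // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));    // true
        System.out.println(Integer.toHexString(clef.charAt(0)));         // d834
        System.out.println(Integer.toHexString(clef.charAt(1)));         // dd1e

        // The static Character methods and the String instance methods
        // both hand the full code point back as an int.
        System.out.println(Integer.toHexString(clef.codePointAt(0)));    // 1d11e
        System.out.println(Character.charCount(clef.codePointAt(0)));    // 2
    }
}
```

This is why loops that step through a String one char at a time can split a character in two; stepping by Character.charCount(codePoint) avoids that.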
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Wed, Sep 10, 2008 at 7:55 PM, Steve Ochani [EMAIL PROTECTED] wrote: Hmm odd. I tried it on my Redhat test server and worked fine also. Is your tomcat 6 install a default/fresh install? What browser are you using? What character encoding does it think the HelloWorldExample output is coming in as? Odd indeed! The tomcat6 install is from a fresh install. The browser I'm using is FF3. Really, apart from Tomcat 5.5 and 6, all else is equal: it's the same app (same war-file), running on the same hardware using exactly the same java. And to display the app I use one and the same browser (with different tabs) but still my application gives this difference: http://www.laadruim.com/issue/comparison_currrency_problem.png (I don't know if it's proper to use attachments in posting to this list, so I made the pic available on that URL). Willem
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message - From: Willem Moors [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Thursday, September 11, 2008 8:15 AM Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem On Wed, Sep 10, 2008 at 7:55 PM, Steve Ochani [EMAIL PROTECTED] wrote: Hmm odd. I tried it on my Redhat test server and worked fine also. Is your tomcat 6 install a default/fresh install? What browser are you using? What character encoding does it think the HelloWorldExample output is coming in as? Odd indeed! The tomcat6 install is from a fresh install. The browser I'm using is FF3. Really, apart from Tomcat 5.5 and 6, all else is equal: it's the same app (same war-file), running on the same hardware using exactly the same java. And to display the app I use one and the same browser (with different tabs) but still my application gives this difference: http://www.laadruim.com/issue/comparison_currrency_problem.png (I don't know if it's proper to use attachments in posting to this list, so I made the pic available on that URL). Willem Will if possible use pound instead... that I think its font independent... Otherwise I think you have to sorround that getCurrencyInstance stuff with a font... and tell it what font it must use... ... I think I'm just wondering how the systems guess the character set from getCurrencyInstance... I think the answer is there... I think this because in a text editor if you insert a pound symbol you also have to choose it from a font set and not all fonts support it... So.. its getting inserted on some unknown font... and then the browser has to guess it... Its something like that pound may be easier Have fun... --- HARBOR : http://www.kewlstuff.co.za/index.htm The most powerful application server on earth. The only real POJO Application Server. 
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm ---
Re: Migrating to tomcat 6 gives formatted currency amounts problem
2008/9/11 Willem Moors [EMAIL PROTECTED]:
> And to display the app I use one and the same browser (with different tabs) but still my application gives this difference: http://www.laadruim.com/issue/comparison_currrency_problem.png

1. What the _browser_ thinks about the encoding of your page: in the View > Encoding menu, which encoding is auto-selected?
2. In the Page Info dialog of Firefox (in the Tools menu, or Page Info in the context menu), what are the Encoding and the Content Type, and which META tags are mentioned (do they include a Content-Type tag)? (Disclaimer: I have a localized version of FF, so the above names are translated ones.)
3. Save both pages as HTML (choose the HTML-only format when saving), and compare their text. Is there any difference?
4. Well, &pound; (notice the trailing ';'), or better &#163;, should display the pound sign regardless of what encoding the browser thinks your page uses. Use the &#..; notation if generic XML processing is involved (the &pound; entity is defined for (X)HTML only).

Best regards, Konstantin Kolinko
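Point 4 can be sketched in Java as follows. This is only an illustration, assuming the output is HTML or XML (numeric character references are not interpreted in a text/plain response); the class and method names are made up for the example, and the helper does not escape markup characters such as '&' or '<'.

```java
final class HtmlEntities {
    /** Replaces every non-ASCII code point with a numeric character reference such as &#163;. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);                    // plain ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';');  // decimal reference; trailing ';' is required
            }
            i += Character.charCount(cp);                 // step over surrogate pairs as one code point
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The pound sign is written as a Unicode escape so the source file encoding does not matter.
        System.out.println(escapeNonAscii("\u00A31,623,540.00"));
    }
}
```

Because the result is pure ASCII, it survives whichever charset the response is written in and whichever charset the browser guesses.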
Re: Migrating to tomcat 6 gives formatted currency amounts problem
You are almost certainly having a problem with (default) character encodings on your system. The usual thing to check is the encoding that the JVM is using; for example, what does: echo $LANG return? (This is usually controlled by what's defined in /etc/sysconfig/i18n, although I'm not familiar with Ubuntu systems.)

The most likely thing is that the Tomcat servlet is effectively generating content in UTF-8 and then trying to return this character to the end client, via a PrintWriter, in ISO-8859-1, where the currency symbol in use is not supported, hence the '?'. Alternatively, Tomcat is returning either ISO-8859-1 or UTF-8 characters but not declaring them as such in its response headers, leaving the browser confused and choosing the wrong default. It would be useful to know what headers Tomcat is returning, really.

I can't begin to count the number of times I've had problems with character encoding issues in the past, on both response and request handling. Fortunately, the general trend for everything (including mobile browsers) to support UTF-8 is slowly making life much, much easier.

Mark
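The substitution described above can be reproduced outside Tomcat. A minimal sketch, assuming nothing beyond the JDK (EncodeDemo is a hypothetical name): it encodes a character into a given charset and prints the resulting bytes. The pound sign turns out to be encodable in both charsets (one byte in ISO-8859-1, two in UTF-8), while a character with no ISO-8859-1 mapping, such as the euro sign, is replaced by a literal '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodeDemo {
    /** Encodes the text in the given charset and returns the bytes as lower-case hex. */
    static String encodeHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff)); // mask so negative bytes do not sign-extend
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // POUND SIGN
        String euro = "\u20AC";  // EURO SIGN, not part of ISO-8859-1
        System.out.println("pound, ISO-8859-1: " + encodeHex(pound, StandardCharsets.ISO_8859_1));
        System.out.println("pound, UTF-8     : " + encodeHex(pound, StandardCharsets.UTF_8));
        System.out.println("euro,  ISO-8859-1: " + encodeHex(euro, StandardCharsets.ISO_8859_1));
    }
}
```

If the bytes on the wire are a3 or c2 a3, the encoder did its job and the remaining question is which charset the client assumes when decoding them.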
Re: Migrating to tomcat 6 gives formatted currency amounts problem
> Will, if possible use &pound instead... I think that is font independent. Otherwise I think you have to surround that getCurrencyInstance stuff with a font and tell it which font it must use. [...] It's something like that... &pound may be easier. Have fun...

Definitely having fun! ;-) Thanks for your suggestion about using the pound sign and the fonts, but 'getCurrencyInstance' is supposed to hide all that from me. I rather think it has something to do with the Tomcat 6 configuration, because all else is equal: same server with same java / same app / same client / .. only in Tomcat 5.5 it does work, and in Tomcat 6 it doesn't.
Re: Migrating to tomcat 6 gives formatted currency amounts problem
> 1. What the _browser_ thinks about the encoding of your page: in the View > Encoding menu, which encoding is auto-selected?

Western / ISO 8859-1 for both.

> 2. In the Page Info dialog of Firefox (in the Tools menu, or Page Info in the context menu), what are the Encoding and the Content Type, and which META tags are mentioned (do they include a Content-Type tag)? (Disclaimer: I have a localized version of FF, so the above names are translated ones.)

Encoding: ISO-8859-1. Content type / meta tags are not mentioned.

> 3. Save both pages as HTML (choose the HTML-only format when saving), and compare their text. Is there any difference?

Since the content is Ajax generated, a save-page doesn't make much sense. When I highlight the bits, do a view-selection-source and then copy/paste this into vi, I notice that the 5.5 page shows the pound sign, while the 6.0 page shows a blank spot where the pound sign is supposed to be.

> 4. Well, &pound; (notice the trailing ';'), or better &#163;, should display the pound sign regardless of what encoding the browser thinks your page uses. Use the &#..; notation if generic XML processing is involved (the &pound; entity is defined for (X)HTML only).

NumberFormat.getCurrencyInstance(Locale.UK) is supposed to save me the pain of putting currency signs in.

Thanks for your reply, Konstantin.

Regards, Willem
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 11:35 AM, Mark Hagger [EMAIL PROTECTED] wrote:
> You are almost certainly having a problem with (default) character encodings on your system. The usual thing to check is the encoding that the JVM is using; for example, what does: echo $LANG return? (This is usually controlled by what's defined in /etc/sysconfig/i18n, although I'm not familiar with Ubuntu systems.)

echo $LANG gives me this: en_US.UTF-8

> The most likely thing is that the Tomcat servlet is effectively generating content in UTF-8 and then trying to return this character to the end client, via a PrintWriter, in ISO-8859-1, where the currency symbol in use is not supported, hence the '?'. Alternatively, Tomcat is returning either ISO-8859-1 or UTF-8 characters but not declaring them as such in its response headers, leaving the browser confused and choosing the wrong default. It would be useful to know what headers Tomcat is returning, really.

But then, it would be the same issue for Tomcat 5.5, no? And there it doesn't go wrong... As stated earlier: I rather think it has something to do with the Tomcat 6 configuration, because all else is equal: same server with same java / same webapp / same client (FF3) / .. only in Tomcat 5.5 it does work, and in Tomcat 6 it doesn't.

Thanks for your reply, Mark!

Regards, Willem
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thursday, September 11, 2008 12:36 PM, Willem Moors [EMAIL PROTECTED] wrote:
> I rather think it has something to do with the Tomcat 6 configuration, because all else is equal: same server with same java / same webapp / same client / .. only in Tomcat 5.5 it does work, and in Tomcat 6 it doesn't.

Will, I can't see how TC can be influencing it. You write a char (the pound) to an output stream and it appears differently in the browser... TC is just sending what it gets. It's got to be this: NumberFormat.getCurrencyInstance(Locale.UK), and that is Java. So I conclude TC 6 is not on the same JDK/JRE as TC 5. Your Java has changed... must be.

That stuff that you like is LOCALE stuff, and that stuff can all be configured from outside Java. You are choosing a Locale, but if the font.property files in JRE/LIB are different, it's probably picking a wide, super new Sun font, which in Swing will make no difference; but where the old JRE was using something a browser gets, the new GB_SUPER font with English flags and the national anthem confuses current browsers.

I think you're looking in the wrong place. Convert it to bytes and print that... you will see it, I think. Then, just to convince yourself that TC is not doing a weird Arabic header, get the header plugin for Firefox and have a look. I doubt they differ.

Have more fun...
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 1:42 PM, Johnny Kewl [EMAIL PROTECTED] wrote:
> Will, I can't see how TC can be influencing it. You write a char (the pound) to an output stream and it appears differently in the browser... TC is just sending what it gets. It's got to be this: NumberFormat.getCurrencyInstance(Locale.UK), and that is Java. So I conclude TC 6 is not on the same JDK/JRE as TC 5. Your Java has changed... must be.

Sorry to have to disappoint you, but this server was installed just a few days ago, and there is only ONE JDK on it:

java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)

So it's impossible that TC 5.5 uses a different Java than TC 6.

> That stuff that you like is LOCALE stuff, and that stuff can all be configured from outside Java. You are choosing a Locale, but if the font.property files in JRE/LIB are different, it's probably picking a wide, super new Sun font [...] I think you're looking in the wrong place. Convert it to bytes and print that... you will see it, I think.

Can it be one of the libraries (*.jar) that is different, that forces TC6 to act differently?

> Then, just to convince yourself that TC is not doing a weird Arabic header, get the header plugin for Firefox and have a look. I doubt they differ.

That is a good track to follow! Thanks for this advice.

> Have more fun...

Thanks!

Willem
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Wed, Sep 10, 2008 at 10:27 AM, Willem Moors [EMAIL PROTECTED] wrote:
> I'm transferring my application from a tomcat 5.5.26 server to tomcat 6.0.18, and notice that my formatted currency amounts are not being properly displayed. Instead of a Pound (GBP) sign I get a question mark within a black diamond (the app works fine in 5.5.26). This can easily be emulated. Add the following lines to the HelloWorldExample.java of the servlet examples in Tomcat 5.5.26 and those of 6.0.18:
>
> java.text.NumberFormat currencyFormat = java.text.NumberFormat.getCurrencyInstance(Locale.UK);
> out.print("Formatted currency (GBP) : " + currencyFormat.format(1623540.00));
>
> This will display the following:
> In Tomcat 6.0.18: Formatted currency (GBP) : ?1,623,540.00 (I've emulated the question-mark within diamond here, I'll send you a screenshot if you want)
> In Tomcat 5.5.26: Formatted currency (GBP) : £1,623,540.00 (depending on your client you may or may not see the pound sign in front of the above amount)
>
> What can be the problem, is there some extra locale configuration that needs to be done?

I experienced similar issues (though not with the UK locale) running Tomcat on Linux/UNIX. For reasons unknown, my Tomcat/Java was not picking up the default locale of the OS. So I explicitly set it for the JVM by putting JAVA_OPTS="-Duser.country=US -Duser.language=en" in setenv.sh. Problem solved.

This is admittedly a duct-tape solution. I would rather know why Java was not using the proper locale and get that fixed, but time is money. Examine your Tomcat 5 setup; maybe a similar tweak had been made there.

-- Jeff
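Whether the default-locale workaround is relevant to this particular call can be checked with a small sketch (the class name is hypothetical). NumberFormat.getCurrencyInstance(Locale.UK) takes its symbol and grouping from the explicit locale argument, so the JVM default set through -Duser.language and -Duser.country should not change its output; only the no-argument getCurrencyInstance() follows the default.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class CurrencyDemo {
    /** Formats with an explicit locale; the JVM default locale is not consulted. */
    static String formatUk(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.UK).format(amount);
    }

    /** Formats with whatever the JVM default locale currently is. */
    static String formatDefault(double amount) {
        return NumberFormat.getCurrencyInstance().format(amount);
    }

    public static void main(String[] args) {
        System.out.println("default locale : " + Locale.getDefault());
        System.out.println("explicit UK    : " + formatUk(1623540.00));
        System.out.println("default locale : " + formatDefault(1623540.00));
    }
}
```

Running it once normally and once with the two -D options from the message above should show the second formatted line changing while the first stays the same.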
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Willem Moors [EMAIL PROTECTED] wrote:
> Can it be one of the libraries (*.jar) that is different, that forces TC6 to act differently?

--- Will's Phantom Font Project ---

I've been trying to find a way for you to set the font you want for a locale... It does seem to be an option in Java, i.e. I think Java is expecting to find that from a GUI. But here is the whole story: http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale

Notice that on Linux there are things like "it depends if the font server starts up"... yada yada. I'm totally surprised that it's the same JRE. I think it may be possible that something else is setting the font, and then the JRE is using that. The above link actually gives you a way to find out which font is being picked up.

But I think this is all wrong anyway. Say you get it figured out and pick Helvetica, or whatever; then you have to tell the browser to use that in CSS or whatever. It's the beginning of a complex cycle. &pound is making it the browser's problem, and internally the browser will find a font and make it happen. And then if someone moves your servlet to a headless Linux, here we go again: is the font there, etc. I think you can get it to work, and it is interesting, but I'm not sure you want to.

I'd love to know if the theory is right on your system, i.e. run this:

String s = currencyFormat.format(1623540.00);
byte[] ba = s.getBytes(); // encodes with the platform default charset
String ans = "";
for (int i = 0; i < ba.length; i++) {
ans += Integer.toHexString(ba[i] & 0xff); // mask, so bytes above 0x7f do not print as ffffff..
}
System.out.print("DA BYTES : " + ans);

See if the bytes are changing, i.e. the fonts are changing. That's me out of ideas, other than that it looks like Java's localization can nail you, and I'm now worrying about some of my systems... ha ha.
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thursday, September 11, 2008 4:28 PM, Johnny Kewl [EMAIL PROTECTED] wrote:
> That's me out of ideas, other than that it looks like Java's localization can nail you, and I'm now worrying about some of my systems... ha ha.

I.e. format your numbers, but don't include a currency symbol through Java... use &pound. Interesting question... thanks
Re: Migrating to tomcat 6 gives formatted currency amounts problem
I studied the response headers for the Ajax call that generates the output and found that for the correct result (i.e. in TC55), the content type was this:

Content-Type: text/plain;charset=ISO-8859-1

while for the wrong result (i.e. in TC6), the content type was:

Content-Type: text/plain

So I added this line to my code:

response.setCharacterEncoding("ISO-8859-15");

(I chose the ISO-8859-15 set to see if my change had effect.) And lo and behold: problem solved!

So would this be the right conclusion: it's TC55 that's wrong here and not TC 6? TC55 slaps on the 'charset=ISO-8859-1' by default and TC 6 doesn't.

Anyway, glad to have found the solution; thank you all for chipping in your ideas!

Regards, Willem
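The fix amounts to making the declared charset and the bytes actually written agree. The sketch below models that with a plain value class instead of the servlet API, so that it runs standalone; PlainTextResponse and its methods are hypothetical names, not Tomcat classes. In a real servlet the equivalent step would be response.setCharacterEncoding(...) or response.setContentType("text/plain;charset=...") called before response.getWriter(), since the servlet API ignores encoding changes made after the writer has been obtained.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PlainTextResponse {
    final String contentType; // value of the Content-Type header
    final byte[] body;        // bytes that go on the wire

    private PlainTextResponse(String contentType, byte[] body) {
        this.contentType = contentType;
        this.body = body;
    }

    /** Encodes the text and declares the very same charset in the header value. */
    static PlainTextResponse of(String text, Charset charset) {
        return new PlainTextResponse("text/plain;charset=" + charset.name(), text.getBytes(charset));
    }

    /** What a client sees when it decodes the body with the declared charset. */
    String decodeAsDeclared() {
        String marker = "charset=";
        String name = contentType.substring(contentType.indexOf(marker) + marker.length());
        return new String(body, Charset.forName(name));
    }

    public static void main(String[] args) {
        PlainTextResponse r = of("\u00A31,623,540.00", StandardCharsets.ISO_8859_1);
        System.out.println(r.contentType);
        System.out.println(r.decodeAsDeclared().equals("\u00A31,623,540.00")); // round trip succeeds
    }
}
```

The pound sign has the same byte value (0xA3) in ISO-8859-1 and ISO-8859-15, which would be consistent with the ISO-8859-15 experiment above also producing the right character.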
Re: Migrating to tomcat 6 gives formatted currency amounts problem
2008/9/11 Willem Moors [EMAIL PROTECTED]:
> So would this be the right conclusion: it's TC55 that's wrong here and not TC 6? TC55 slaps on the 'charset=ISO-8859-1' by default and TC 6 doesn't. Anyway, glad to have found the solution; thank you all for chipping in your ideas!

Hi, Willem! Glad to hear that you solved this.

By the way, I think it is not Tomcat but the browser that is confused when the encoding is not specified in the Content-Type header. Those question marks were '?' in a rhombus, i.e. the Unicode replacement character, as if they had been replaced on the browser side. When a PrintWriter replaces symbols, it prints the '?' punctuation mark.

Is it true that the Content-Type header of your Ajax responses now has the ;charset=... suffix? (Is Content-Type updated by your setCharacterEncoding() call, or not?)

Also, I have heard that Ajax responses that are read through XMLHttpRequest are expected to be in UTF-8. E.g., it is mentioned here: http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/i18n/encoding-considerations

Also, your HTML pages do not specify their charset explicitly, thus the browser has to autodetect their encoding: http://www.w3.org/TR/html4/charset.html#spec-char-encoding

Also, the Tomcat wiki: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

Best regards, Konstantin Kolinko
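The difference between the two kinds of replacement can be shown on the decoding side. A minimal sketch, assuming only the JDK (DecodeDemo is a hypothetical name): a lone byte 0xA3, which is the pound sign in ISO-8859-1, is not valid UTF-8, so a UTF-8 decoder substitutes U+FFFD, the replacement character that browsers draw as a question mark in a diamond. The opposite mistake, reading UTF-8 bytes as ISO-8859-1, yields two characters instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    /** Decodes raw bytes the way a client would if it assumed the given charset. */
    static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        byte[] latin1Pound = {(byte) 0xA3};            // pound sign as written in ISO-8859-1
        byte[] utf8Pound = {(byte) 0xC2, (byte) 0xA3}; // pound sign as written in UTF-8

        // ISO-8859-1 bytes read as UTF-8: malformed input becomes U+FFFD.
        System.out.println((int) decode(latin1Pound, StandardCharsets.UTF_8).charAt(0) == 0xFFFD);

        // UTF-8 bytes read as ISO-8859-1: two characters, U+00C2 followed by U+00A3.
        System.out.println(decode(utf8Pound, StandardCharsets.ISO_8859_1).length());
    }
}
```

So a diamond in the browser points at bytes that were decoded with the wrong assumption, while a plain '?' points at an encoder that could not represent the character.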
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thursday, September 11, 2008 5:06 PM, Willem Moors [EMAIL PROTECTED] wrote:
> So would this be the right conclusion: it's TC55 that's wrong here and not TC 6? TC55 slaps on the 'charset=ISO-8859-1' by default and TC 6 doesn't.

Didn't realize this was Ajax... ;) I think browsers default to ISO-8859-1 unless set otherwise anyway, so it's a bit strange. Maybe the plain text has an effect. It also depends on the Accept headers that Ajax sent to TC: if it doesn't specify a required encoding, TC is actually at liberty to return whatever it wants, unless of course you dictate the encoding. I see now why you can't use &pound ;)

I think it's just a matter of telling TC what it must do, either from a client header or, as you are doing, by forcing a response. It's your servlet, and you should probably also be setting the size headers in your response. It's a question/answer thing, so there is no bug, unless the client said "gimme utf/ISO whatever" and TC didn't.

So I guess the theory on localized fonts changing just fell through ;) I wonder how that actually works... I mean, if you set a China locale, it just has to be a weird font; what happens if it's not there?

Set those headers. Ajax is not automatic either; make sure the system isn't guessing.
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thursday, September 11, 2008 6:18 PM, Johnny Kewl [EMAIL PROTECTED] wrote:
> Set those headers. Ajax is not automatic either; make sure the system isn't guessing.

Actually, here's something interesting for you to try. I discovered that IE is a huge guesser (some may say more intelligent). On IE, if you set the header to text/plain but send an HTML page, it somehow guesses that it's not plain text and renders it as HTML. Other browsers will display the raw HTML. Browsers do guess if you don't help them, and IE just overrules you ;) Make sure you test in more than one browser as well; that often catches stuff like this.
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Johnny,

Johnny Kewl wrote:
> I think it may be possible that something else is setting the font... and then the JRE is using that.

I think you're totally confusing yourself about font issues. Java only interacts with fonts of any kind when running AWT/Swing apps. Webapps have no interactions with fonts of any kind. The font used to display the web page is entirely dependent on the web browser. The web browser chooses the font based upon the style of the text to be displayed, and the language it's being displayed in.

The likely problem, here, is the encoding appearing in the Content-Type header from the server. It's possible that Willem's 5.x server is configured with a Valve to set the default character encoding, and that the 6.x server is not similarly configured.

Willem, can you post the relevant sections of your server.xml files from each version? If you can't figure out what's relevant, just post the entire thing.

-chris
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Willem,

Willem Moors wrote:
> I studied the response headers for the Ajax call that generates the output and found that for the correct result (i.e. in TC55), the content type was this: Content-Type: text/plain;charset=ISO-8859-1 while for the wrong result (i.e. in TC6), the content type was: Content-Type: text/plain

Looks like the server is using something else (UTF-8?) in TC 6 and not reporting it to the client. The client is assuming ISO-8859-1 and therefore misinterpreting those characters outside of US-ASCII (such as £).

> So would this be the right conclusion: it's TC55 that's wrong here and not TC 6? TC55 slaps on the 'charset=ISO-8859-1' by default and TC 6 doesn't.

It might not be by default: lots of folks explicitly set their charsets to UTF-8 using some other technique.

-chris
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thursday, September 11, 2008 6:42 PM, Christopher Schultz [EMAIL PROTECTED] wrote:
> I think you're totally confusing yourself about font issues. Java only interacts with fonts of any kind when running AWT/Swing apps. Webapps have no interactions with fonts of any kind.

Chris, exactly, yes... it turns out he wasn't setting headers, so you're absolutely right. But his code is introducing a font into a web app, and that's what I'm wondering about. Forget about the webapp for a moment and just look at his code:

java.text.NumberFormat currencyFormat = java.text.NumberFormat.getCurrencyInstance(Locale.UK);
out.print("Formatted currency (GBP) : " + currencyFormat.format(1623540.00));

It's generating a pound. The webapp is not dictating the font, so I'm asking: what font is being used for the pound? And then yes, it so happens that he has found the encoding that works in text/plain, but it's a fluke; he's lucky. It's a problem waiting to happen, because if I change that locale of his to French, German, Chinese... what font is that now going to be? That will probably not work in the default US encoding.

There are a few problems here. He *is* introducing a font into a webapp and we don't even know what it is?

--- HARBOR : http://www.kewlstuff.co.za/index.htm The most powerful application server on earth. The only real POJO Application Server. See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm ---
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 11:16 AM, Johnny Kewl [EMAIL PROTECTED] wrote:
> It's generating a pound. The webapp is not dictating the font, so I'm asking: what font is being used for the pound?

Whatever the browser picks from what it has available. :-)

> He *is* introducing a font into a webapp

No. A character, a codepoint, yes, not a font.

-- Hassan Schroeder [EMAIL PROTECTED]
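The distinction can be made concrete with a few lines of Java (CodePointDemo is a hypothetical name). A String only stores code points; the sketch lists them by number and Unicode name, and nothing in it refers to a font. Character.getName is available from Java 7 on.

```java
final class CodePointDemo {
    /** Lists each code point of the text as "U+XXXX NAME", separated by ", ". */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("U+%04X %s", cp, Character.getName(cp)));
            i += Character.charCount(cp); // a supplementary character occupies two chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String formatted = java.text.NumberFormat
                .getCurrencyInstance(java.util.Locale.UK)
                .format(5);
        System.out.println(describe(formatted.substring(0, 1)));
    }
}
```

Which glyph is drawn for that code point is decided later, by whatever renders the text.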
Re: Migrating to tomcat 6 gives formatted currency amounts problem
I'm willing to bet the symbol for the British pound is not part of the normal web character set like a US dollar symbol is, and as a result needs to be expressed by entity notation ( &pound; or &#163; ). --David
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message - From: Hassan Schroeder [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Thursday, September 11, 2008 8:58 PM Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem On Thu, Sep 11, 2008 at 11:16 AM, Johnny Kewl [EMAIL PROTECTED] wrote: It's generating a pound... the question is, the webapp is not dictating the font... so I'm asking what font is being used for the pound? Whatever the browser picks from what it has available. :-) He *is* introducing a font into a webapp No. A character, a codepoint, yes, not a font. I tell you, Wil's example has confused the hell out of me... ha ha Wil... you have caused chaos... ha ha I'm probably using the definitions incorrectly. Let's just say you're internationalizing a page... so you have <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> so you can display multiple languages. Now you're designing a web page... you pick Arial... you select ® (registered trade mark, as a font, if it doesn't come out). And life is good. But when that gets done for you... not from your own resource bundles, but from a locale that can be using any character point in a font, and you don't know what the font actually is, the charset won't even help you, because how does the browser know it was Arial? If it displays it in MS Serif... surely it's going to be wrong... It's not really a browser problem that's bugging me... it's that the locale gives you something, it varies, especially on a headless Linux, and you can't assume it's anything. Even worse, if a Chinese font has not been installed... it's probably a ? I think one has to use &pound; because Java's localization in this area is unpredictable... So if you do want to use the pound symbols from localization... you also have to discover the font (somehow) and then you have to add that HTML to CSS code to your page. Or maybe Java is a whole lot smarter than I'm giving it credit for and it's embedding font attributes in the UTF-8 or something... I don't know... all I do know is that putting &pound; in your resource bundle is a whole lot easier... Totally confused... but I think if Wil is internationalizing that app... it's going to give him a huge headache. They didn't make &pound; and &reg; and all the rest for nothing... I think it's because it is a major headache otherwise... ... I don't know... Wil's phantom font has got me... ;)
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 12:54 PM, Johnny Kewl [EMAIL PROTECTED] wrote: Now you're designing a web page... you pick Arial... have to discover the font (somehow) and then you have to add that HTML to CSS code to your page Do you not understand that style information, including fonts, is just a serving suggestion? A user-agent has *no* obligation to use any given font, or any font at all. If I'm looking at your page in Lynx, the font will be whatever my own terminal window settings specify, be it Comic Sans or Copperplate Gothic Bold. If I use wget to grab a page and store it into a file or a DB, there is no font information involved at any point whatsoever -- it's just character data in some specified (or assumed!) encoding. If a user-agent is intended to generate a visual display /and/ has a font available to it with a glyph matching a specified code-point in a specified encoding, great. If not -- so sorry. Doesn't matter whether you were using HTML entities or numeric representation: ? is it. FWIW, -- Hassan Schroeder [EMAIL PROTECTED]
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message - From: Hassan Schroeder [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Thursday, September 11, 2008 11:07 PM Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem On Thu, Sep 11, 2008 at 12:54 PM, Johnny Kewl [EMAIL PROTECTED] wrote: Now you're designing a web page... you pick Arial... have to discover the font (somehow) and then you have to add that HTML to CSS code to your page Do you not understand that style information, including fonts, is just a serving suggestion? A user-agent has *no* obligation to use any given font, or any font at all. http://www.kewlstuff.co.za/test/test.htm What do you see in this test page?
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 2:41 PM, Johnny Kewl [EMAIL PROTECTED] wrote: http://www.kewlstuff.co.za/test/test.htm What do you see in this test page? problems :-) http://validator.w3.org/check?uri=http%3A%2F%2Fwww.kewlstuff.co.za%2Ftest%2Ftest.htm&charset=%28detect+automatically%29&doctype=Inline&ss=1&group=0&verbose=1&user-agent=W3C_Validator%2F1.591 -- Hassan Schroeder [EMAIL PROTECTED]
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message - From: Johnny Kewl [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Thursday, September 11, 2008 11:41 PM Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem http://www.kewlstuff.co.za/test/test.htm What do you see in this test page? Hassan, I'm not arguing: you know nothing about that font... how is your client going to display it? I'm probably missing something... teach me.
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 2:53 PM, Johnny Kewl [EMAIL PROTECTED] wrote: Hassan, I'm not arguing: you know nothing about that font... how is your client going to display it? If the page contains an invalid code-point, as the error message points out, then what should a browser display?? -- Hassan Schroeder [EMAIL PROTECTED]
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Johnny Kewl wrote: http://www.kewlstuff.co.za/test/test.htm What do you see in this test page? The output of a server that lies right to my face. It says it is serving UTF-8-encoded text, while it really serves text encoded with some 8-bit charset, probably ISO-8859-1. Regards mks
Re: Migrating to tomcat 6 gives formatted currency amounts problem
- Original Message - From: Hassan Schroeder [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Thursday, September 11, 2008 11:59 PM Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem On Thu, Sep 11, 2008 at 2:53 PM, Johnny Kewl [EMAIL PROTECTED] wrote: Hassan, I'm not arguing: you know nothing about that font... how is your client going to display it? If the page contains an invalid code-point, as the error message points out, then what should a browser display?? That's probably what I'm not getting... All I did was set the font to Verdana and drop a registered mark in... And that's what I'm worried about, because locale info will default to something similar. I don't think that locale code of Wil's knows it's in a webapp? Anyway... look, I don't get it... maybe the only thing to say is that if one introduces technology targeting GUI and Swing into a server, it's probably got issues. Whether that locale stuff is intelligent enough not to make an invalid code point... that's the question. ... I don't know ;)
Re: Migrating to tomcat 6 gives formatted currency amounts problem
On Thu, Sep 11, 2008 at 3:20 PM, Johnny Kewl [EMAIL PROTECTED] wrote: If the page contains an invalid code-point, as the error message points out, then what should a browser display?? That's probably what I'm not getting... All I did was set the font to Verdana and drop a registered mark in... However you created your test page, it /isn't valid UTF-8/. Until that's resolved, it has no value as a test of anything. Whether that locale stuff is intelligent enough not to make an invalid code point... that's the question. If that were my question, I'd be testing Locale-based code :-) -- Hassan Schroeder [EMAIL PROTECTED]
Re: Migrating to tomcat 6 gives formatted currency amounts problem
http://validator.w3.org Very cool btw... didn't know it was there
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: Johnny Kewl [mailto:[EMAIL PROTECTED] Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem http://www.kewlstuff.co.za/test/test.htm What do you see in this test page? Depends on which character encoding I choose to view the page in. For the declared UTF-8, FF3 shows the invalid hex value at that spot in your page. If I override that with say ISO-8859-15, the R in a circle appears. Note that no font is involved here, just the encoding declaration. You need to get over this fixation with fonts - they have absolutely nothing to do with this issue. A font is just a graphical description of how to draw one or more code points on an output device, based on the font designer's take on what each code point should look like. It's the character encoding that tells the message recipient what code point to generate for a given bit pattern; only after the code point is determined does any font get involved to create the visible symbol. This is a great site to get lost in for a few days: http://www.unicode.org/ - Chuck THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
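What Chuck describes can be reproduced without a browser. This is a rough sketch under two assumptions that the thread does not state outright: that the registered mark in the test page is the single byte 0xAE, and that ISO-8859-1 can stand in for the ISO-8859-15 he selected (the two agree at 0xAE, and ISO-8859-1 is guaranteed to be available on every Java platform). The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: one byte, two declared encodings, two different code points.
class DeclaredVersusActual {

    // Interprets the byte as UTF-8; a lone 0xAE is malformed and becomes U+FFFD.
    static int decodeAsUtf8(byte b) {
        return new String(new byte[] { b }, StandardCharsets.UTF_8).codePointAt(0);
    }

    // Interprets the byte as ISO-8859-1, where every byte maps to one code point.
    static int decodeAsLatin1(byte b) {
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1).codePointAt(0);
    }

    public static void main(String[] args) {
        byte registered = (byte) 0xAE; // assumed byte for the registered mark
        System.out.printf("read as UTF-8:      U+%04X%n", decodeAsUtf8(registered));
        System.out.printf("read as ISO-8859-1: U+%04X%n", decodeAsLatin1(registered));
    }
}
```

Read as UTF-8 the byte turns into U+FFFD, the replacement character; read as ISO-8859-1 it is U+00AE, the registered sign. The font only enters after this step, when something has to draw whichever code point was decoded.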
RE: Migrating to tomcat 6 gives formatted currency amounts problem
From: David Smith [mailto:[EMAIL PROTECTED] Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem I'm willing to bet the symbol for the British pound is not part of the normal web character set like a US dollar symbol is, and as a result needs to be expressed by entity notation ( &pound; or &#163; ). I'm not sure these days what the normal web character set really is. If you're referring to ASCII (aka Basic Latin), then no, the Pound Sterling symbol is not present. However, in ISO-8859-1 and several of the other ISO-8859-x variants (ISO-8859-15 among them), it is present, using the 163 (0xA3) value you noted (same as the Unicode code point). It's also in UTF-8 of course, but requires two bytes (0xC2 0xA3) to represent the code point. - Chuck
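The byte values Chuck quotes are easy to confirm. The sketch below (class and helper names are hypothetical) encodes the single character U+00A3 in three charsets. The US-ASCII case relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's replacement byte for characters it cannot map.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: the same code point, U+00A3, as bytes in three encodings.
class PoundBytes {

    // Renders a byte array as space-separated hex values such as "0xC2 0xA3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text in the given charset and returns the bytes as hex.
    static String encode(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String pound = "\u00A3";
        System.out.println("ISO-8859-1: " + encode(pound, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + encode(pound, StandardCharsets.UTF_8));
        // ASCII has no pound sign, so the encoder emits its replacement byte.
        System.out.println("US-ASCII:   " + encode(pound, StandardCharsets.US_ASCII));
    }
}
```

ISO-8859-1 gives the single byte 0xA3, UTF-8 gives 0xC2 0xA3, and US-ASCII gives 0x3F, a literal question mark, because the character cannot be represented there at all.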
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Send reply to: Tomcat Users List users@tomcat.apache.org Date sent: Wed, 10 Sep 2008 17:27:51 +0200 From: Willem Moors [EMAIL PROTECTED] To: users@tomcat.apache.org Subject: Migrating to tomcat 6 gives formatted currency amounts problem I'm transferring my application from a tomcat 5.5.26 server to tomcat 6.0.18, and notice that my formatted currency amounts are not being properly displayed. Instead of a Pound (GBP) sign I get a question mark within a black diamond (the app works fine in 5.5.26). This can easily be emulated. Add the following lines to the HelloWorldExample.java of the servlet examples in Tomcat 5.5.26 and those of 6.0.18: java.text.NumberFormat currencyFormat = java.text.NumberFormat.getCurrencyInstance(Locale.UK); out.print("Formatted currency (GBP) : " + currencyFormat.format(1623540.00)); This will display the following : In Tomcat 6.0.18: Formatted currency (GBP) : ?1,623,540.00 (I've emulated the question-mark within diamond here, I'll send you a screenshot if you want) Tomcat 5.5.26: Formatted currency (GBP) : £1,623,540.00 (depending on your client you may or may not see the pound sign in front of the above amount) Works fine for me, fresh install of 6.0.18, changed the HelloWorldExample.java and recompiled. Tried with both IE7 and FF 3. Are you sure you don't have a httpd in front of tomcat? I've seen a similar problem when using apache httpd. I had to turn off the option AddDefaultCharset -Steve O. What can be the problem, is there some extra locale configuration that needs to be done ? Thanks for your answer, Regards, Willem
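One way a symptom like the one Willem describes can arise is a mismatch between the charset used to write the response bytes and the charset the client is told to expect. The thread does not confirm that this is what happened on his server, so the following is only a sketch of that mechanism, with a ByteArrayOutputStream standing in for the response stream and with hypothetical class and method names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch: write with one charset, read with another.
class CurrencyRoundTrip {

    // Encodes the formatted amount with `written`, then decodes the bytes
    // the way a client would if it were told the content is in `declared`.
    static String roundTrip(Charset written, Charset declared) {
        String text = NumberFormat.getCurrencyInstance(Locale.UK).format(1623540.00);
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(wire, written)) {
            out.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return new String(wire.toByteArray(), declared);
    }

    public static void main(String[] args) {
        String mismatched = roundTrip(StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        String matched = roundTrip(StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.printf("mismatched starts with U+%04X%n", mismatched.codePointAt(0));
        System.out.printf("matched starts with    U+%04X%n", matched.codePointAt(0));
    }
}
```

In the mismatched case the first code point is U+FFFD, which browsers commonly draw as a question mark inside a black diamond. When writer and reader agree, it is U+00A3 and the pound sign survives.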
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Works fine for me, fresh install of 6.0.18, changed the HelloWorldExample.java and recompiled. Tried with both IE7 and FF 3. Are you sure you don't have a httpd in front of tomcat? I've seen a similar problem when using apache httpd. I had to turn off the option AddDefaultCharset -Steve O. Thanks for your quick response. No, there is no Apache httpd in front of it (yet). It's strange that you don't have the problem. The environment in which I'm running both tomcat 6.0.18 and tomcat 5.5.26 is Ubuntu 64-bit Linux with this java version: java version "1.6.0_07" Java(TM) SE Runtime Environment (build 1.6.0_07-b06) Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode) Regards, Willem
Re: Migrating to tomcat 6 gives formatted currency amounts problem
Send reply to: Tomcat Users List users@tomcat.apache.org Date sent: Wed, 10 Sep 2008 18:58:48 +0200 From: Willem Moors [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem Works fine for me, fresh install of 6.0.18, changed the HelloWorldExample.java and recompiled. Tried with both IE7 and FF 3. Are you sure you don't have a httpd in front of tomcat? I've seen a similar problem when using apache httpd. I had to turn off the option AddDefaultCharset -Steve O. Thanks for your quick response. No, there is no Apache httpd in front of it (yet). It's strange that you don't have the problem. The environment in which I'm running both tomcat 6.0.18 and tomcat 5.5.26 is Ubuntu 64-bit Linux with this java version: java version "1.6.0_07" Java(TM) SE Runtime Environment (build 1.6.0_07-b06) Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode) Hmm, odd. I tried it on my Redhat test server and it worked fine also. Is your tomcat 6 install a default/fresh install? What browser are you using? What character encoding does it think the HelloWorldExample output is coming in as? -Steve O. Regards, Willem