Re: Migrating to tomcat 6 gives formatted currency amounts problem


Caldarale, Charles R wrote:


I'm not sure these days what the normal web character set really is.  If 
you're referring to ASCII (aka Basic Latin), then no, the Pound Sterling symbol is not 
present.  However, for any of the ISO-8859-x variants, it is present, using the 163 
(0xA3) value you noted (same as the Unicode code point).  It's also in UTF-8 of course, 
but requires two bytes (0xC2 0xA3) to represent the code point.

I love these discussions about character sets. They seem to confuse so 
many people; even I, who have been involved in them for 30 years...


Anyway, I have a related question, which I don't think constitutes a 
hijack of this thread, because the underlying cause is probably similar.

Here goes:

Tomcat (v 4.1, v 5.0, v 5.5; I have not tried 6.x yet)
These Tomcats run under the same Linux or Solaris systems, essentially
set up the same way. The JVM may vary, but I don't think that is the
problem, because of the consistency of the problem as explained below.
I am running a webapp from an external supplier, always the same binary
version.  I don't have the code and can't see what's in it.
The pages served by that webapp are the same html pages, all of them
having a declaration <meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">.
The pages also *are* properly encoded as iso-8859-1 (100% positive, I 
know the difference).

The browser receiving the pages is always the same one, same settings.

Now,

case a)
in the Tomcat startup files, I do nothing, meaning I just take Tomcat 
out-of-the-box and run the webapp.
Result: in any such html page that contains characters with an ISO-8859
codepoint above \xA0 (meaning the displayable characters of the high
part of the table, where one finds things like uppercase A with
umlaut), these characters

  - appear in the browser display as "?" (minus the quotes)
  - also if I save the page from the browser to disk, and look at them
with an iso-8859-1 capable editor, they are effectively "?".
(So it's not the browser misunderstanding them, it is Tomcat sending
them that way).


case b)
In one of the Tomcat startup files (e.g. tomcat_dir/bin/startup.sh or 
even in /etc/init.d/tomcat5.5), I add the following line

LC_CTYPE=en_us.iso88591
(or whatever is valid on that host to specify an iso-8859-1 LC_CTYPE)
(before the actual start of Tomcat)
and restart Tomcat
then the same page displays properly in the browser, and also is correct 
iso-8859-1 when saved to disk and examined with the editor.
(In other words, what previously were "?" characters are now the
correct iso-8859-1 character bytes).


Now my question is:
how can the LC_CTYPE that Tomcat is started under matter, so as to
produce the result above?
The behaviour above is consistent across different hosts, across the 
same or different Tomcat versions, it is always the same webapp, always 
the same html pages, always the same browser, etc.  Only that LC_CTYPE 
line changes the behaviour.
On the face of it, the only thing I can think of that would explain 
this, is that the webapp in question does something wrong, but what 
exactly could it be doing ?

Any ideas ?

Thanks in advance,
André


-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Caldarale, Charles R [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Friday, September 12, 2008 6:01 AM
Subject: RE: Migrating to tomcat 6 gives formatted currency amounts problem

From: Johnny Kewl [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency
amounts problem

http://www.kewlstuff.co.za/test/test.htm
What do you see in this test page?

Depends on which character encoding I choose to view the page in.  For the 
declared UTF-8, FF3 shows the invalid hex value at that spot in your page. 
If I override that with say ISO-8859-15, the R in a circle appears.  Note 
that no font is involved here, just the encoding declaration.

You need to get over this fixation with fonts - they have absolutely nothing 
to do with this issue.  A font is just a graphical description of how to 
draw one or more code points on an output device, based on the font 
designer's take on what each code point should look like.  It's the 
character encoding that tells the message recipient what code point to 
generate for a given bit pattern; only after the code point is determined 
does any font get involved to create the visible symbol.

This is a great site to get lost in for a few days:
http://www.unicode.org/

- Chuck

Yes, I do that: I mix up terminology.

But can I just get your opinion on this...

If this locale stuff is in fact defaulting to an ISO character set that can
handle these symbols... and say you were making a non-English page, say
Japanese... do you think it's possible to use it?

I've actually now seen examples on the web that do it Wil's way; they
use getCurrencyInstance to produce the currency symbols.
And it is the most natural thing in the world for a coder to want to do...
the functions are synonymous with internationalization.

It's probably in the Java manual...

But I'm thinking it's a US/English-only methodology... when applied to a web
page.

Do you think using getCurrencyInstance is generalizable to other languages?
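A minimal sketch of the pattern being asked about (class and method names are illustrative, not taken from Wil's code): `NumberFormat.getCurrencyInstance` returns an ordinary Java String holding the real Unicode currency symbol, and what the browser receives depends only on the charset used to turn that String into bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

public class CurrencyEncodingDemo {

    // Formats an amount for an explicit Locale. The result is a Java String,
    // i.e. Unicode text, not bytes in any particular charset.
    static String format(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Turns the text into bytes the way a response writer would for a given charset.
    // String.getBytes substitutes the charset's replacement byte ('?' for US-ASCII)
    // for any character the charset cannot represent.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String uk = format(12.5, Locale.UK);
        System.out.println("Contains U+00A3: " + (uk.indexOf('\u00A3') >= 0));

        // Same String, three different byte sequences depending on the page charset.
        System.out.println("UTF-8 length:      " + encode(uk, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 length: " + encode(uk, StandardCharsets.ISO_8859_1).length);
        System.out.println("US-ASCII text:     "
                + new String(encode(uk, StandardCharsets.US_ASCII), StandardCharsets.US_ASCII));
    }
}
```

The pound sign comes out as two bytes in UTF-8 (0xC2 0xA3), one byte in ISO-8859-1 (0xA3), and a literal "?" in US-ASCII, which matches the values Chuck gave earlier in the thread. The formatter is the same in all three cases; only the encoding step differs.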

When you say "If I override that with say ISO-8859-15", are you talking
about the whole page, or is it possible to have sections with different
character encodings in a web page? That's another area that's confusing me
now, because if I look at that test page in an MS tool... it displays
correctly with mixed encodings?

You see... people are saying that in a well-designed web page... it's a
suggestion, I get that.
But when you choose a font in a text editor like Swing or Word, you are also
picking some character set... and that's what's been injected into the page as
it's been formed... Or in an MS localization panel, if you choose
Verdana as a default font... these systems don't throw character sets at
users, they just pick one in the background... thus my analogy... and it's the
crossover between these systems that's got me confused ;)

I screw up terminology... OK, we all know that, but
does Wil need to worry about the way he is doing it?... that's all I'm
asking... I think so...

Thanks...

---
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth.
The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
--- 


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-12 Thread Konstantin Kolinko

2008/9/12 André Warnier [EMAIL PROTECTED]:
 [...]


It is <%@ page pageEncoding="..." %> that is missing from those pages.
Thus the JSP compiler does not know what encoding they are using for their
source and messes them up at compilation time.

AFAIK (but I have never tried it), it can be configured without modifying
the sources, by using the jsp-config element in web.xml. It can be done in
the default one, conf/web.xml.
The configuration element is described in JSP.3.3.4 of the JSP 2.0 spec.

By the way: in my pages I usually declare
<%@ page contentType="text/html; charset=..." pageEncoding="..." %>
and add
<META http-equiv="Content-type" content="<%=response.getContentType() %>">

Thus both HTTP Content-Type: header and the META tag are present
in my response and are always in sync.

Best regards,
Konstantin Kolinko


Re: Migrating to tomcat 6 gives formatted currency amounts problem


OK, Wil, you made me do some homework... I've got it sorted for you.

You must not guess the charset... as we've been doing.

Use this function (from java.nio.charset.Charset):

System.out.println("CharSet : " + Charset.defaultCharset());

and that's the charset you HAVE TO declare for your page.

On my system it tells me it's windows-1252.

On Solaris, if you're running in the C locale... it will be US-ASCII.
If you're running in a US locale it will be ISO-8859-1.
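A slightly fuller version of that probe, as a sketch (the class name is illustrative). Two caveats for anyone trying it today: the `native.encoding` property only exists on JDK 17 and later, and since JDK 18 (JEP 400) the default charset is UTF-8 regardless of the OS locale, so on a current JVM `Charset.defaultCharset()` no longer tracks LC_CTYPE the way it did in 2008.

```java
import java.nio.charset.Charset;

public class DefaultCharsetProbe {

    // The JVM picks its default charset once, at startup. Before JDK 18 it
    // follows the OS locale (LC_ALL / LC_CTYPE / LANG on Unix, the system
    // code page on Windows); from JDK 18 on it is UTF-8 unless overridden.
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("CharSet         : " + platformDefault());
        System.out.println("file.encoding   : " + System.getProperty("file.encoding"));
        // Charset of the OS locale on JDK 17+; null on older JVMs.
        System.out.println("native.encoding : " + System.getProperty("native.encoding"));
    }
}
```

Running it once under the default environment and once after exporting a different LC_CTYPE (on a pre-18 JVM) should show the value change, which is why it cannot be guessed in advance.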

Now, you're doing Ajax, so I imagine you may want to inject this stuff into DIV
elements...

 I'll let someone else try to answer that... mission impossible... I think.

So... you have to convert character sets from what the locale is using
(which, from the looks of things, is different on every single machine and
OS) to what you're using in the web page proper... probably UTF-8 if you are
internationalizing.
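Converting from the locale's charset to the page's charset amounts to decoding with one and encoding with the other. A minimal sketch, assuming the source bytes really are in the `from` charset; `transcode` is a hypothetical helper, not a library method.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Transcoder {

    // Hypothetical helper: decode bytes with the charset they were written in,
    // then re-encode the resulting text with the charset the page declares.
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        String text = new String(data, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // Pound and yen both exist in ISO-8859-1 (0xA3, 0xA5); dollar is plain ASCII.
        String prices = "\u00A3 10, \u00A5 1000, $ 12";
        byte[] latin1 = prices.getBytes(StandardCharsets.ISO_8859_1);

        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Each of the two non-ASCII symbols grows from one byte to two.
        System.out.println(latin1.length + " -> " + utf8.length);
        System.out.println(Arrays.equals(utf8, prices.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In a servlet it is usually simpler not to depend on the default at all: calling `response.setContentType("text/html; charset=UTF-8")` before `response.getWriter()` makes the container encode the output as UTF-8 whatever the OS locale is.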


... it's a headache... rather refactor your code so the pages are all in the
same charset of your choosing, and work with pound, yen and dollar.


Anyway, use that function to get the encoding that is actually being
used... it's all changed from outside Java, in Linux itself, by the
user... so you cannot guess... and then how you're going to get that Ajax
into DIVs and tables using JavaScript and DHTML or whatever, only you
know ;)


... Don't do it.

Re: Migrating to tomcat 6 gives formatted currency amounts problem



Then one last thing before I put this in my little black book of things I'm 
never going to do... and forget about it forever ;)


This is what Windows does:

If the machine is on US English...

Regardless of the locale I set in Java (German, English, Japanese),
the charset is always

windows-1252... which is basically ISO-8859-1 with differences...

But if I switch the machine back to Japanese... then it's

windows-31j

So that's what you're injecting into your web pages when using Java
locale functions in a web page...
Maybe that's what a person wants, and in a company using these locale
functions where every user is on Windows... it may just work

... that's actually scary...
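The observation that the Java Locale does not change the charset can be checked directly. A sketch (class and method names are illustrative): the default charset is fixed when the JVM starts, so `Locale.setDefault` leaves `Charset.defaultCharset()` untouched.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class LocaleVsCharset {

    // Temporarily changes the default Locale and reports whether the JVM's
    // default charset changed as a result. The original Locale is restored.
    static boolean charsetFollowsLocale(Locale newDefault) {
        Locale saved = Locale.getDefault();
        Charset before = Charset.defaultCharset();
        try {
            Locale.setDefault(newDefault);
            return !Charset.defaultCharset().equals(before);
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        // Each line should report false: the charset comes from the OS settings
        // read at JVM startup, not from java.util.Locale.
        for (Locale l : new Locale[] {Locale.GERMANY, Locale.UK, Locale.JAPAN}) {
            System.out.println(l + " changes charset? " + charsetFollowsLocale(l));
        }
    }
}
```

Only restarting the JVM under different OS settings (or with a different `-Dfile.encoding`) changes the default charset.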

Nice question

Re: Migrating to tomcat 6 gives formatted currency amounts problem


Konstantin Kolinko wrote:

2008/9/12 André Warnier [EMAIL PROTECTED]:

[...]



It is <%@ page pageEncoding="..." %> that is missing from those pages.
Thus the JSP compiler does not know what encoding they are using for their
source and messes them up at compilation time.

[...]

But these pages, as far as Tomcat and the webapp are concerned, are not 
dynamic in any way.  They are straight static html pages.

So is the JSP stuff relevant ?
(I'm genuinely asking, since I know nothing about JSP pages)



Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-12 Thread Konstantin Kolinko

2008/9/12 André Warnier [EMAIL PROTECTED]

 Konstantin Kolinko wrote:

 2008/9/12 André Warnier [EMAIL PROTECTED]:

 [...]


 It is <%@ page pageEncoding="..." %> that is missing from those pages.
 Thus the JSP compiler does not know what encoding they are using for their
 source and messes them up at compilation time.

 [...]

 But these pages, as far as Tomcat and the webapp are concerned, are not
 dynamic in any way.  They are straight static html pages.
 So is the JSP stuff relevant ?
 (I'm genuinely asking, since I know nothing about JSP pages)


The static HTML pages, as well as all the other static files, are served by
the DefaultServlet. You should dig there. I think that the fileEncoding
initialization parameter of that servlet, as well as the mime-mapping
settings in web.xml, come into play.

JSP settings are irrelevant for them, of course.

Best regards,
Konstantin Kolinko

Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: André Warnier [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Friday, September 12, 2008 10:08 AM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

[...]

André, see this link, about halfway down...
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale
They're talking about Solaris, which in the default C locale is ASCII...
When they do what you're doing... more or less... it becomes ISO...

So if there is a Java locale function in that web app... one minute it's
working with ASCII, the next with ISO...

The page encoding has been hardcoded by the coder to always be ISO...
It's the Java locale in a web app... I think...

Look at the classes in an IDE, or search them...
java.util.Locale
is hiding in your web app ;)... I think

Thanks... there's the gotcha I was worried about... and you're still talking
English ;)

Does it mean you can't run Linux headless?... I wonder...
For fun... make your Linux box Japanese... I think the web app will really
start having fun

... no foreign administrators for you ;)

I don't believe at all it's Tomcat... it's client-side Java sitting in
servers... gotcha.
The coders broke their own application... all by themselves... and the admin
guys have now got the headache...


Re: Migrating to tomcat 6 gives formatted currency amounts problem


Konstantin Kolinko wrote:

2008/9/12 André Warnier [EMAIL PROTECTED]


Konstantin Kolinko wrote:


2008/9/12 André Warnier [EMAIL PROTECTED]:


[...]



It is <%@ page pageEncoding="..." %> that is missing from those pages.
Thus the JSP compiler does not know what encoding they are using for their
source and messes them up at compilation time.


[...]

But these pages, as far as Tomcat and the webapp are concerned, are not
dynamic in any way.  They are straight static html pages.

So is the JSP stuff relevant ?
(I'm genuinely asking, since I know nothing about JSP pages)



The static HTML pages, as well as all the other static files, are served by
the DefaultServlet. You should dig there. I think that the fileEncoding
initialization parameter of that servlet, as well as the mime-mapping
settings in web.xml, come into play.

JSP settings are irrelevant for them, of course.



Hi.
Thanks for the effort and the answer above.
But I insist: these html pages are served by the webapp I am talking
about, not by the DefaultServlet.

Those pages are being accessed via URLs like
http://myhost.mycompany.com/myservlet?..(additional parameters 
indicating which static file to serve)..
It is on the way through that servlet that they get corrupted, unless 
I start Tomcat with LC_CTYPE=iso-8859-1.
That servlet, in its own web.xml config file in 
tomcat_dir/webapps/myservlet/WEB-INF/web.xml, has no fileEncoding nor 
mime-mapping section nor parameter.


So my question remains, I think : what could be going on in that servlet 
so that :
- if LC_CTYPE is not set in the environment *of Tomcat* when it starts, 
the upper iso-8859-1 characters in the pages are replaced by ?
- if LC_CTYPE is set to iso-8859-1 in the Tomcat environment when it 
starts, then the pages delivered by the servlet are correct

?
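Since the supplier's code is not available, this is only a guess at one mechanism that would produce exactly these symptoms: a servlet that reads the static file through a Reader and writes it through a Writer, relying on the JVM default charset for both. The class and method are hypothetical; the sketch simulates the two startup environments by passing the charset explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StaticFileRoundTrip {

    // Hypothetical stand-in for a servlet that relays a static file as text:
    // it decodes the file bytes with readAs and re-encodes them with writeAs.
    // A servlet relying on defaults effectively passes Charset.defaultCharset()
    // for both. Bad input is silently replaced, never reported.
    static byte[] copyViaChars(byte[] fileBytes, Charset readAs, Charset writeAs) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(fileBytes), readAs)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Writer w = new OutputStreamWriter(out, writeAs);
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
            w.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Äpfel" as stored in an ISO-8859-1 file: byte 0xC4 followed by ASCII.
        byte[] file = "\u00C4pfel".getBytes(StandardCharsets.ISO_8859_1);

        // Default charset US-ASCII (e.g. C locale on an older JVM): 0xC4 decodes
        // to U+FFFD, which the US-ASCII encoder then writes out as '?'.
        byte[] broken = copyViaChars(file, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        System.out.println(new String(broken, StandardCharsets.US_ASCII)); // prints ?pfel

        // Default charset ISO-8859-1 (LC_CTYPE set to an ISO-8859-1 locale): bytes survive.
        byte[] ok = copyViaChars(file, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(ok, file)); // prints true
    }
}
```

If the servlet does something like this, setting LC_CTYPE fixes it only because it changes the JVM default charset. A servlet that merely relays files could avoid the problem altogether by copying raw bytes from an InputStream to `response.getOutputStream()` instead of going through characters.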

I am not very qualified in Java, but could it be something like :
- the

RE: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-12 Thread Antonio Vidal Ferrer

Hi,

Have you checked the configuration of these CATALINA_OPTS?

-Duser.language=es
-Duser.country=ES

Check that they are the same in both Tomcats. (In this example, they are
configured for Spanish/Spain.)
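A minimal sketch of why those two options matter for the currency amounts, assuming the formatting code calls `NumberFormat.getCurrencyInstance()` with no argument (the class `CurrencyDefaults` is a made-up name for illustration). `user.language` and `user.country` seed the JVM's default `Locale`; here `Locale.setDefault` stands in for starting the JVM with different `-D` values:

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class: the no-argument getCurrencyInstance() follows the
// JVM default Locale, which -Duser.language / -Duser.country initialise.
class CurrencyDefaults {

    // Temporarily swaps the default Locale (standing in for different startup
    // options), formats the amount, then restores the original default.
    static String formatWithDefault(Locale simulatedDefault, double amount) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(simulatedDefault);
            return NumberFormat.getCurrencyInstance().format(amount);
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        System.out.println("user.language=" + System.getProperty("user.language")
                + " user.country=" + System.getProperty("user.country")
                + " default=" + Locale.getDefault());
        System.out.println(formatWithDefault(Locale.forLanguageTag("es-ES"), 1234.5));
        System.out.println(formatWithDefault(Locale.UK, 1234.5));
    }
}
```

If the two Tomcats are started with different values, the same no-argument call produces different symbols (and grouping), which is exactly the kind of difference to look for when comparing the two installations.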

Good Luck

Best,

Toni

-Original Message-
From: André Warnier [mailto:[EMAIL PROTECTED] 
Sent: viernes, 12 de septiembre de 2008 16:58
To: Tomcat Users List
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

Konstantin Kolinko wrote:
 [...]

RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: André Warnier [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 - the servlet reads those documents with some InputStream,
 without specifying a character set or encoding, and by
 default that  means to use Tomcat's idea of its default
 LC_CTYPE for those InputStreams ?

Essentially correct, if you substitute "JVM" for "Tomcat" in the above.  Input 
and output are done via byte streams, converted to and from Unicode based on 
the specified character encoding.  When that's not specified (via Connector 
attribute or HTTP header), the JVM uses a default encoding.  To determine the 
default, JVM initialization looks at various system properties if they exist, 
and then certain environment variables.  (The exact ones are
platform-dependent.)

Consequently, setting LC_CTYPE (or equivalent) prior to starting up Tomcat can 
have a dramatic effect on the interpretation of both input and output, as you 
have discovered.

Look at the API doc for java.io.InputStreamReader and 
java.io.OutputStreamWriter for examples of character set encoding usage.
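To make this concrete, here is a small sketch of what a servlet like the one André describes might be doing internally: copying a file's bytes through a `Reader` and a `Writer` with some charset. The class and method names are hypothetical, and `US_ASCII` is only an assumption standing in for the ASCII-only default (ANSI_X3.4-1968) that Linux JVMs commonly pick up under the POSIX/C locale:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a servlet that decodes a file to chars and
// re-encodes it on output, using the same Charset both ways.
class CharsetRoundTrip {

    static byte[] copyThroughChars(byte[] fileBytes, Charset charset) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(fileBytes), charset)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(out, charset)) {
                char[] buf = new char[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    w.write(buf, 0, n);
                }
            } // closing the Writer flushes it into 'out'
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "A", pound sign, "B" as ISO-8859-1 bytes: 0x41 0xA3 0x42
        byte[] page = {0x41, (byte) 0xA3, 0x42};
        System.out.println(Arrays.toString(copyThroughChars(page, StandardCharsets.ISO_8859_1))); // [65, -93, 66]
        System.out.println(Arrays.toString(copyThroughChars(page, StandardCharsets.US_ASCII)));   // [65, 63, 66]
    }
}
```

With ISO-8859-1 the bytes come back unchanged. With US-ASCII the 0xA3 byte is decoded to U+FFFD on the way in and written back as a literal `?` (byte 63) on the way out, which matches the symptom André describes.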

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is thus for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all computers.

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: Caldarale, Charles R
 Subject: RE: Migrating to tomcat 6 gives formatted currency
 amounts problem

 Consequently, setting LC_CTYPE (or equivalent) prior to
 starting up Tomcat can have a dramatic effect on the
 interpretation of both input and output, as you have discovered.

Also, as Johnny K stated, this should not be left up to the sys admin.  It 
really is the app writers' job to explicitly specify the encoding for both 
input and output, rather than leaving them up to the whims of the platform and 
browser.  Unfortunately, many developers design with blinders on, and never 
think about where the app might be deployed or accessed from.

 - Chuck


RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: Johnny Kewl [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 If this locale stuff is in fact defaulting to an ISO char set
 that can do these symbols...

There's the basic problem - anytime you allow defaults to come into play you 
put yourself at risk.

 and say you were making a non-English page, say
 Japanese... do you think that it's possible to use it?

Certainly, and you should use it - but with the desired Locale specified, not 
using whatever the default happens to be at that instant.

 they're using the getCurrencyInstance to make the currency symbols.

But if you want a specific currency symbol (e.g., Yen, Pound Sterling), the 
Locale should be explicitly provided on the API call; only if you want to use 
the platform's default should getCurrencyInstance() be called without an 
argument.
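A small sketch of the explicit-Locale style described above; `ExplicitCurrency` and its methods are illustrative names, not anything from Wil's code. Both the formatted amount and the ISO 4217 currency code come from the `Locale` argument, so the JVM default never enters into it:

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical helper: everything is driven by the Locale argument,
// never by the JVM default.
class ExplicitCurrency {

    static String format(Locale locale, double amount) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // ISO 4217 code for the Locale's country, e.g. GBP for Locale.UK.
    static String currencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    public static void main(String[] args) {
        for (Locale l : new Locale[] {Locale.UK, Locale.JAPAN, Locale.US}) {
            System.out.println(l + " " + currencyCode(l) + " " + format(l, 1234.5));
        }
    }
}
```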

 But I'm thinking it's a US/Eng only methodology...

Nope, it's universal.  Java supports a seemingly infinite number of locales.

 When you say "If I override that with say ISO-8859-15",
 is that the whole page you are talking about

Yes, I was setting the browser to use a fixed encoding rather than the one in 
the HTTP header or the browser default.

 is it possible to have different character encoding sections
 in a web page

I don't know HTML well enough to completely answer that question, but I believe 
HTTP uses the last character set header specified, and all HTTP headers must 
precede the HTML.  You should be able to achieve the desired effect with 
frames.  However, if you just use UTF-8, you don't need to worry about it, since 
that includes every code point in the known universe.

 if I do look at that test page in a MS tool...
 it displays correctly with mixed encodings?

MS cheats at every opportunity, seemingly avoiding standards whenever they can.
IE likes to guess at the intent of the web page, sometimes getting it right, 
often getting it horribly wrong.

 But when you choose a font in a text editor like Swing or
 Word, you are also picking some character set...

Nope - most editors do not let you choose the character encoding, they just use 
the platform default.  Some do let you choose a UTF-x flavor in lieu of the 
platform default, which is quite desirable.  Some fonts (e.g., Wingdings) 
redefine the glyphs for given code points in order to display oddball symbols 
within a non-Unicode encoding; these were pretty much all developed before 
Unicode came into widespread use, but are still around for compatibility.

 - Chuck


Re: Migrating to tomcat 6 gives formatted currency amounts problem

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Johnny,

Johnny Kewl wrote:
 If this locale stuff is in fact defaulting to an ISO char set that can
 do these symbols... and say you were making a non-English page, say
 Japanese... do you think that it's possible to use it?

It is up to your browser to choose a font that is appropriate for all
glyphs (that is, a graphical representation of a code point) that need
to be displayed. Some fonts do not support all codepoints because they
don't have all the glyphs. For instance, if you have a string in English
and also Sanskrit, your browser is likely to display one string in one
font (maybe Arial) and the other in another font (say, Sanskrit).

Let's say that the browser comes across the &pound; entity. &pound; maps
directly to 8-bit hex character code 0xa3
(http://htmlhelp.com/reference/html40/entities/latin1.html). Whether you
put &pound; or £ in your HTML, the browser should render it properly --
possibly switching fonts to one that supports that code point for that
character only.

The problem with your page is not that the £ symbol is not available in
the font the browser chose. Your problem is that you illegally encoded
it into the page in the first place (or, equivalently, you advertise the
wrong encoding for the page, which is really the same thing).

If you re-write your page to declare some font around that symbol, you
will never be able to get it to work, unless you use the browser to
override the server-declared encoding (as Chuck did, when things render
properly when using ISO-8859-1).

 I've actually now seen examples on the web that are doing it Wil's way,
 they're using the getCurrencyInstance to make the currency symbols.

Use of Java's built-in currency-symbol-generating methods is likely to
produce a proper £ symbol. If you have your encoding chain set up
properly, it should go from NumberFormat.format() straight to your web
page without a hint of difficulty.
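A minimal sketch of that chain, under the assumption that the page is built from `NumberFormat` output and sent as UTF-8 (`EncodingChain` is a made-up name). The same bytes round-trip cleanly when the decoder agrees with the encoder, and turn into the familiar "Â£" when the client reads them as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical illustration: String -> bytes on the wire -> String in the browser.
class EncodingChain {

    static String formattedPounds(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.UK).format(amount);
    }

    // Server encodes (and advertises) UTF-8; client decodes UTF-8.
    static String consistent(String s) {
        byte[] wire = s.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_8);
    }

    // Server encodes UTF-8, but the client decodes the bytes as ISO-8859-1.
    static String mismatched(String s) {
        byte[] wire = s.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String price = formattedPounds(12.5);
        // What the console shows also depends on the terminal's own encoding.
        System.out.println(consistent(price));
        System.out.println(mismatched(price));
    }
}
```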

 But I'm thinking it's a US/Eng only methodology... when applied to a web
 page.
 Do you think using getCurrencyInstance is generalizable in other languages?

Absolutely. The only reason $ is a magic symbol is because it's part of
US-ASCII and low enough in the symbol table so that it never gets
screwed up by incorrect encodings. Symbols like £ or € do not share that
luxury and are therefore error-prone when administrators poorly
configure their servers. It's further compounded by the fact that many
English-speaking coders forget that there are other people in the world. :(

 When you say "If I override that with say ISO-8859-15", is that the
 whole page you are talking about, or is it possible to have different
 character encoding sections in a web page? That's another area that's
 confusing me now, because if I do look at that test page in a MS tool...
 it displays correctly with mixed encodings?

The encoding is for the entire document, not just a single character.
Basically, you sent an illegal character code. It would be like sending
6 bits of an 8-bit byte. In fact, that's /exactly/ what you did because,
to a UTF-8 renderer, your set of 8 bits looks like there should be
something else /before/ it in order to make it legal. Your server said
"hey, client... I'm gonna send you a bunch of oranges" and then went
right ahead and sent apples mixed-in with those oranges.
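A quick way to see that "missing byte", sketched in Java (the class name is hypothetical): U+00A3 takes two bytes in UTF-8 (0xC2 0xA3) but one in ISO-8859-1 (0xA3). A lone 0xA3 is a UTF-8 continuation byte with nothing in front of it, so a UTF-8 decoder substitutes U+FFFD:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the pound sign in two encodings, and what a UTF-8
// decoder makes of the single ISO-8859-1 byte.
class OrphanByte {

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8Pound = "\u00A3".getBytes(StandardCharsets.UTF_8);        // C2 A3
        byte[] latin1Pound = "\u00A3".getBytes(StandardCharsets.ISO_8859_1); // A3
        System.out.println(utf8Pound.length + " vs " + latin1Pound.length);  // 2 vs 1
        // The orphan continuation byte decodes to the replacement character.
        System.out.println(decodeUtf8(latin1Pound).equals("\uFFFD"));       // true
    }
}
```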

 But when you choose a font in a text editor like Swing or Word, you are
 also picking some character set... and that's what's been injected into
 the page as it's been formed...

Yes and no. Many encodings are limited by a particular character set
(for instance, US-ASCII is never going to have Sanskrit letters in it).
But that's why Unicode was invented: to make sure that anything we'd
ever possibly want to show on the screen is possible because we have
enough bits to display it. (My understanding is that Unicode (16-bit) is
actually not big enough for everything, but hey, they tried). The beauty
of Unicode, all of which UTF-8 can encode, is that every character you'd
want to display has its own code point that nobody can steal --
regardless of the font being used.

The lesson is to always use UTF-8 and make sure you actually have
everything working properly. If your server is saying utf-8 but the
character encoding on your servlet Writer is actually ISO-8859-1 then
you haven't done your job and your web pages are going to look broken
when non-latin characters are thrown in there. The same is true if you
are serving static content (as I suspect you are in your example) and
advertising that it is utf-8 but the file was written with ISO-8859-1
(or something else). (In your case, the problem is that text files
contain no explicit encoding information in them, so the server has to
guess -- or, more likely, there's no guessing going on, and the server
just blindly uses whatever its default has been configured to be.)

 I screw up terminology... ok we all know that, but
 does Wil need to worry about the way he is doing it?... that's all I'm
 asking... I think so...

The short

Re: Migrating to tomcat 6 gives formatted currency amounts problem


Chuck,

Caldarale, Charles R wrote:
 From: Johnny Kewl [mailto:[EMAIL PROTECTED] Subject: Re:
 Migrating to tomcat 6 gives formatted currency amounts problem
 
 if I do look at that test page in a MS tool... it displays
 correctly with mixed encodings?
 
 MS cheats at every opportunity, seemingly avoiding standards whenever
 they can.  IE likes to guess at the intent of the web page, sometimes
 getting it right, often getting it horribly wrong.

Yes, they do. MS, contrary to W3 specifications, sniffs the content of a
page and chooses the encoding and ignores any server-specified encoding.
It also does this with MIME types. (Sorry, can't find the reference
right now). Real web browsers do not behave in this way, so you
shouldn't base your conclusions on the behavior of MSIE.

- -chris
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjKmzYACgkQ9CaO5/Lv0PBgEACfbFlp6HuBiTd93kGzrtOOVRhV
G4AAn2zaU1HGZA9isoewMQ3J5TZMsPjF
=E83R
-END PGP SIGNATURE-


RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: Christopher Schultz [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 (My understanding is that Unicode (16-bit) is actually not
 big enough for everything, but hey, they tried).

Point of clarification: Unicode is NOT limited to 16 bits (not even in Java, 
these days).  There are defined code points that use 32 bits, and I don't think 
there's a limit, if you use the defined extension mechanisms.  Again, browsing 
the Unicode web site is extremely enlightening.

 Unless the browser sucks. ;)

Let me guess which browser that is; does it start with an I?

 - Chuck


Re: Migrating to tomcat 6 gives formatted currency amounts problem


André,

André Warnier wrote:
 The pages served by that webapp are the same html pages, all of them
 having a declaration <meta http-equiv="Content-Type" content="text/html;
 charset=iso-8859-1">.

Note that using META tags to set character sets is a bit dangerous.
You're telling the client to ignore the character set indicated by the
server which was (probably) responsible for encoding the document in the
first place. For static documents, where the server doesn't know any
better, and is probably sending binary data and doing no interpretation
or encoding of any kind, it's probably okay.

 The pages also *are* properly encoded as iso-8859-1 (100% positive, I
 know the difference).

So, for instance, the British pound symbol in your source documents
(read using an ISO-8859-1-configured viewer) looks correct?

 The browser receiving the pages is always the same one, same settings.

Did you check the md5sum of that page on both the client and the server?
I suspect they are actually different.
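If you'd rather do that comparison from Java than with `md5sum`, a minimal sketch follows; `Md5Check` is a hypothetical helper, and MD5 is used only as a quick equality check between two copies, not for security:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: lowercase hex MD5 digest, same format as md5sum prints.
class Md5Check {

    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Md5Check server-copy.html browser-copy.html
        for (String path : args) {
            System.out.println(md5Hex(Files.readAllBytes(Paths.get(path))) + "  " + path);
        }
    }
}
```

Run it on the file as stored on the server and on the copy saved from the browser; different digests mean the bytes changed somewhere between disk and client.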

- -chris


Re: Migrating to tomcat 6 gives formatted currency amounts problem


André,

André Warnier wrote:
 It is on the way through that servlet that they get corrupted, unless
 I start Tomcat with LC_CTYPE=iso-8859-1.

What do the HTTP headers say when the file is served correctly versus
when it is not? I suspect that the encoding is either set incorrectly or
not set at all unless you specify LC_CTYPE.

 So my question remains, I think : what could be going on in that servlet
 so that :
 - if LC_CTYPE is not set in the environment *of Tomcat* when it starts,
 the upper iso-8859-1 characters in the pages are replaced by ?
 - if LC_CTYPE is set to iso-8859-1 in the Tomcat environment when it
 starts, then the pages delivered by the servlet are correct
 ?

My guess is that the magic servlet here is using the platform's default
encoding in the HTTP headers, which may be incorrect for the static file
in question.

 I am not very qualified in Java, but could it be something like :
 - the servlet reads those documents with some InputStream, without
 specifying a character set or encoding

Note that InputStreams are encoding-less. Sounds like semantics, but
encodings only come into play when you are dealing with
character-oriented streams which, in Java, are called Readers and
Writers. Note that neither InputStream nor OutputStream has any methods
that deal with the char data type.
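A minimal sketch of that byte-only path (names are illustrative). Because `InputStream`/`OutputStream` never decode to `char`, no default charset can touch the data. In a servlet the `OutputStream` would be `response.getOutputStream()`, and the `Content-Type` header would still have to name the file's real encoding:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper: a plain byte copy, with no Charset anywhere.
class ByteCopy {

    static void copy(InputStream in, OutputStream out) {
        byte[] buf = new byte[4096];
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] page = {0x41, (byte) 0xA3, 0x42}; // "A", pound sign, "B" in ISO-8859-1
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(page), out);
        System.out.println(Arrays.equals(page, out.toByteArray())); // true
    }
}
```

On Java 9 and later, `in.transferTo(out)` performs the same copy loop.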

 and by default that means to use
 Tomcat's idea of its default LC_CTYPE for those InputStreams ?
 - or the servlet outputs the document via an OutputStream without
 specifying an encoding etc..

I'll bet a binary stream of data is being sent (that is, with no
interpretation or encoding) and that the JVM's default encoding is being
advertised by the server in the HTTP headers. That would certainly cause
the problem.

I've found that the default encoding on my Linux box is something I've
never heard of before: file.encoding=ANSI_X3.4-1968. Since I have my
server configured properly (and don't really serve much in the way of
static content), the platform's default encoding doesn't matter: my
preferred encoding (UTF-8) is always used.
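To see what a given JVM will fall back to, a short sketch like this (class name hypothetical) prints the relevant values. On JDK 18 and later the default charset is UTF-8 regardless of locale (JEP 400); to my knowledge the locale-derived encoding is then exposed separately as `native.encoding`, a property added in JDK 17, so on older JVMs that line simply prints null:

```java
import java.nio.charset.Charset;

// Hypothetical helper: what this JVM uses when code does not name a charset.
class DefaultEncoding {

    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
        System.out.println("native.encoding=" + System.getProperty("native.encoding"));
    }
}
```

Running it once with and once without LC_CTYPE set (on a pre-18 JVM) should show the same switch André observed in Tomcat.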

- -chris

RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: Johnny Kewl [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 Does it mean you can't run linux headless?...

Of course you can (think about blade servers).

Now you're confusing graphical display with encoding.  The term "headless" is 
concerned with the ability to display graphical information, not render it.  
JVMs running in headless mode can render glyphs, graphs, or what have you, but 
must send the resulting bitmaps to some graphics server to have them displayed 
(they can also be saved in files if needed).

 - Chuck


Re: Migrating to tomcat 6 gives formatted currency amounts problem


Chuck,

Caldarale, Charles R wrote:
 From: Christopher Schultz [mailto:[EMAIL PROTECTED] 
 Subject: Re: Migrating to tomcat 6 gives formatted currency amounts
 problem

 (My understanding is that Unicode (16-bit) is actually not big
 enough for everything, but hey, they tried).

 Point of clarification: Unicode is NOT limited to 16 bits (not even
 in Java, these days).

Sorry, I was trying to say 16-bit Unicode without saying UTF-16 (which
is not the same).

And regarding Java... the 'char' data type is /defined/ to be 16-bits
wide
(http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1).
Has this changed? When? (And how!?)

I always thought it was weird for Java to use 16-bit Unicode internally,
but then use UTF-8 for all serialized strings. I guess that's what you
get when you try to minimize file sizes and download times.

 There are defined code points that use 32
 bits, and I don't think there's a limit, if you use the defined
 extension mechanisms.  Again, browsing the Unicode web site is
 extremely enlightening.

 Unless the browser sucks. ;)

 Let me guess which browser that is; does it start with an I?

I usually spell it with an 'M'. ;)

- -chris

Re: Migrating to tomcat 6 gives formatted currency amounts problem


Johnny,

Johnny Kewl wrote:
 Use this function
 
 System.out.print("CharSet : " + Charset.defaultCharset().toString());
 
 and that's what you HAVE TO set your page at
 
 On my system it tells me it's windows-1252

I think you're still missing something: the file on the disk has an
implicit file encoding that is not advertised in any way. This is the
core of the problem.

If all text files said "hey, I'm encoded in UTF-8" or "I'm in
ISO-8859-1" or "This file is WINDOWS-1252", then there would be no
problem: all code would use the native encoding of the file as the
encoding of the HTTP response, and the file would be streamed as binary
without changing a single bit in the stream.

Unfortunately, this kind of "explicit encoding" basically doesn't exist
(except in some UTF-encoded files). Since the server doesn't know the
file's original encoding, it /can never make a sensible decision about
the output encoding/. It's simply not possible.
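A small illustration of that point, using the windows-1252 Johnny mentions and the ISO-8859-15 Chuck mentioned earlier (class and method names are made up): the same two bytes decode to three different strings, and nothing in the bytes says which reading is right:

```java
import java.nio.charset.Charset;

// Hypothetical demo: one byte sequence, three plausible implicit encodings.
class SameBytesThreeWays {

    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] fileBytes = {(byte) 0x80, (byte) 0xA4};
        // Expected code points:
        //   windows-1252: U+20AC U+00A4 (euro sign, currency sign)
        //   ISO-8859-1:   U+0080 U+00A4 (control char, currency sign)
        //   ISO-8859-15:  U+0080 U+20AC (control char, euro sign)
        for (String cs : new String[] {"windows-1252", "ISO-8859-1", "ISO-8859-15"}) {
            String s = decode(fileBytes, cs);
            System.out.printf("%-12s U+%04X U+%04X%n", cs, (int) s.charAt(0), (int) s.charAt(1));
        }
    }
}
```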

It has nothing to do with your OS, your filesystem, or your per-user
locale preferences, installed fonts, etc. It has to do with the fact
that the file has no explicit encoding that the server can use. (This is
what gives rise to the MSIE practice of sniffing the document content
regardless of the server's assertion as to the character encoding.)

 ... it's a headache... rather refactor your code so the pages are all the
 same charset of your choosing and work with pound, yen, dollar

This is always a sensible way to go. If you stick to pages that always
use US-ASCII or anything compatible with it (generally ISO-8859-*, I
think), you'll be good to go.

A much better way to go is to always use properties files for text that
will be displayed on web pages. It's the right thing to do from a
localization perspective (yes, you can have separate pages for each
language, but that's no fun), AND the encoding for Java properties files
is DEFINED TO BE ISO-8859-1, no matter what you want to put in there. In
this case, there /is/ an explicit character encoding, and it's
predictable. Of course, Java coders can always bone the creation of
these files...
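A minimal sketch of that guarantee, assuming the text lives in a `.properties` file read with `Properties.load(InputStream)` (the helper names are illustrative). Latin-1 characters can appear as raw bytes; anything outside ISO-8859-1, such as the euro sign, has to be written as a backslash-u escape, which is what the JDK's `native2ascii` tool produced:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: Properties.load(InputStream) always decodes ISO-8859-1,
// whatever the platform default is.
class PropertiesEncoding {

    static Properties load(byte[] fileBytes) {
        Properties p = new Properties();
        try {
            p.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // "pound" holds the raw ISO-8859-1 byte 0xA3; "euro" holds the six ASCII
        // characters backslash, u, 2, 0, A, C, because the euro sign has no
        // ISO-8859-1 byte.
        byte[] file = "pound=\u00A3\neuro=\\u20AC\n".getBytes(StandardCharsets.ISO_8859_1);
        Properties p = load(file);
        System.out.println((int) p.getProperty("pound").charAt(0)); // 163
        System.out.println((int) p.getProperty("euro").charAt(0));  // 8364
    }
}
```

One later change worth knowing: since Java 9, `PropertyResourceBundle` reads bundle files as UTF-8 by default and falls back to ISO-8859-1 (JEP 226), while `Properties.load(InputStream)` itself still uses ISO-8859-1.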

- -chris

RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: Christopher Schultz [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 the 'char' data type is /defined/ to be 16-bits wide
 (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1).
 Has this changed? When? (And how!?)

A char is still 16 bits, but you can now have 21-bit code points:
http://java.sun.com/javase/6/docs/api/java/lang/Character.html#unicode

These are manipulated via the int type, rather than char.
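A short sketch of what that looks like (class name hypothetical): a character above U+FFFF, here U+1D11E MUSICAL SYMBOL G CLEF, is stored as two `char`s (a surrogate pair) but is a single code point, handled as an `int`. The highest code point, U+10FFFF, fits in 21 bits:

```java
// Hypothetical demo: chars versus code points for a supplementary character.
class CodePoints {

    static String fromCodePoint(int cp) {
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1D11E); // MUSICAL SYMBOL G CLEF
        System.out.println("chars:       " + s.length());                      // 2
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 1
        System.out.printf("code point:  U+%X%n", s.codePointAt(0));            // U+1D11E
        System.out.println("max:         U+"
                + Integer.toHexString(Character.MAX_CODE_POINT).toUpperCase()); // U+10FFFF
    }
}
```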

 I always thought it was weird for Java to use 16-bit Unicode
 internally

Back when Java was being defined, Unicode still was 16-bit, but not in 
widespread use.

 but then use UTF-8 for all serialized strings

Mostly for easy interoperation with existing editors, comm handlers, browsers, 
etc., which were all byte oriented and, at the time, still largely ASCII.  The 
day-one existence of character encoders in Java permitted use in non-ASCII 
environments.
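For the serialized-string part, a small sketch (names hypothetical) using `DataOutputStream.writeUTF`, which writes a two-byte length followed by Java's "modified UTF-8". For U+00A3 that is the same C2 A3 as standard UTF-8, while U+0000 becomes C0 80 instead of a single zero byte:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: the bytes DataOutputStream.writeUTF produces.
class SerializedString {

    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s); // 2-byte length, then modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : writeUtf("\u00A3")) {
            System.out.printf("%02X ", b); // 00 02 C2 A3
        }
        System.out.println();
    }
}
```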

 - Chuck


Re: Migrating to tomcat 6 gives formatted currency amounts problem


Christopher Schultz wrote:
[...]



Yes, they do. MS, contrary to W3 specifications, sniffs the content of a
page and chooses the encoding and ignores any server-specified encoding.
It also does this with MIME types. (Sorry, can't find the reference
right now).

[...]

Here is a start, sympathetic to Microsoft :
http://blogs.msdn.com/ie/archive/2005/02/01/364581.aspx

And here is another relevant MS technical document (not for the faint of 
heart) :

http://msdn.microsoft.com/en-us/library/ms775147.aspx

On the other hand, the HTTP 1.1 RFC section 7.2.1
http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1
says :
quote
Any HTTP/1.1 message containing an entity-body SHOULD include a 
Content-Type header field defining the media type of that body. If and 
only if the media type is not given by a Content-Type field, the 
recipient MAY attempt to guess the media type via inspection of its 
content and/or the name extension(s) of the URI used to identify the 
resource. If the media type remains unknown, the recipient SHOULD treat 
it as type application/octet-stream.

unquote
(notice the *if and only if* the media type is not given..)

In other words, IE's content sniffing is in clear violation of the HTTP 
1.1 RFC, 99% of the time.


On the other hand, I once read a justification by one of the Microsoft 
developers (as I recall that one was related to their implementation of 
DAV, or "Web Folders"), which essentially said this : there are 
hundreds of millions of Windows (and IE) users, and most of them are 
*not* developers. So, although we are ourselves developers and we would 
very much like to adhere to the standards, our marketing people just 
won't let us, if it risks inconveniencing several hundred million 
average Windows users (and Microsoft customers), just to please the tiny 
minority of several hundred thousand developers.


I think it's an argument, even a relatively democratic one ...

I also personally believe that if the Microsoft developers had not 
started down the path a long time ago to believe that they could be 
smarter than everyone else and could outguess webservers, and instead 
had respected the HTTP RFC and just been more careful about which 
documents IE opens (or worse, runs), they would have saved Microsoft and 
the world countless bugs, countless viri and countless unproductive 
hours of web developers' forced work-arounds.


What I do not however understand is, considering the flak that each IE 
bug or security advisory generates, why MS have never decided to create 
and market another parallel browser (or maybe just one checkbox in the 
regular IE), that would make it RFC-compliant.  This way users could 
just choose to either use a browser that is RFC-compliant and boring and 
safe(r), or else enjoy all the gimmicks but risk the consequences.
But hey, I also do not know in how many virus-scanning companies MS owns 
shares..



-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Migrating to tomcat 6 gives formatted currency amounts problem



Nope - most editors do not let you choose the character encoding, they just 
use the platform default.  Some do let you choose a UTF-x flavor in lieu of 
the platform default, which is quite desirable.  Some fonts (e.g., 
Wingdings) redefine the glyphs for given code points in order to display 
oddball symbols within a non-Unicode encoding; these were pretty much all 
developed before Unicode came into widespread use, but are still around for 
compatibility.
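A minimal JDK-only sketch of the "platform default" Chuck mentions: any API called without an explicit charset, such as `new String(byte[])`, uses `Charset.defaultCharset()`. On the older JVMs discussed in this thread that default was derived from the OS locale (on Unix, LC_ALL/LC_CTYPE/LANG); since Java 18 (JEP 400) it is UTF-8 unless overridden, so what this demo prints depends on both the JVM version and the environment. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {

    // Decodes bytes the "implicit" way: String(byte[]) uses the JVM default charset.
    static String decodeImplicitly(byte[] bytes) {
        return new String(bytes);
    }

    // Decodes bytes with an explicitly chosen charset, independent of the environment.
    static String decodeExplicitly(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        byte[] latin1AUmlaut = { (byte) 0xC4 }; // A with umlaut in ISO-8859-1
        // Depends on the default charset (e.g. U+FFFD under UTF-8 or US-ASCII).
        System.out.println("implicit: " + decodeImplicitly(latin1AUmlaut));
        // Always A with umlaut, whatever the environment says.
        System.out.println("explicit: " + decodeExplicitly(latin1AUmlaut));
    }
}
```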


You know your stuff Chuck ;)

Wonder if Wil knew he asked such a damn big question... ha ha

Ok... some more homework on this thing...

Servlet Response does in fact have a setLocale(Locale loc) function...
Which seems to indicate that if headers or something like
response.setContentType("text/html;charset=UTF-8");
is *not* used... TC will take on the encoding (ha ha, did it again) charset of 
that locale...


I find thinking outside of HTTP headers difficult... and it seems that 
servlet spec has recognized the conflict inherent in locale and http header.
It seems that prior to Servlet spec 2.4, if a coder used locale-dependent 
JSTL to access resource bundles... that would in fact override
setContentType; this apparently is no longer the case... the header takes 
precedence...


So André, that's what you could well be seeing in your application, because 
the charset would follow the locale, and that would be whatever
the JRE wants to give you...

i.e. the coder didn't even have to explicitly use a locale function; a JSTL call 
using a resource bundle will do it...


It seems they are trying to bring locale technology that one applies in 
Swing without too much thought and web technology a little closer...

Still lots of places to get caught it seems...
I think you just got to put on a different hat when doing Swing and Web 
internationalization... different animals, with just enough commonality to 
cause pain ;)


---
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth.
The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
---









Re: Migrating to tomcat 6 gives formatted currency amounts problem

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Johnny,

Johnny Kewl wrote:
 Servlet Response does in fact have a setLocale(Locale loc) function...
 Which seems to indicate that if headers or something like
 response.setContentType("text/html;charset=UTF-8");
 is *not* used... TC will take on the encoding (ha ha, did it again)
 charset of that locale...

Nope! Locale != charset. Locale does not even hint of a /preferred/ charset.

 I find thinking outside of HTTP headers difficult... and it seems that
 servlet spec has recognized the conflict inherent in locale and http
 header.
 It seems that prior to Servlet spec 2.4 if a coder used locale dependent
 JSTL to access resource bundles... that would in fact override
 setContentType this apparently is no longer the case... the header
 takes pref...

Well, the header comes from the encoding set on the response, so it
should all be the same.

 I think you just got to put on a different hat when doing Swing and Web
 internationalization...

You shouldn't have to. The only difference is the character encoding for
the requests and responses. The use of the Java API should be identical.
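To illustrate that only the byte encoding differs, here is a hedged sketch in which the Java side is identical (a `String` from `NumberFormat.getCurrencyInstance(Locale.UK)`) and only the bytes chosen on the way out change. The POUND SIGN U+00A3 is one byte (A3) in ISO-8859-1 and two bytes (C2 A3) in UTF-8; reading UTF-8 bytes back as ISO-8859-1 gives the familiar "Â£". Whether this is exactly what happens in Willem's application is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

class CurrencyBytesDemo {

    // Formats an amount for the UK locale. The result is a Java String (UTF-16 chars),
    // so no byte encoding has been chosen yet; Locale does not pick a charset.
    static String formatPounds(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.UK).format(amount);
    }

    // Simulates bytes written as UTF-8 but read back as ISO-8859-1.
    static String misreadUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Renders bytes as space-separated hex, e.g. "C2 A3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // POUND SIGN, code point U+00A3
        System.out.println("formatted:  " + formatPounds(1234.56));
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));      // C2 A3
        System.out.println("misread:    " + misreadUtf8AsLatin1(pound));                        // two chars: U+00C2 U+00A3
    }
}
```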

- -chris
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjKyHcACgkQ9CaO5/Lv0PDxDQCfazFHZjh/amrJBOkauDCFmwN0
rQoAoLYmA3A8Y6hbhaMN3dNeJckoy2YV
=4bXQ
-END PGP SIGNATURE-


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-12 Thread Willem Moors

On Fri, Sep 12, 2008 at 9:26 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

 Wonder if Wil knew he asked such a damn big question... ha ha


I'm really amazed at the volume of mails my question has raised.
I can only see one solution to this complexity: let's all (everybody in the
whole world) speak the same language, use the same currency and move into
one and the same timezone (the latter because of past fun with timezones)!

Willem

Re: Migrating to tomcat 6 gives formatted currency amounts problem

Chuck,

Caldarale, Charles R wrote:
 From: Christopher Schultz [mailto:[EMAIL PROTECTED] 
 Subject: Re: Migrating to tomcat 6 gives formatted currency amounts
 problem

 the 'char' data type is /defined/ to be 16-bits wide 
 (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1).
  Has this changed? When? (And how!?)

 A char is still 16 bits, but you can now have 21-bit code points: 
 http://java.sun.com/javase/6/docs/api/java/lang/Character.html#unicode

 These are manipulated via the int type, rather than char.

Interesting... so, Java is still 16-bit Unicode in its char primitive,
but you can use ints to hold UTF-16 values using 21 bits? Whoa, that's
confusing... especially since java.lang.Character only takes a char as a
constructor parameter :(
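A small JDK-only check of the numbers Chuck gives: `char` is 16 bits, while `Character.MAX_CODE_POINT` (U+10FFFF) needs 21 bits, which is why code points above U+FFFF have to be handled as an `int`. The class name is illustrative.

```java
class CodePointSizeDemo {

    // Number of bits needed to hold the largest Unicode code point.
    static int bitsForMaxCodePoint() {
        return 32 - Integer.numberOfLeadingZeros(Character.MAX_CODE_POINT);
    }

    public static void main(String[] args) {
        System.out.println("char size in bits:  " + Character.SIZE);                 // 16
        System.out.printf("MAX_CODE_POINT:     U+%X%n", Character.MAX_CODE_POINT);   // U+10FFFF
        System.out.println("bits for max point: " + bitsForMaxCodePoint());          // 21

        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF, outside the 16-bit range
        System.out.println("supplementary?      " + Character.isSupplementaryCodePoint(gClef)); // true
        System.out.println("chars needed:       " + Character.charCount(gClef));                // 2
    }
}
```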

- -chris


Re: Migrating to tomcat 6 gives formatted currency amounts problem

Willem,

Willem Moors wrote:
 I can only see one solution to this complexity: let's all (everybody in the
 whole world) speak the same language, use the same currency and move into
 one and the same timezone (the latter because of past fun with timezones)!

You're not far off, except that you probably mean we should all speak
one human language (like English or Farsi or whatever). I agree, but
only if you mean we should all speak the same character language. It
should be UTF-8.

All hail UTF-8!

Seriously, switch to UTF-8.

- -chris


Re: Migrating to tomcat 6 gives formatted currency amounts problem

Caldarale, Charles R wrote:

From: Christopher Schultz [mailto:[EMAIL PROTECTED]
Subject: Re: Migrating to tomcat 6 gives formatted currency
amounts problem

(My understanding is that Unicode (16-bit) is actually not
big enough for everything, but hey, they tried).

Point of clarification: Unicode is NOT limited to 16 bits (not even in Java, 
these days).  There are defined code points that use 32 bits, and I don't think 
there's a limit, if you use the defined extension mechanisms.  Again, browsing 
the Unicode web site is extremely enlightening.

Further clarification :
Unicode is not limited to anything.  Unicode is (or aims to be) a list 
which assigns a number, from 0 to infinity, to every distinct character 
known to man. The particular position number given to a particular 
character in this Unicode list is known as its Unicode codepoint.
The Unicode group (consortium ?) also tries to do this with some order, 
such as trying to keep together (with consecutive codepoints) various 
groups of characters that are logically related in some way.
For example (but probably because they had to start somewhere), the 
first 128 codepoints match the original 7-bit US-ASCII alphabet;
so for instance the capital letter A, which has code \x41 in US-ASCII, 
happens to have Unicode codepoint \x0041 (both 65 in decimal terms).
For example also, the same first 128 codepoints, plus the next 128 
codepoints, match the iso-8859-1 alphabet (also known as iso-latin-1); 
thus the character known as capital letter A with umlaut (an A with a 
double-dot on top) has the codepoint \x00C4 in Unicode, and the code 
\xC4 in iso-8859-1 (both 196 in decimal).
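The claim that the first 256 Unicode codepoints coincide with iso-8859-1 is easy to check from Java: decoding each byte 0x00 to 0xFF as ISO-8859-1 yields the character with the same numeric value. A sketch; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Latin1IdentityDemo {

    // Returns true if every ISO-8859-1 byte value 0x00..0xFF decodes to the
    // Unicode code point with the same numeric value.
    static boolean latin1MatchesFirst256CodePoints() {
        for (int b = 0; b <= 0xFF; b++) {
            String decoded = new String(new byte[] { (byte) b }, StandardCharsets.ISO_8859_1);
            if (decoded.length() != 1 || decoded.charAt(0) != b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("A = " + (int) 'A');                  // 65 (0x41)
        System.out.println("A-umlaut = " + (int) '\u00C4');      // 196 (0xC4)
        System.out.println("identity holds: " + latin1MatchesFirst256CodePoints()); // true
    }
}
```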

New Unicode characters (and codepoints) are being added all the time (I 
think there's even Klingon in there), but there are also holes in the 
list (presumably left for whenever some forgotten related character 
shows up).

A quite different issue is encoding.

Because it would be quite impractical to specify a series of characters 
just by writing their codepoints one after the other (using whatever 
number of bits each codepoint needs), a series of clever schemes have 
been devised in order to pass Unicode strings around, while being able 
to separate them into characters, and keep each one with its proper 
codepoint.
Such schemes are known as Unicode encodings with names such as UTF-2, 
UTF-7, UTF-8, UTF-16, UTF-32, etc..
Each one of them specifies an algorithm whereby one can take any Unicode 
character (or rather, its codepoint), and encode it into a series of 
bits, in such a way that at the receiving end, an opposite algorithm can 
be used to decode that series of bits and retrieve once again the same 
series of Unicode codepoints (or characters).
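A minimal illustration of the encode/decode idea using the JDK's built-in charsets: each scheme produces different bytes, but decoding with the same scheme gives back the original string. The sample text (which includes the supplementary character U+1D11E) and the class name are arbitrary choices; UTF-7 and UTF-32 are left out because `StandardCharsets` does not guarantee them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {

    // Encodes a string to bytes with one Unicode encoding scheme and decodes it back.
    static String roundTrip(String text, Charset cs) {
        byte[] encoded = text.getBytes(cs);
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        String text = "A\u00C4\u20AC\uD834\uDD1E"; // A, A-umlaut, euro sign, G clef (U+1D11E)
        Charset[] schemes = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE
        };
        for (Charset cs : schemes) {
            int size = text.getBytes(cs).length;
            boolean ok = roundTrip(text, cs).equals(text);
            System.out.println(cs + ": " + size + " bytes, round trip ok = " + ok);
        }
    }
}
```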

UTF-16, for example, is an encoding of Unicode which always uses 16 bits 
for each Unicode codepoint; but it is to my knowledge incomplete, 
because since it uses a fixed 16 bits per character, it can 
only ever represent the first 65,536 Unicode 
characters. (But we're not there yet, and there is still some leeway.)

UTF-8 on the other hand is a variable-length scheme, using 1, 2, 3, or 
more 8-bit groups to represent each Unicode codepoint.  And it is in 
principle not limited, as there are extension mechanisms foreseen for 
whenever the need arises (imagine that some aliens suddenly show up, and 
that they happen to write in 167 different languages and alphabets).
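A quick way to see the variable length of UTF-8 (a sketch; names are hypothetical): A, A with umlaut, the euro sign and U+1D11E take 1, 2, 3 and 4 bytes. For what it's worth, current UTF-8 (RFC 3629) stops at 4 bytes because Unicode codepoints stop at U+10FFFF; the original design allowed sequences of up to 6 bytes.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {

    // Number of bytes UTF-8 needs for a single code point.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xC4, 0x20AC, 0x1D11E }; // A, A-umlaut, euro sign, G clef
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
        // Prints 1, 2, 3 and 4 bytes respectively.
    }
}
```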

One frequent misconception is that in UTF-8, the first 256 character 
encoding bit sequences match the iso-8859-1 codepoints.
Only the first 128 characters of iso-8859-1 (which happen to match the 
128 characters of US-ASCII and the first 128 Unicode codepoints), have a 
single-byte representation in UTF-8 which happens to match their Unicode 
codepoint.  The next 128 iso-8859-1 characters (which contain the 
capital A with umlaut) require 2 bytes each in the UTF-8 encoding.
Thus for instance, the capital letter A with umlaut has the Unicode 
codepoint \x00C4 (196 decimal), because it is the 197th character in the 
Unicode list (and the first one is \x0000).  It also happens to have the 
code \xC4 (196 decimal) in the iso-8859-1 table.
But in UTF-8, it is encoded as the two bytes \xC3\x84, which is not the 
decimal number 196 in any way.
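The A-with-umlaut example can be verified byte by byte; this sketch prints the codepoint and the bytes produced by each encoding (the class name is illustrative).

```java
import java.nio.charset.StandardCharsets;

class AUmlautBytesDemo {

    // Renders bytes as space-separated hex, e.g. "C3 84".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00C4"; // LATIN CAPITAL LETTER A WITH DIAERESIS
        System.out.printf("code point: U+%04X%n", aUmlaut.codePointAt(0));                         // U+00C4
        System.out.println("ISO-8859-1: " + hex(aUmlaut.getBytes(StandardCharsets.ISO_8859_1))); // C4
        System.out.println("UTF-8:      " + hex(aUmlaut.getBytes(StandardCharsets.UTF_8)));      // C3 84
    }
}
```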

All of that to say that when some people on this list say things like 
"you should always decode your URLs as if they were Unicode (or UTF-8), 
because it is the same as ASCII or iso-latin-1 anyway", they are talking 
nonsense.  The only time you can do that is when the server and all the 
clients have agreed in advance that this is how they are going to 
encode and decode URLs.
(That we developers wish it were so, and that ultimately we may get 
there, is another matter.)
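To make the URL point concrete, a sketch using `java.net.URLDecoder`: the same percent-encoded bytes decode to different text depending on the charset both sides agreed on. Note that `URLDecoder` implements `application/x-www-form-urlencoded` decoding (it also turns `+` into a space), so it stands in here for query strings rather than general URI paths. On the Tomcat side, that agreement is configured through the Connector's `URIEncoding` attribute, which defaulted to ISO-8859-1 in Tomcat 6. Names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodeDemo {

    // Percent-decodes a query value with an explicitly chosen charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The same percent-encoded bytes mean different things under different agreements.
        System.out.println(decode("%C3%84", "UTF-8"));      // A with umlaut
        System.out.println(decode("%C3%84", "ISO-8859-1")); // U+00C3 followed by control char U+0084
        System.out.println(decode("%C4", "ISO-8859-1"));    // A with umlaut
    }
}
```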

It is also talking nonsense to say that you should by default consider 
html pages as UTF-8 encoded.  The default character set (and encoding, 
because in that case both are the same) for html is iso-8859-1, and 
anything

Re: Migrating to tomcat 6 gives formatted currency amounts problem

Rectification to the clarification : what I said in my previous message about 
UTF-16 being always 16-bit and limited is also nonsense.  UTF-16 is 
variable-length, and it can cover the entire Unicode character set.  It just 
uses a variable number of 16-bit words per character, as compared to UTF-8 
which uses a variable number of 8-bit bytes.

I should have checked my sources. Shame on me.

About Java's internal char type being 16 bits wide though, I have heard 
that too, and I'm also curious.
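As a check on the corrected statement: UTF-16 uses one 16-bit unit for characters up to U+FFFF and two (a surrogate pair) above that. A sketch counting units via UTF-16BE, chosen because it writes no byte-order mark; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf16LengthDemo {

    // Number of 16-bit code units UTF-16 uses for a single code point.
    static int utf16Units(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_16BE).length / 2;
    }

    public static void main(String[] args) {
        System.out.println("U+0041  -> " + utf16Units(0x41) + " unit(s)");    // 1
        System.out.println("U+00C4  -> " + utf16Units(0xC4) + " unit(s)");    // 1
        System.out.println("U+1D11E -> " + utf16Units(0x1D11E) + " unit(s)"); // 2 (a surrogate pair)
    }
}
```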

Re: Migrating to tomcat 6 gives formatted currency amounts problem


Christopher Schultz wrote:

Willem,

Willem Moors wrote:

I can only see one solution to this complexity: let's all (everybody in the
whole world) speak the same language, use the same currency and move into
one and the same timezone (the latter because of past fun with timezones)!


You're not far off, except that you probably mean we should all speak
one human language (like English or Farsi or whatever). I agree, but
only if you mean we should all speak the same character language. It
should be UTF-8.

All hail UTF-8!

Seriously, switch to UTF-8.

That reminds me of the old joke about England deciding to switch from 
driving on the (wrong) left side of the road to the (correct) 
right side.  To minimise disruption, they were going to do it in 
stages: the trucks first, the cars a week later.


Anyway, there is a flaw in the above suggestions, if taken together : if 
we all spoke and wrote the same language, there would be no need for 
Unicode nor for multi-byte character encodings.

Unless the language was Chinese of course.



Re: Migrating to tomcat 6 gives formatted currency amounts problem


Just for the sake of completeness :

Christopher Schultz wrote:

André,

André Warnier wrote:

It is on the way through that servlet that they get corrupted, unless
I start Tomcat with LC_CTYPE=iso-8859-1.


What do the HTTP headers say when the file is served correctly versus
when it is not? I suspect that the encoding is either set incorrectly or
not set at all unless you specify LC_CTYPE.




So my question remains, I think : what could be going on in that servlet
so that :
- if LC_CTYPE is not set in the environment *of Tomcat* when it starts,
the upper iso-8859-1 characters in the pages are replaced by ?
- if LC_CTYPE is set to iso-8859-1 in the Tomcat environment when it
starts, then the pages delivered by the servlet are correct
?


My guess is that the magic servlet here is using the platform's default
encoding in the HTTP headers, which may be incorrect for the static file
in question.


I am not very qualified in Java, but could it be something like :
- the servlet reads those documents with some InputStream, without
specifying a character set or encoding


Note that InputStreams are encoding-less. Sounds like semantics, but
encodings only come into play when you are dealing with
character-oriented streams which, in Java, are called Readers and
Writers. Note that neither InputStream nor OutputStream have any methods
that deal with the char data type.
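Chris's point about byte streams can be shown in a few lines: copying through `InputStream`/`OutputStream` never decodes anything, so high ISO-8859-1 bytes arrive unchanged. In a servlet the sink would presumably be `ServletResponse.getOutputStream()`, with the charset then stated only in the Content-Type header; that part is left out so the sketch runs with the JDK alone. Names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ByteCopyDemo {

    // Copies raw bytes from in to out; no charset is involved at any point.
    static void copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1Page = { 'A', (byte) 0xC4, 'Z' }; // "A", A-umlaut, "Z" in ISO-8859-1
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(latin1Page), sink);
        System.out.println("unchanged: " + Arrays.equals(latin1Page, sink.toByteArray())); // true
    }
}
```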


and by default that means to use
Tomcat's idea of its default LC_CTYPE for those InputStreams ?
- or the servlet outputs the document via an OutputStream without
specifying an encoding etc..


I'll bet a binary stream of data is being sent (that is, with no
interpretation or encoding) and that the JVM's default encoding is being
advertised by the server in the HTTP headers. That would certainly cause
the problem.

The last time I looked, the HTTP headers sent along with the documents 
were the same in both cases.


It is physically (if that's the appropriate expression in this case) the 
high iso-8859-1 characters (bytes) in the html document that are 
being replaced by ? (single-byte low-ascii question mark), on the way 
from the disk file to the browser, via the servlet.
And if the LC_CTYPE of java (and Tomcat) is set to iso-8859-1 in the 
Tomcat startup script, it is no longer the case.


So I (now) believe that Chuck's earlier explanation is the correct one : 
the servlet reads the disk document with a Reader (thanks Chris), 
without specifying an encoding when it opens this Reader.

The effect is thus as follows :
- if the LC_CTYPE environment variable is not set for Java and Tomcat, 
this Reader is opened using whichever encoding happens to be the 
JVM's default at the time.  Obviously, in this case it is not iso-8859-1.

The servlet thus reads the iso-8859-1 data, but with the wrong decoder.
I guess then that this decoder replaces anything that does not fit into 
that default encoding by a ?. (Would it do that, or would it trigger 
an exception ?)
So that is what the servlet reads, and it passes it unchanged to its 
Writer and to the browser.
(Alternatively, it is at the level of the Writer of the servlet that the 
wrong encoding is used, or both.)
- if the LC_CTYPE variable is set to iso-8859-1, then these 
Readers/Writers default to that as an encoding, and everything works fine.
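André's theory can be reproduced outside Tomcat. In this sketch the ISO-8859-1 bytes are read through an `InputStreamReader` with the wrong charset (US-ASCII here, an assumed stand-in for what an unset LC_CTYPE plausibly gave older JVMs). The reader does not throw: the undecodable byte becomes U+FFFD, and when that is encoded back to ISO-8859-1 the encoder substitutes `?`. The class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongDecoderDemo {

    // Reads bytes through a Reader using the given charset, as a servlet might
    // when it opens a file without specifying an encoding.
    static String readWith(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] latin1 = { 'A', (byte) 0xC4, 'Z' }; // A-umlaut between A and Z, stored as ISO-8859-1

        // Right decoder: the A-umlaut survives.
        String right = readWith(latin1, StandardCharsets.ISO_8859_1);

        // Wrong decoder: the malformed byte 0xC4 is silently replaced by U+FFFD.
        String wrong = readWith(latin1, StandardCharsets.US_ASCII);

        // U+FFFD cannot be represented in ISO-8859-1, so the encoder writes '?' (0x3F).
        byte[] sent = wrong.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("A\u00C4Z"));                       // true
        System.out.println(wrong.equals("A\uFFFDZ"));                       // true
        System.out.println(new String(sent, StandardCharsets.ISO_8859_1));  // A?Z
    }
}
```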


Fortunately setting the LC_CTYPE in the Tomcat startup script does not 
seem to affect other applications on the server; that is probably 
because this particular servlet is the only sloppy one, which does not 
explicitly specify an encoding when reading or writing stuff.
(It's also because in this case, there are not many other servlets apart 
from the sloppy one).


Now I'm writing the above without a solid background in Java or Tomcat, 
so it's mostly guessing.  If someone has a good reason for 
shooting this down as an explanation, I'm still open.



I'll post another question under another title, I think this thread is 
long enough by now.


Thanks to all though.

André



Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: André Warnier [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Friday, September 12, 2008 10:56 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

By George... I think you have it... the locale encoding is taking precedence 
over the header.
In theory... in newer servlet versions that will no longer happen... the header now 
overrules locale encoding.

If you do decide to look at this link...
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale
What's happening to you is described at the very bottom ;)


RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: André Warnier [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 The servlet thus reads the iso-8859-1 data, but with the
 wrong decoder. I guess then that this decoder replaces
 anything that does not fit into that default encoding,
 by a ?. (Would it do that, or would it trigger an
 exception ?)

I believe (but have not verified) that the substitution occurs for any decoding 
errors.  At least, I can't find any exceptions defined for the APIs that 
perform decoding.
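On the question of exceptions: the convenience APIs (`InputStreamReader`, `new String(byte[], Charset)`) do substitute, but `java.nio.charset.CharsetDecoder` can be told to report errors instead, in which case `decode` throws a `CharacterCodingException` (a `MalformedInputException` for bad bytes). A sketch with hypothetical names; an `InputStreamReader` built from such a decoder should surface the same problem as an `IOException` from `read()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {

    // Decodes US-ASCII strictly: bad input throws instead of becoming U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Convenience wrapper: true if the bytes decode cleanly, false if the decoder objects.
    static boolean decodes(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] clean = { 'A', 'Z' };
        byte[] latin1 = { 'A', (byte) 0xC4, 'Z' };
        System.out.println(decodes(clean));  // true
        System.out.println(decodes(latin1)); // false
        try {
            decodeStrict(latin1);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```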

 I'll post another question under another title, I think this thread is
 long enough by now.

Nah, let's go for the record.

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is thus for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all computers.


RE: Migrating to tomcat 6 gives formatted currency amounts problem

 From: Christopher Schultz [mailto:[EMAIL PROTECTED]
 Subject: Re: Migrating to tomcat 6 gives formatted currency
 amounts problem

 so, Java is still 16-bit Unicode in its char primitive,
 but you can use ints to hold UTF-16 values using 21-bits?

The 21-bit values are represented by pairs of Java chars, the first from the 
UTF-16 high-surrogate range, the second from the low-surrogate range.  The 
21-bit code point can be accessed as an int by some of the java.lang.Character 
methods introduced in 1.5.

 especially since java.lang.Character only takes a char as a
 constructor parameter :(

Yes, I think all the new Character methods related to code points are static; 
there are corresponding instance methods in java.lang.String though.
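Following Chuck's description, a sketch of the surrogate-pair mechanics using static `java.lang.Character` helpers available since Java 5 (`toChars`, `toCodePoint`, `isSupplementaryCodePoint`) alongside the `String` instance methods `codePointAt` and `codePointCount`. U+1D11E (MUSICAL SYMBOL G CLEF) is just an arbitrary supplementary character picked for the example; the class and method names are hypothetical.

```java
class SurrogatePairDemo {

    // Splits a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair
    // and joins it back, using only static methods of java.lang.Character.
    static int[] splitAndJoin(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        char[] pair = Character.toChars(codePoint); // {high surrogate, low surrogate}
        int joined = Character.toCodePoint(pair[0], pair[1]);
        return new int[] { pair[0], pair[1], joined };
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF
        int[] parts = splitAndJoin(gClef);
        System.out.printf("high=%04X low=%04X joined=%X%n", parts[0], parts[1], parts[2]);
        // high=D834 low=DD1E joined=1D11E

        // The code-point-aware instance methods are on String.
        String s = new String(Character.toChars(gClef));
        System.out.println("length()       = " + s.length());                      // 2 chars
        System.out.println("codePointCount = " + s.codePointCount(0, s.length()));  // 1 code point
        System.out.printf("codePointAt(0) = %X%n", s.codePointAt(0));               // 1D11E
    }
}
```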

 - Chuck



Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Wed, Sep 10, 2008 at 7:55 PM, Steve Ochani [EMAIL PROTECTED] wrote:

 Hmm odd.

 I tried it on my Redhat test server and worked fine also.

 Is your tomcat 6 install a default/fresh install?

 What browser are you using? What character encoding does it think the
 HelloWorldExample
 output is coming in as?


Odd indeed!

The tomcat6 install is from a fresh install. The browser I'm using is FF3.

Really, apart from Tomcat 5.5 and 6, all else is equal: it's the same app
(same war-file), running on the same hardware using exactly the same java.
And to display the app I use one and the same browser (with different tabs)
but still my application gives this difference:
http://www.laadruim.com/issue/comparison_currrency_problem.png
(I don't know if it's proper to use attachments in posting to this list, so
I made the pic available on that URL).

Willem

Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Willem Moors [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 8:15 AM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

Will, if possible use
pound
instead... that, I think, is font independent...

Otherwise I think you have to surround that
getCurrencyInstance
stuff with a font... and tell it what font it must use...

... I think

I'm just wondering how the systems guess the character set from
getCurrencyInstance... I think the answer is there...

I think this because in a text editor if you insert a pound symbol you also 
have to choose it from a font set, and not all fonts support it...

So... it's getting inserted in some unknown font... and then the browser has 
to guess it...

It's something like that; pound may be easier.
Have fun...
---
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth.
The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
---

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Konstantin Kolinko

1. What does the _browser_ think the encoding of your page is?

In the menu View > Encoding, which encoding is auto-selected there?

2. In the Page Info dialog of Firefox
(in the Tools menu, or in the context menu > Page Info)

what are Encoding and Content Type, and which META tags are mentioned (does
it include a Content-Type tag)?

(disclaimer: I have a localized version of FF, so the above names are
translated ones).

3. Save both pages as HTML (choose HTML only format when saving), and compare
their text.

Is there any difference?

4. Well, &pound; (notice the trailing ';'), or better &#163;, should display
the pound sign regardless of what encoding the browser thinks your page uses.

Use the &#..; notation if generic XML processing is involved (the &pound;
entity is defined for (X)HTML only).
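Point 4 can also be applied in code. The following is a minimal sketch (the class and method names `NcrEscaper`/`toNcr` are hypothetical) that turns every non-ASCII character into a decimal numeric character reference, so the output is pure ASCII and survives any charset mix-up. It assumes Java 8+ for `String.codePoints()` and deliberately does not escape `<`, `&` and friends, which real HTML output would still need.

```java
public class NcrEscaper {

    // Replaces each code point above U+007F with "&#<decimal>;", leaving ASCII untouched.
    // Note: '<', '>', '&' and quotes are NOT escaped here; do that separately for HTML.
    static String toNcr(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u00A3" is the pound sign, code point 163
        System.out.println(toNcr("\u00A31,623,540.00")); // &#163;1,623,540.00
    }
}
```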

Best regards,
Konstantin Kolinko


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Mark Hagger

You are almost certainly having a problem with (default) character
encodings on your system. The usual thing to check is the encoding that
the JVM is using; for example, what does

echo $LANG

return? (It is usually controlled by what's defined in /etc/sysconfig/i18n,
although I'm not familiar with Ubuntu systems.)

The most likely thing is that the Tomcat servlet is effectively
generating content in UTF-8, and then trying to return this character to
the end client, via a PrintWriter, in ISO1 (ISO-8859-1), where the currency
symbol in use is not supported, hence the '?'. Alternatively, Tomcat is
returning either ISO1 or UTF-8 characters but not declaring them as such
in its response headers, leaving the browser confused and choosing
the wrong default. It would be useful to know what headers Tomcat is
actually returning.
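Both scenarios come down to which bytes reach the client. A small sketch using only `java.nio.charset` (the class name `EncodingDemo` is made up for illustration) shows the pound sign's bytes in UTF-8 and ISO-8859-1, and what an encoder does with a character the target charset lacks. The euro sign is used for that last case because, unlike £, it is not part of ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hex-dumps a byte array; "& 0xFF" avoids sign extension of bytes >= 0x80
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // pound sign
        String euro = "\u20AC";  // euro sign, not present in ISO-8859-1

        System.out.println("pound as UTF-8      : " + hex(pound.getBytes(StandardCharsets.UTF_8)));      // c2a3
        System.out.println("pound as ISO-8859-1 : " + hex(pound.getBytes(StandardCharsets.ISO_8859_1))); // a3
        // getBytes() substitutes unmappable characters with the charset's replacement, '?' (0x3f)
        System.out.println("euro as ISO-8859-1  : " + hex(euro.getBytes(StandardCharsets.ISO_8859_1)));  // 3f
    }
}
```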

I can't begin to count the number of times I've had problems with
character encoding in the past, in both response and request
handling; fortunately, the general trend for everything (including mobile
browsers) to support UTF-8 is slowly making life much, much easier.

Mark





Re: Migrating to tomcat 6 gives formatted currency amounts problem


 Will, if possible use &pound; instead... that, I think, is font independent...

 Otherwise I think you have to surround that getCurrencyInstance
 stuff with a font... and tell it what font it must use...

 ... I think

 I'm just wondering how the system guesses the character set from
 getCurrencyInstance... I think the answer is there...

 I think this because in a text editor, if you insert a pound symbol, you also
 have to choose it from a font set, and not all fonts support it...

 So... it's getting inserted in some unknown font... and then the browser has
 to guess it...

 It's something like that; &pound; may be easier.
 Have fun...

Definitely having fun!  ;-)

Thanks for your suggestion of using the &pound; entity and the fonts, but
'getCurrencyInstance' is supposed to hide all that from me.

I rather think it has something to do with the Tomcat 6 configuration,
because all else is equal: same server with same Java / same app / same
client / ... only in Tomcat 5.5 does it work, and in Tomcat 6 it doesn't.

Re: Migrating to tomcat 6 gives formatted currency amounts problem


 1. What the _Browser_ thinks about encoding of your page.

 In the menu View > Encoding, which encoding is auto-selected there?

Western / ISO 8859-1 for both.



 2. In Page Info dialog of Firefox
 (in the Tools menu, or in the context menu > Page Info)

 what is Encoding, Content Type, and what META tags are mentioned (does
 it include Content-Type tag)

 (disclaimer: I have a localized version of FF, so the above names are
 translated ones).


Encoding: ISO-8859-1
Content type / meta tags are not mentioned.



 3. Save both pages as HTML (choose HTML only format when saving), and
 compare
 their text.

 Is there any difference?

Since the content is Ajax generated, a save-page doesn't make much sense.
When I highlight the bits, and do a view-selection-source and then
copy/paste this into vi, I notice that the 5.5 page shows the pound sign,
while the 6.0 page shows a blank spot where the pound sign is supposed to
be.



 4. Well, &pound; (notice the trailing ';'), or better &#163;, should display
 the pound sign regardless of what encoding the browser thinks your page uses.

 Use the &#..; notation if generic XML processing is involved (the &pound;
 entity is defined for (X)HTML only).


The NumberFormat.getCurrencyInstance(Locale.UK) is supposed to save me the
pain of putting currency signs in.


Thanks for your reply, Konstantin.

Regards,

Willem

Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 11:35 AM, Mark Hagger [EMAIL PROTECTED] wrote:

 You are almost certainly having a problem with (default) character
 encodings on your system, usual things to check are the encoding that
 the JVM is using, for example what does:

 echo $LANG

 return (usually controlled by what's defined in /etc/sysconfig/i18n -
 although I'm not familiar with Ubuntu systems).


echo $LANG gives me this:
en_US.UTF-8




 The most likely thing is that the tomcat servlet is effectively
 generating content in UTF-8, and then trying to return this character to
 the end client, via a PrintWriter, in ISO1 where the currency symbol in
 use is not supported by ISO1, hence the '?'.  Alternatively tomcat is
 returning either ISO1 or UTF-8 characters but not declaring them as such
 in its response headers, leaving the browser confused and its choosing
 the wrong default.  Be useful to know what headers tomcat is returning
 really.

But then, it would be the same issue for Tomcat 5.5, no? And there it
doesn't go wrong...
As I stated earlier: I rather think it has something to do with the Tomcat 6
configuration, because all else is equal: same server with same Java / same
webapp / same client (FF3) / ... only in Tomcat 5.5 does it work, and in
Tomcat 6 it doesn't.

Thanks for your reply, Mark!

Regards,

Willem

Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Willem Moors [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 12:36 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


Will, I can't see how TC can be influencing it.
You write a char (the pound) to an output stream and it appears differently
in the browser...

TC is just sending what it gets...
It's got to be this...
   NumberFormat.getCurrencyInstance(Locale.UK)
and that is Java... so I conclude... TC 6 is not on the same JDK/JRE as TC 5.

Your Java has changed... must be...

That stuff that you like is LOCALE stuff... and that stuff can all be
configured from outside Java...

You are choosing a Locale... but if the font.property files in JRE/LIB
are different... it's probably picking a wide super new Sun font... which in
Swing will make no difference... but
where the old JRE was using something a browser gets... the new GB_SUPER
font with English flags and the national anthem... confuses current
browsers.

 I think... you're looking in the wrong place...

Convert it to bytes... and print that... you will see it... I think

Then, just to convince yourself that TC is not doing a weird Arabic header...
get the header plugin for Firefox... and have a look...

I doubt they differ...
Have more fun...

Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 1:42 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

 Will, I can't see how TC can be influencing it.
 You write a char (the pound) to an output stream and it appears differently
 in the browser...
 TC is just sending what it gets...
 It's got to be this...
   NumberFormat.getCurrencyInstance(Locale.UK)
 and that is Java... so I conclude... TC 6 is not on the same JDK/JRE as TC 5.

 Your Java has changed... must be...

Sorry to have to disappoint you, but this server was installed just a few
days ago, and there is only ONE JDK on it:
 java version 1.6.0_07
 Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
 Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)

So it's impossible that TC5.5 uses a different Java than TC6.



 That stuff that you like is LOCALE stuff... and that stuff can all be
 configured from outside Java...

 You are choosing a Locale... but if the font.property files in JRE/LIB
 are different... its probably picking a wide super new Sun font... which in
 swing will make no diffs... but
 where the old JRE was using the something a browser gets... the new
 GB_SUPER font with english flags and the national anthem... confuses current
 browsers.

  I think... you're looking in the wrong place...

 Convert it to bytes... and print that... you will see it... I think

Can it be one of the libraries (*.jar) that is different, that forces TC6 to
act differently?



 Then, just to convince yourself that TC is not doing a weird Arabic
 header... get the header plugin for Firefox... and have a look...
 I doubt they differ...

That is a good track to follow ! Thanks for this advice.




 Have more fun...

Thanks!

Willem

Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Jeff

On Wed, Sep 10, 2008 at 10:27 AM, Willem Moors [EMAIL PROTECTED] wrote:
 I'm transferring my application from a tomcat 5.5.26 server to tomcat
 6.0.18, and notice that my formatted currency amounts are not being properly
 displayed. Instead of a Pound (GBP) sign I get a question mark within a
 black diamond (the app works fine in 5.5.26).

 This can easily be emulated. Add the following lines to the
 HelloWorldExample.java of the servlet examples in Tomcat 5.5.26 and those of
 6.0.18:

  java.text.NumberFormat currencyFormat =
      java.text.NumberFormat.getCurrencyInstance(java.util.Locale.UK);
  out.print("Formatted currency (GBP) : " + currencyFormat.format(1623540.00));

 This will display the following :

 In Tomcat 6.0.18: Formatted currency (GBP) : ?1,623,540.00
 (I've emulated the question-mark within diamond here, I'll send you a
 screenshot if you want)

 Tomcat 5.5.26: Formatted currency (GBP) : £1,623,540.00
 (depending on your client you may or may not see the pound sign in front of
 the above amount)

 What can be the problem, is there some extra locale configuration that needs
 to be done ?

I experienced similar issues (though not with the UK locale) running Tomcat on
Linux/UNIX. For reasons unknown, my Tomcat/Java was not picking up the
default locale of the OS, so I explicitly set it for the JVM by
putting JAVA_OPTS="-Duser.country=US -Duser.language=en" in setenv.sh.
Problem solved. This is admittedly a duct-tape solution; I would
rather know why Java was not using the proper locale and get that
fixed, but time is money.

Examine your Tomcat 5 setup; maybe a similar tweak had been made there.
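Before (or instead of) hard-wiring `-Duser.country`/`-Duser.language`, it may help to see what the JVM actually picked up. The following is a minimal diagnostic sketch (the class name is hypothetical). Running it with the same `JAVA_OPTS` that Tomcat's setenv.sh uses, or logging the same values from inside the webapp, would show whether the two Tomcat setups differ. Note that the default charset is a per-JVM observation: since JDK 18 (JEP 400) it is UTF-8 regardless of the OS locale.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class JvmDefaults {
    public static void main(String[] args) {
        // Locale derived from the OS environment (LANG etc.) unless overridden by -Duser.* flags
        System.out.println("Default locale  : " + Locale.getDefault());
        // Charset used by String.getBytes(), InputStreamReader and OutputStreamWriter when none is given
        System.out.println("Default charset : " + Charset.defaultCharset());
        System.out.println("file.encoding   : " + System.getProperty("file.encoding"));
        System.out.println("user.language   : " + System.getProperty("user.language"));
        System.out.println("user.country    : " + System.getProperty("user.country"));
    }
}
```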

-- 
Jeff


Re: Migrating to tomcat 6 gives formatted currency amounts problem



- Original Message - 
From: Willem Moors [EMAIL PROTECTED]



 I think... you're looking in the wrong place...

Convert it to bytes... and print that... you will see it... I think

Can it be one of the libraries (*.jar) that is different, that forces TC6 to
act differently?


--- Will's Phantom Font Project ---

I've been trying to find a way for you to set the font you want for a locale...
It does seem to be an option in Java... i.e., I think Java is expecting to find
that from a GUI.

But here is the whole story:
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-locale

Notice that on Linux there are things like: it depends on whether the font
server starts up... yada yada.

I'm totally surprised that it's the same JRE...

I think it may be possible that something else is setting the font... and
then the JRE is using that.
The above link actually gives you a way to find out what font is being picked
up...

But... I think this is all wrong anyway... say you get it figured out, and
pick Helvetica... or whatever... then you have to tell the browser to use
that in CSS or whatever; it's the beginning of a complex cycle...

&pound; makes it the browser's problem, and internally the browser will
find a font and make it happen...

And then if someone moves your servlet to a headless Linux box, here we go
again... is the font there... etc.

I think you can get it to work, and it is interesting... but I'm not sure
you want to...


I'd love to know if the theory is right on your system... i.e., run this:

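 String s = currencyFormat.format(1623540.00);
 // getBytes() with no argument uses the platform default charset
 byte[] ba = s.getBytes();
 StringBuilder ans = new StringBuilder();
 for (int i = 0; i < ba.length; i++) {
   // mask with 0xff so bytes >= 0x80 print as two hex digits, not ffffffxx
   ans.append(String.format("%02x", ba[i] & 0xff));
 }
 System.out.print("DA BYTES : " + ans);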

See if the bytes are changing... i.e., the fonts are changing...

... that's me out of ideas... other than it looks like Java's localization can
nail you... and I'm now worrying about some of my systems... ha ha.


Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Johnny Kewl [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 4:28 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


I.e., format your numbers but don't include a currency symbol through Java...
use &pound;...
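That suggestion could look roughly like this: format only the number with UK grouping and prefix the `&pound;` entity yourself. This is a sketch with a hypothetical `formatGbp` helper. It only helps if the client inserts the text as HTML (for example via `innerHTML`); in a `text/plain` response read as text, the entity would show up literally. It also gives up the locale-driven symbol placement that `getCurrencyInstance` provides (negative amounts, for instance, would come out as `&pound;-5.00`).

```java
import java.text.NumberFormat;
import java.util.Locale;

public class PoundEntity {

    // Formats with UK digit grouping but no currency symbol, then prepends the HTML entity,
    // so only ASCII characters end up in the response.
    static String formatGbp(double amount) {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.UK);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        return "&pound;" + nf.format(amount);
    }

    public static void main(String[] args) {
        System.out.println(formatGbp(1623540.00)); // &pound;1,623,540.00
    }
}
```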

Interesting question... thanks

Re: Migrating to tomcat 6 gives formatted currency amounts problem

I studied the response headers for the Ajax call that generates the output
and found that for the correct result (i.e. in TC55), the content type was
this:
Content-Type: text/plain;charset=ISO-8859-1

while for the wrong result (i.e. in TC6), the content type was:
Content-Type: text/plain

So I added this line to my code:
response.setCharacterEncoding("ISO-8859-15");
(I chose the ISO-..-15 set, to see if my change had an effect.)

And lo and behold: problem solved!

So would this be the right conclusion: it's TC55 that's wrong here and not
TC 6?
TC55 slaps on the 'charset=ISO-8859-1' by default and TC 6 doesn't.

Anyway, glad to have found the solution, thank you all for chipping in
your ideas!
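Why the ISO-8859-15 experiment worked: for the pound sign, the choice between ISO-8859-1 and ISO-8859-15 makes no difference, because both encode it as byte 0xA3, so what most likely made the difference is that the response now declares a charset. The two sets do differ for a few characters such as the euro sign. A quick check (the class name is made up for illustration; ISO-8859-15 is looked up by name, which standard JDKs support):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin9Check {

    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    public static void main(String[] args) {
        String pound = "\u00A3";
        String euro = "\u20AC";

        // The pound sign is 0xA3 in both charsets, so either declaration decodes it correctly
        System.out.println(Arrays.equals(pound.getBytes(LATIN1), pound.getBytes(LATIN9))); // true

        // The euro sign exists only in ISO-8859-15 (0xA4); ISO-8859-1 falls back to '?' (0x3f)
        System.out.printf("euro: latin9=%02x latin1=%02x%n",
                euro.getBytes(LATIN9)[0] & 0xFF,
                euro.getBytes(LATIN1)[0] & 0xFF); // euro: latin9=a4 latin1=3f
    }
}
```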


Regards,

Willem

Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Konstantin Kolinko

Hi, Willem!

Glad to hear that you solved this.

By the way, I think it is not Tomcat, but the browser that is confused when the
encoding is not specified in the Content-Type header.

Those question marks were '? in a diamond', i.e. the Unicode replacement
character (U+FFFD), as if they had been replaced on the browser side. When a
PrintWriter replaces unmappable characters, it prints a plain '?' punctuation mark.

Is it true, that the Content-Type header of your Ajax responses now has
the ;charset=... suffix? (Is Content-Type updated from your
setCharacterEncoding(), or not?)

Also, I have heard that Ajax responses read through XMLHttpRequest
are expected to be in UTF-8, as mentioned e.g. here:
http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/i18n/encoding-considerations
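This explanation can be reproduced without a browser. The sketch below uses a hypothetical `decodeBody` helper that imitates a client which honours `;charset=` in the Content-Type header and otherwise assumes UTF-8 (the Ajax default mentioned above; real browsers' rules are more involved). It also assumes the servlet wrote ISO-8859-1 bytes, which is what the Servlet specification uses when no encoding is set. A lone 0xA3 byte is not valid UTF-8, so it decodes to U+FFFD, the '?'-in-a-diamond Willem saw.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseDecoder {

    // Hypothetical helper: use the charset parameter of the Content-Type header if present,
    // otherwise fall back to UTF-8 (assumed Ajax/XMLHttpRequest behaviour).
    static String decodeBody(byte[] body, String contentType) {
        Charset cs = StandardCharsets.UTF_8;
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                cs = Charset.forName(p.substring("charset=".length()).trim());
            }
        }
        // new String(byte[], Charset) replaces malformed input with U+FFFD
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // The formatted amount as ISO-8859-1 bytes: 0xA3 followed by ASCII
        byte[] body = "\u00A31,623,540.00".getBytes(StandardCharsets.ISO_8859_1);

        String declared = decodeBody(body, "text/plain;charset=ISO-8859-1");
        String undeclared = decodeBody(body, "text/plain");

        System.out.println(declared.equals("\u00A31,623,540.00"));   // true
        System.out.println(undeclared.equals("\uFFFD1,623,540.00")); // true
    }
}
```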

Also, your HTML pages do not specify their charset explicitly, thus the
browser has to autodetect their encoding,
http://www.w3.org/TR/html4/charset.html#spec-char-encoding

Also, Tomcat wiki:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

Best regards,
Konstantin Kolinko


Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Willem Moors [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 5:06 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


Didn't realize this was Ajax... ;)
I think browsers default to ISO-8859-1 unless set otherwise anyway... so it's
a bit strange.

Maybe the plain text has an effect...

It also depends on the Accept headers that Ajax sent to TC... if it doesn't
specify a required encoding, TC is actually at liberty to return whatever it
wants, unless of course you dictate the encoding... I see now why you can't
use &pound; ;)

I think it's just a matter of telling TC what it must do, either from a client
header or, as you're doing, by forcing it in the response.
It's your servlet... and you should probably also be setting the size headers
in your response...
It's a question/answer thing, so there is no bug, unless the client said
"gimme UTF/ISO whatever" and TC didn't...

So I guess the theory on localized fonts changing just fell through ;)
I wonder how that actually works... I mean if you set a Chinese locale... it
just has to be a weird font... what happens if it's not there?

Set those headers; Ajax is not automatic either... make sure the system
isn't guessing...


Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Johnny Kewl [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 6:18 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


Actually, here's something interesting for you to try; I discovered that IE
is a huge guesser... some may say more intelligent...
On IE, if you set the header to text/plain... but make an HTML page... it
somehow guesses that it's not plain text and renders it as HTML...
Other browsers will display the raw HTML... browsers do guess if you don't
help them... and IE just overrules you ;)

Make sure you test in more than one browser as well... that often catches
stuff like this...


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Christopher Schultz


Johnny,

Johnny Kewl wrote:
 I think it may be possible that something else is setting the font...
 and then the JRE is using that.

I think you're totally confusing yourself about font issues. Java only
interacts with fonts of any kind when running AWT/Swing apps. Webapps
have no interactions with fonts of any kind.

The font used to display the web page is entirely dependent on the web
browser. The web browser chooses the font based upon the style of the
text to be displayed, and the language it's being displayed in.

The likely problem, here, is the encoding appearing in the Content-Type
header from the server.

It's possible that Willem's 5.x server is configured with a Valve to set
the default character encoding, and that the 6.x server is not similarly
configured.

Willem, can you post the relevant sections of your server.xml files from
each version? If you can't figure out what's relevant, just post the
entire thing.

-chris


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Christopher Schultz


Willem,

Willem Moors wrote:
 I studied the Response Headers for the ajax call that generates the output
 and found that for the correct result (ie. in TC55), the content type was
 this:
 Content-Type: text/plain;charset=ISO-8859-1
 
 while for the wrong result (ie. in TC6), the content type was:
 Content-Type: text/plain

Looks like the server is using something else (UTF-8?) in TC 6 and not
reporting it to the client. The client is assuming ISO-8859-1 and
therefore misinterpreting those characters outside of US-ASCII (such as £).

 So would this be the right conclusion : it's TC55 that's wrong here and not
 TC 6 ?
 TC55 slaps on the 'charset=ISO-8859-1' by default and TC 6 doesn't.

It might not be by default: lots of folks explicitly set their charsets
to UTF-8 using some other technique.

-chris

Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Christopher Schultz [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 6:42 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


Johnny,

Johnny Kewl wrote:

I think it may be possible that something else is setting the font...
and then the JRE is using that.

I think you're totally confusing yourself about font issues. Java only
interacts with fonts of any kind when running AWT/Swing apps. Webapps
have no interactions with fonts of any kind.

Chris... exactly, yes... it turns out he wasn't setting headers, so you're
absolutely right...
but his code is introducing a font into a web app, and that's what I'm
wondering about...

Forget about the webapp for a moment and just look at his code...

 // "out" is the JSP implicit JspWriter
 java.text.NumberFormat currencyFormat =
     java.text.NumberFormat.getCurrencyInstance(java.util.Locale.UK);
 out.print("Formatted currency (GBP) : " + currencyFormat.format(1623540.00));

It's generating a pound... the question is, the webapp is not dictating the
font... so I'm asking what font is being used for the pound?

And then yes... it so happens that he has found the encoding that works in
text/plain... but it's a fluke, it's lucky, it's a problem waiting to happen,
because if I change that locale of his to French, German, Chinese... what
font is that now going to be... and that will almost certainly not work
in the default US encoding...

There are a few problems here...

He *is* introducing a font into a webapp and we don't even know what it
is?

---
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth.
The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
---


Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 11:16 AM, Johnny Kewl [EMAIL PROTECTED] wrote:

 It's generating a pound... the question is, the webapp is not dictating the
 font... so I'm asking what font is being used for the pound?

Whatever the browser picks from what it has available. :-)

 He *is* introducing a font into a webapp

No. A character, a codepoint, yes, not a font.
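This can be checked directly: `NumberFormat` returns an ordinary `String`, and all it contributes is code points. The sketch below (hypothetical class name) formats Wil's amount with `Locale.UK` and lists the non-ASCII code points in the result; no font appears anywhere in it.

```java
import java.text.NumberFormat;
import java.util.Locale;

class PoundCodePoint {
    // Format an amount the same way Wil's JSP does.
    static String formatUk(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.UK).format(amount);
    }

    public static void main(String[] args) {
        String formatted = formatUk(1623540.00);
        // Print every code point outside US-ASCII. The String carries only
        // characters; which glyph (and font) draws them is the client's choice.
        formatted.codePoints()
                 .filter(cp -> cp > 0x7F)
                 .forEach(cp -> System.out.printf("U+%04X%n", cp)); // U+00A3
    }
}
```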

-- 
Hassan Schroeder  [EMAIL PROTECTED]


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread David Smith

I'm willing to bet the symbol for the British pound is not part of the
normal web character set like a US dollar symbol is, and as a result it
needs to be expressed in entity notation (&pound; or &#163;).
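For reference, the pound sign is not in ASCII but is present in ISO-8859-1 (as 0xA3) and in UTF-8, so it does not strictly require an entity. Entity notation does, however, sidestep any charset disagreement, because the markup becomes pure ASCII. A minimal sketch, assuming HTML output and a hypothetical helper name, that turns every non-ASCII code point into a numeric character reference. It deliberately does not escape `&`, `<` or `>`, which real HTML output also needs.

```java
class HtmlNumericRefs {
    // Replace each code point above U+007F with "&#<decimal>;" so the markup
    // is pure ASCII and reads the same under any ASCII-compatible charset.
    // Note: '&', '<' and '>' are NOT escaped here.
    static String toNumericRefs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericRefs("\u00A31,623,540.00")); // &#163;1,623,540.00
    }
}
```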


--David






Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Hassan Schroeder [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 8:58 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 11:16 AM, Johnny Kewl [EMAIL PROTECTED] wrote:

It's generating a pound... the question is, the webapp is not dictating the
font... so I'm asking what font is being used for the pound?

Whatever the browser picks from what it has available. :-)

He *is* introducing a font into a webapp

No. A character, a codepoint, yes, not a font.

I tell you, Wil's example has confused the hell out of me... ha ha
Wil... you have caused chaos... ha ha

I'm probably using the definition incorrectly; let's just say you're
internationalizing a page... so you have

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

So you can display multiple languages

Now you're designing a web page... you pick Arial... you select ® (the
registered trademark symbol, as a font character, in case it doesn't come out)

And life is good

But when that gets done for you... not from your own resource bundles, but
from a locale that can be using any code point in a font,
and you don't know what the font actually is, the charset won't even help
you, because how does the browser know it was Arial?

If it displays it in MS Serif... surely it's going to be wrong...

It's not really a browser problem that's bugging me... it's that the locale
gives you something, it varies, especially on a headless Linux box, and you
can't assume it's anything

Even worse, if a Chinese font has not been installed... it's probably a ?

I think one has to use &pound; because Java's localization in this area is
unpredictable...

So if you do want to use the pound symbols from localization... you also
have to discover the font (somehow) and then you have to add that HTML or
CSS code to your page

Or maybe Java is a whole lot smarter than I'm giving it credit for and it's
embedding font attributes in the UTF-8 or something...

I don't know... all I do know is that putting &pound; in your resource bundle
is a whole lot easier...

Totally confused... but I think if Wil is internationalizing that app... it's
going to give him a huge headache

They didn't make &pound; and &reg; and all the rest for nothing... I think
it's because it is a major headache otherwise...

... I don't know... Wil's phantom font has got me... ;)

Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 12:54 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

 Now you're designing a web page... you pick Arial...

 have to discover the font (somehow) and then you have to add that HTML or
 CSS code to your page

Do you not understand that style information, including fonts, is just
a serving suggestion? A user-agent has *no* obligation to use any
given font, or any font at all.

If I'm looking at your page in Lynx, the font will be whatever my own
terminal window settings specify, be it Comic Sans or Copperplate
Gothic Bold.

If I use wget to grab a page and store it into a file or a DB, there is no
font information involved at any point whatsoever -- it's just character
data in some specified (or assumed!) encoding.
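To illustrate "just character data in some specified encoding", this sketch prints the bytes that actually travel over the wire for the pound sign under the two encodings discussed in this thread. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class PoundBytes {
    // Render a byte array as space-separated hex, e.g. {0xC2, 0xA3} -> "C2 A3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // U+00A3 POUND SIGN
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));      // C2 A3
    }
}
```

Same character, different bytes: nothing about fonts is stored, which is why the declared charset has to match the bytes actually sent.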

If a user-agent is intended to generate a visual display /and/  has a
font available to it with a glyph matching a specified code-point in a
specified encoding, great. If not -- so sorry.   Doesn't matter whether
you were using HTML entities or numeric representation: ? is it.

FWIW,
-- 
Hassan Schroeder  [EMAIL PROTECTED]


Re: Migrating to tomcat 6 gives formatted currency amounts problem



- Original Message - 
From: Hassan Schroeder [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 11:07 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem


On Thu, Sep 11, 2008 at 12:54 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

Now you're designing a web page... you pick Arial...

have to discover the font (somehow) and then you have to add that HTML or
CSS code to your page


Do you not understand that style information, including fonts, is just
a serving suggestion? A user-agent has *no* obligation to use any
given font, or any font at all.


http://www.kewlstuff.co.za/test/test.htm

What do you see in this test page?


Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 2:41 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

 http://www.kewlstuff.co.za/test/test.htm

 What do you see in this test page?

problems :-)

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.kewlstuff.co.za%2Ftest%2Ftest.htm&charset=%28detect+automatically%29&doctype=Inline&ss=1&group=0&verbose=1&user-agent=W3C_Validator%2F1.591

-- 
Hassan Schroeder  [EMAIL PROTECTED]


Re: Migrating to tomcat 6 gives formatted currency amounts problem




Hassan, I'm not arguing; you know nothing about that font... how is your
client going to display it?

I'm probably missing something... teach me.

Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 2:53 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

 Hassan, I'm not arguing; you know nothing about that font... how is your
 client going to display it?

If the page contains an invalid code-point, as the error message
points out, then what should a browser display??

-- 
Hassan Schroeder  [EMAIL PROTECTED]


Re: Migrating to tomcat 6 gives formatted currency amounts problem

2008-09-11 Thread Markus Schönhaber


Johnny Kewl wrote:


http://www.kewlstuff.co.za/test/test.htm

What do you see in this test page?


The output of a server that lies right to my face.
It says it is serving UTF-8-encoded text, while it really serves text
encoded with some 8-bit charset, probably ISO-8859-1.
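Markus's diagnosis can be reproduced in a few lines. The sketch assumes, as Markus guesses, that the page bytes are ISO-8859-1 while the header says UTF-8, and it uses the ® Johnny dropped into the page. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class LyingHeader {
    // Bytes written as ISO-8859-1, then decoded as the header claims (UTF-8).
    // In ISO-8859-1 "®" is the single byte 0xAE; on its own that is not a
    // valid UTF-8 sequence, so Java's decoder substitutes U+FFFD.
    static String decodeAsLabelled(String text) {
        byte[] sent = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(sent, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String seen = decodeAsLabelled("Verdana \u00AE");
        System.out.println(seen.equals("Verdana \uFFFD")); // true
    }
}
```

Browsers typically show U+FFFD as a question mark (often inside a box), which is the "?" described elsewhere in this thread; no choice of font can recover the original character.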


Regards
  mks


Re: Migrating to tomcat 6 gives formatted currency amounts problem

- Original Message - 
From: Hassan Schroeder [EMAIL PROTECTED]

To: Tomcat Users List users@tomcat.apache.org
Sent: Thursday, September 11, 2008 11:59 PM
Subject: Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 2:53 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

Hassan, I'm not arguing; you know nothing about that font... how is your
client going to display it?

If the page contains an invalid code-point, as the error message
points out, then what should a browser display??

That's probably what I'm not getting...
All I did was set the font to Verdana and drop a registered mark in...

And that's what I'm worried about, because locale info will default to
something similar

I don't think that locale code of Wil's knows it's in a webapp?

Anyway... look, I don't get it... maybe the only thing to say is that if one
introduces technology targeting GUI and Swing into a server, it's probably
got issues.
Whether that locale stuff is intelligent enough not to make an invalid code
point... that's the question.

... I don't know ;)

Re: Migrating to tomcat 6 gives formatted currency amounts problem

On Thu, Sep 11, 2008 at 3:20 PM, Johnny Kewl [EMAIL PROTECTED] wrote:

 If the page contains an invalid code-point, as the error message
 points out, then what should a browser display??

 That's probably what I'm not getting...
 All I did was set the font to Verdana and drop a registered mark in...

However you created your test page, it /isn't valid UTF-8/. Until that's
resolved, it has no value as a test of anything.
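One way to check whether a page is valid UTF-8 before using it as a test, assuming you have its raw bytes (for example, saved with wget as described above): decode them with a strict `CharsetDecoder` that reports malformed input instead of replacing it. The class name and command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // True only if the bytes are well-formed UTF-8. REPORT makes the decoder
    // throw on bad input instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8Check saved-page.htm
        byte[] page = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(page) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```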

 Whether that locale stuff is intelligent enough not to make an invalid code
 point... that's the question.

If that were my question, I'd be testing Locale-based code  :-)
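Taken literally, such a Locale-based test might look like the sketch below. It reframes Johnny's worry as a charset question rather than a font question: does the formatted string fit in the charset the response declares? Names are hypothetical and the locale list is arbitrary. The exact strings differ between JDK versions (locale data has changed over time), so only booleans are printed and no expected output is claimed, except that the euro sign produced for France is certainly not in ISO-8859-1.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

class LocaleCharsetCheck {
    static String format(Locale locale, double amount) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Can every character of the formatted amount be written in ISO-8859-1?
    // If not, the response needs a wider charset such as UTF-8, declared in
    // the Content-Type header.
    static boolean fitsLatin1(Locale locale, double amount) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        return latin1.canEncode(format(locale, amount));
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.UK, Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.CHINA};
        for (Locale locale : locales) {
            System.out.printf("%-6s ISO-8859-1 ok: %b%n",
                    locale, fitsLatin1(locale, 1623540.00));
        }
    }
}
```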

-- 
Hassan Schroeder  [EMAIL PROTECTED]


Re: Migrating to tomcat 6 gives formatted currency amounts problem