Hi,
I have a customer who regularly copies text from Word documents and pastes it
into forms I created for him on his web site. The text often contains
non-ASCII characters such as U+2019 for single quotes or U+201C for
double quotes. We were having some problems storing these characters in our
database, so I added a filter that replaces them with the plain ASCII quote
characters.
I tested my work by deploying to a local copy of JBoss on my workstation, which
is a Windows XP computer, and it worked fine. I did the conversion using the
String.replace method, for example:
s = s.replace('\u201C', '"');
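For illustration, a fuller version of such a filter might look like the sketch below. The class and method names (QuoteFilter, normalizeQuotes) are hypothetical, and the set of characters handled (U+2018, U+2019, U+201C, U+201D) is an assumption about what Word typically produces, not the exact filter described here.

```java
// Hypothetical helper: QuoteFilter and normalizeQuotes are illustrative names,
// not part of the original application.
final class QuoteFilter {

    // Replaces the curly quotes Word commonly inserts with plain ASCII quotes.
    static String normalizeQuotes(String s) {
        if (s == null) {
            return null;
        }
        return s.replace('\u2018', '\'')   // left single quotation mark
                .replace('\u2019', '\'')   // right single quotation mark
                .replace('\u201C', '"')    // left double quotation mark
                .replace('\u201D', '"');   // right double quotation mark
    }

    public static void main(String[] args) {
        System.out.println(normalizeQuotes("\u201CIt\u2019s fine\u201D"));
    }
}
```

Note that a filter like this can only match if the string actually contains those code points when it reaches the filter.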
However, when I deployed this to my production environment - which has the same
version of Java and the same version of JBoss, but runs on Linux - it failed. To
see what was going on, I tried logging all the characters of the input string
using s.codePointAt(). It turns out that instead of getting U+201C and
U+2019, I'm getting U+FFFD (the Unicode replacement character) in both cases.
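The logging described above can be sketched roughly as follows. CodePointDump and dump are hypothetical names, and the sketch assumes Java 7 or later for StandardCharsets. The second call in main shows one general way U+FFFD can appear in Java: a byte that is a curly quote in windows-1252 (0x93) is not valid UTF-8 on its own, so decoding it as UTF-8 yields the replacement character. This is only an illustration of the mechanism, not a confirmed diagnosis of the problem here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; names are illustrative only.
final class CodePointDump {

    // Returns the code points of s as space-separated "U+XXXX" tokens.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded string keeps its curly quotes.
        System.out.println(dump("\u201Chi\u2019"));

        // 0x93 is the left double quote in windows-1252, but it is malformed
        // as a standalone UTF-8 byte, so the decoder substitutes U+FFFD.
        byte[] cp1252Quote = { (byte) 0x93 };
        System.out.println(dump(new String(cp1252Quote, StandardCharsets.UTF_8)));
    }
}
```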
Does anyone understand why this is happening? I have been working with Java for
almost 7 years, and I have never encountered an inconsistency between its
behavior on Linux and Windows before.
Thanks,
Frank
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4001916#4001916