At 16:46 10/09/2004, you wrote:
>thanks for the input Mark.
>
>hmm, the webservice is built in CF pulling from a MySQL DB all on SunOS 5.8.

I'm not a MySQL user, but it's quite possible that if MySQL is told to store
a character with code position 150, it simply stores it. It has no idea what
the visual representation of code position 150 is. So a Windows user
submitting a web form, for example, could post a string in which one
character is chr(150), i.e. an en dash in Win1252, and MySQL would store it
as-is. CFMX behaves similarly, btw: there's nothing stopping you creating a
string containing chr(150). It's only when you go to print it that you
don't see what you expect, because as far as CFMX is concerned code
position 150 is not a printable character. I might be on the wrong track
here. I'm only guessing, because you mention the - sign being a problem,
that this is an issue with Win1252 chars (I think Word's auto-mangle
feature converts minus to en dash in certain circumstances, btw).
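
To make the "same code position, different character" point concrete, here
is a minimal sketch. It is in Python rather than CFML, purely because it is
easy to run anywhere; the variable names are made up for illustration. It
reads the byte 150 once as Windows-1252 and once as ISO-8859-1.

```python
raw = bytes([150])  # the single byte a Windows form post might send

as_win1252 = raw.decode("cp1252")   # Windows-1252 assigns 150 to EN DASH
as_latin1 = raw.decode("latin-1")   # ISO-8859-1 maps 150 to a C1 control code

print(hex(ord(as_win1252)))     # 0x2013
print(hex(ord(as_latin1)))      # 0x96
print(as_latin1.isprintable())  # False
```

The second reading is the one that produces a character you can store and
pass around but that doesn't show up as expected when printed.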

>I was wondering, the DSN setup in CFAdmin has a checkbox for 'Enable
>Unicode...' and I'm wondering if this might cure it. Thoughts?

Adding "useUnicode=true&characterEncoding=UTF-8" to the connection string
for your DSN will tell the JDBC driver to use Unicode with UTF-8 encoding.
Might do the trick.
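
As a rough illustration of where those two parameters end up, the sketch
below (Python, for illustration only) assembles a MySQL JDBC URL with them
as the query string. The host, port and database name are hypothetical
placeholders, not values from your setup.

```python
from urllib.parse import urlencode

# Hypothetical host, port and database name, for illustration only.
base_url = "jdbc:mysql://localhost:3306/mydb"

# The two driver parameters mentioned above, joined with "&".
params = urlencode({"useUnicode": "true", "characterEncoding": "UTF-8"})
jdbc_url = base_url + "?" + params

print(jdbc_url)
# jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8
```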

>using some stupid replacing like this function at the moment, but it's
>not exhaustive.
>
>function cleantext(text) {
>text = Replace(text, Chr(65426), "'", "All");
>text = Replace(text, Chr(65425), "'", "All");
>text = Replace(text, Chr(65427), "'", "All");
>text = Replace(text, Chr(65428), "'", "All");
>text = Replace(text, Chr(65430), "-", "All");
>Return text;
>}

That's basically the same approach I take tbh, when I'm dealing with
character sets that use the same code positions to represent different
characters.
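
If the non-exhaustive part is the worry, one way to cover the whole
Win1252-specific range is to remap every code position from 128 to 159 in
one pass instead of listing characters by hand. The sketch below shows the
idea in Python rather than CFML; the function name fix_win1252 is made up,
and it assumes the stray characters arrive as code positions 128-159, which
may not match what your data actually contains.

```python
def fix_win1252(text: str) -> str:
    """Map code positions 0x80-0x9F to the characters that
    Windows-1252 assigns to those byte values."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                ch = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # A few positions (e.g. 0x81) are unassigned in
                # Windows-1252, so leave those characters untouched.
                pass
        out.append(ch)
    return "".join(out)


print(fix_win1252("3\x964") == "3\u20134")  # True: 0x96 becomes an en dash
```

Unlike the Replace() version, this keeps the curly quotes and dashes as
their proper Unicode characters rather than flattening them to ASCII, so
whether it suits you depends on what the consumers of the web service can
display.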

Mark