This isn't really a problem with MySQL; MySQL just trips on it. Let me caution you that this discussion applies to Windows. I did observe this in my applications on Linux, but the problem will show itself only when reading a file, whether uploaded or not; there will be no problem if the data is entered into a form in a web browser. I did most of my testing with 0x96, the Windows n-dash.
- Trying to INSERT a "funky" character will give you a "data truncated" message from a 4.x server, or a complaint about the character not being valid in the target character set from a 5.x server. The error message from the 5.x server will point you directly to the problem character. - The ultimate cause of the problem is that Windows applications use the CP-1252 character set. It is similar to, but not identical to, UTF-8. - MySQL does not recognize the CP-1252 character set, at least not out of the box. Therefore you cannot get MySQL to convert CP-1252 characters to UTF-8. You could try CP-1251, I didn't test that. - Sourcing a file containing one of these troublesome characters into the MySQL CLI will trigger the problem, because the data is sucked in as-is. - Copying and pasting into the CLI will sometimes avoid the problem because Windows automatically transliterates some characters during that operation. - Windows does NOT TRANSLATE the characters. Note that I said "transliterates". The 0x96 character, the CP-1252 n-dash, is silently changed to 0x2D: a hyphen when you paste it into a console application. - The reason that data entered into a web form works is that IE (and I suppose other browsers) maps the CP-1252 characters into their corresponding UTF-8 characters. Thus 0x96, when pasted into a browser, turns into 0xe28093 when pushed back to the server. That three-byte sequence is the UTF-8 n-dash character. What we have chosen to do is to transliterate the most common of these troublesome characters ourselves. In PHP, when working with a file it looks like this: $cp1252_special_chars = array( "\x96" => "-", "\x97" => "--", "\x91" => "\\'", "\x92" => "\\'", "\x85" => "...", "\x93" => "\"", "\x94" => "\"" ); On the web server side (UTF-8) it looks like this: $utf_special_chars = array( "\xe2\x80\x93" => "-", "\xe2\x80\x94" => "--", "\xe2\x80\x98" => "\\'", "\xe2\x80\x99" => "\\'", "\xe2\x80\xa6" => "...", "\xe2\x80\x9c" => "\"", "\xe2\x80\x9d" => "\"" ); This is not a perfect solution for everyone, but it suits our needs. The data we work with comes in higgledy-piggledy, and we want it consistent whether someone sends us a data file or pastes a string into one of our forms. If somebody (not I, given my level of knowledge and available time) adds CP-1252 to the character sets available with MySQL, then you could use CONVERT() with USING to handle this (assuming that you knew ahead of time where your data was coming from). I hope this helps, I grew a few rings and shed a lot of bark to get here. Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com www.giiexpress.com www.etudes-marche.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]