So today, for the second time in six weeks, we are faced with rolling back to MySQL 4.0 because of dramas with character sets. I don't know about anyone else, but this supposedly wonderful feature has been nothing but a nightmare for us.
Our application servers use Unicode for our non-US-English products, and they talk to MySQL through Connector/J with a flag set in the JDBC config to use Unicode.

First time around, we were told to just bring the data over to MySQL 4.1 and we'd have no problems, so we dumped the data and imported it into the 4.1 instance. Everything looked good, but it wasn't. Because we didn't specify a character set for the import, we got latin1, and our German and Chinese and... all broke. The German folks were complaining that their various umlauts and so on were missing, and there was more.

So six weeks of trial and experimentation later, we try for another update. This time, in the CREATE DATABASE statement we run before importing, we set the default character set to utf8 for everything. Now after the import our German and Chinese folks still get the results they expect.

A day later we are getting complaints from Hong Kong that a whole bunch of messages are appearing in their discussions with no message body. We look at the backend, and right there in the database the messages are sitting with a body consisting of exactly one space. Whatever content was sent to us was turned into one space. We also see that more than a few of the messages that were migrated from 4.0 to 4.1 have message bodies of one space. Not all messages, just some. Not all messages from any individual user, just some... The 4.0 version of the data has content that consists of more than a single space... Can't quite tell what it is, but there's content there in 4.0 that disappears in 4.1.

So I understand that having multiple character sets is a good thing, but to be honest, I pretty much thought we had it in 4.0... We told the JDBC driver to use Unicode and away we went...
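A rough sketch of how the affected rows might be tracked down in the 4.0 copy of the data, assuming the message bodies can be pulled out as raw bytes and that the vanishing content is simply not valid UTF-8 (which is only a guess at this point). The function name `find_non_utf8` and the sample rows are made up for illustration; nothing here talks to MySQL.

```python
def find_non_utf8(rows):
    """Return the ids of rows whose body bytes do not decode as UTF-8.

    `rows` is an iterable of (row_id, body_bytes) pairs. How the pairs
    are extracted from the old 4.0 data is left open (assumption).
    """
    suspects = []
    for row_id, body in rows:
        try:
            body.decode("utf-8")
        except UnicodeDecodeError:
            # Bytes that are not well-formed UTF-8: a candidate for
            # content that gets mangled when stored in a utf8 column.
            suspects.append(row_id)
    return suspects


# Hypothetical sample data standing in for exported message bodies.
sample_rows = [
    (1, "Grüße aus Berlin".encode("utf-8")),        # genuine UTF-8
    (2, "日本語のメッセージ".encode("shift_jis")),    # Japanese sent as Shift_JIS
    (3, "plain ASCII text".encode("ascii")),        # ASCII is valid UTF-8
]

suspects = find_non_utf8(sample_rows)
print(suspects)
```

Comparing the flagged ids against the messages that came out as a single space in 4.1 would show whether the guess holds. The check is only a heuristic: bytes in another encoding can occasionally happen to be valid UTF-8 and would slip through.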
Clearly someone was using something that wasn't Unicode (some of the comments suggest that there is some Japanese in the missing messages, but I can't tell), and for whatever reason MySQL 4.1 decided it should be replaced with a space character. I'm probably missing the point of the character set support along the way somewhere... But I need to know how to fix this (I understand that's difficult when all I have left is one blank space and I don't know how to reproduce the problematic data).

What did I miss in the simple "open your data files with 4.1 and it's good to go" instructions? What character set behaves the same as MySQL 4.0, which didn't care what character set you gave it and would accept it anyway? Can we have a character set that will give us this functionality? And why are we taking input data on an import, and by the looks of it on an insert, and turning it into a single space? Can't we do something better with the data?

4.0 worked for us with products in 20+ languages. It worked with no great effort and no problems... Now we have the new enhanced version, which provides "better" support for international character sets, and we find ourselves with lost data from the moment we import, and user posts disappearing as they come in. What do we do to not have this problem?

Best Regards,
Bruce

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql