Hi Mike, Thank you for all the details. I've seen the merits of the arguments and I guess the conclusion is these points:
1) The MySQL specific option of use_unicode is not needed anymore. The MySQL specific charset option also is deprecated. 2) The SQLAlchemy convert_unicode option is recommended, and is needed when you're autoloading tables and you want to get unicode objects. It obsoletes the use_unicode option. Correspondingly, the SQLAlchemy encoding option obsoletes the MySQL specific charset option. 3) No options will help you when doing raw engine.execute operations; you must convert for yourself. 4) You should use Unicode fields inside of your application in order for your internal encoding/decoding logic to not depend on the database configuration. I.e., whether you're using convert_unicode or not, when using Unicode field you will always get a unicode object from SQLAlchemy. 5) Pylons does not properly pass the convert_unicode option to create_engine. This has been discussed on this list in many forms, most recently by Mike himself, and everyone seems to be agreed it needs fixing and soon. I've created ticket #250 for this. There was one point you made that I believe is inaccurate and so did not include it in the above. Where you said that if you want to use the convert_unicode option, you must know what encoding the database is in, but my tests don't suggest that. My MySQL settings are currently: character_set_client latin1 character_set_connection latin1 character_set_database latin1 character_set_filesystem binary character_set_results latin1 character_set_server latin1 character_set_system utf8 character_sets_dir /usr/share/mysql/charsets/ collation_connection latin1_swedish_ci collation_database latin1_swedish_ci collation_server latin1_swedish_ci which are the defaults on Debian, and I'm happy in my ignorance. AFAIK as long as the character_set_system is utf-8, the data itself is stored in utf-8, and other variables like client and server encoding can be changed at will to force MySQL (client or server respectively) to convert from utf-8 to whatever else. I've verified that with the above MySQL settings and only using Unicode field types, no covert_unicode or use_unicode or any other settings, the data in the DB is utf-8 and I get unicode objects and everything is groovy. That suggests to me that if you really care, you should set the backend encoding to utf-8 or whatever with the SQLAlchemy encoding settings, but it is not necessary nor especially recommended. Does that tie up the loose ends on this discussion? Thanks again, David Mike Orr wrote: > > The first thing is to see what MySQL thinks its server and client > character sets are: > > $ paster shell development.ini >>>> import pylons.database >>>> engine = pylons.database.create_engine() >>>> engine.execute("SHOW VARIABLES").fetchall() > > Look for the variables that start with "character_set_" and > "collation_". On my workstation they show: > > (u'character_set_client', u'utf8'), > (u'character_set_connection', u'utf8'), > (u'character_set_database', u'utf8'), > (u'character_set_filesystem', u'binary'), > (u'character_set_results', u'utf8'), > (u'character_set_server', u'utf8'), > (u'character_set_system', u'utf8'), > (u'character_sets_dir', u'/usr/share/mysql/charsets/'), > (u'collation_connection', u'utf8_general_ci'), > (u'collation_database', u'utf8_general_ci'), > (u'collation_server', u'utf8_general_ci'), > > Then you have a choice of "?use_unicode=1" (a MySQLdb option in the DSN) > or: > > create_engine(dsn, convert_unicode=True, encoding="utf8") > > SQLAlchemy wants to standardize on 'convert_unicode' because it's the > same for all engines, but currently there are some problems with that: > > 1) You have to know what character set the MySQL connection is. > > 2) If you're using pylons.database.session_context, you can't add > arguments to the create_engine call without hacking pylons.database or > reimplementing the functions yourself. Hopefully this will be > improved in a future version of Pylons -- as soon as we figure out > what to do. > > 3) convert_unicode works at the Table level, so raw engine.execute() > queries will still be 'str' strings. > > * * * * > I have MySQLdb 1.2.1 because Kubuntu is still on the older version and > I can't get the newer one to compile: > > $ easy_install MySQL-python > File "/tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py", > line 24, in mysql_config > EnvironmentError: mysql_config not found > > Thank you, now what is "mysql_config"? > > * * * * > With autoloaded tables you'll get unicode values if you use either > use_unicode or convert_unicode, or str values if you don't. With > declared columns you'll also have to use the Unicode column type. > > I had to use a my.cnf file to get the client and server charsets to > match. Otherwise I was getting: > > (u'character_set_client', u'latin1'), > (u'character_set_connection', u'latin1'), > (u'character_set_database', u'utf8'), > (u'character_set_filesystem', u'binary'), > (u'character_set_results', u'latin1'), > (u'character_set_server', u'utf8'), > (u'character_set_system', u'utf8'), > (u'character_sets_dir', u'/usr/share/mysql/charsets/'), > (u'collation_connection', u'latin1_swedish_ci'), > (u'collation_database', u'utf8_general_ci'), > (u'collation_server', u'utf8_general_ci'), > > The my.cnf is a symlink to /etc/mysql/my.cnf and contains among other > things: > > [client] > default-character-set = utf8 > > [mysqld] > character-set-server = utf8 > default-character-set = utf8 > > The DSN looks like this: > > sqlalchemy.dbapi = > mysql://...?read_default_file=%(here)s/my.cnf&use_unicode=1 > > Unfortunately MySQLdb does not raise an exception if the file is > missing, so it's worth creating a unit test that checks whether all > the character sets are correct. I haven't tried that yet. > > * * * * > If you have existing records in the database which are in a different > charset than what the column purports to be, and you want to keep the > actual values as is, change the column type to BINARY in MySQL, then > change it to the desired charset: > > http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html > > If you want to change the data to the charset the column purports to > be, write a Python program that connects WITHOUT Unicode support, read > the values, decode them from the source charset, encode them into the > destination charset, and write them back to the database. > > With "mysql://...?use_unicode=1" and reflected tables, I get Unicode > values. Without use_unicode, I get string values. If I hack > pylons.database to pass 'convert_unicode=True' to create_engine() in a > way that it'll be propagated to the session_context, I get Unicode > values again. Curiously, in all three cases the column instances are > MSString with convert_unicode=False: > > Column(u'name',MSString(length=255,convert_unicode=False),nullable=False) > > * * * * > The following didn't work: > > "mysql://...?use_unicode=1&charset=utf8" > > sqlalchemy.exceptions.DBAPIError: (Connection failed) > (OperationalError) (2019, 'Can\'t initialize character set utf8" > (path: /usr/share/mysql/charsets/)') > > This may be because I'm using the older MySQLdb, or maybe it's a bug > Ubuntu's packaging of MySQL. > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "pylons-discuss" group. To post to this group, send email to pylons-discuss@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en -~----------~----~----~----~------~----~------~--~---