[TurboGears] Re: SQLObject rant: unicode support sucks
I tried the test program and it failed for me on MySQL 4.1.15-1 from the Debian package in testing. The MySQL config is largely unchanged from the defaults it comes with in Debian.

Matthew, did you try the test program, and did it work for you? Can you send me your MySQL config?

I have no problem with Unicode other than this cutting of text; the app shows all Unicode data perfectly, which is what your screenshots show.

Baruch

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups TurboGears group.
To post to this group, send email to turbogears@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/turbogears
-~--~~~~--~~--~--~---
[TurboGears] Re: SQLObject rant: unicode support sucks
Do your table columns show as having the utf8_unicode_ci collation?

-Rob
[TurboGears] Re: SQLObject rant: unicode support sucks
No. The default appears to be latin1. I can't find how to make MySQL default to UTF-8. There is the SET NAMES command, but there is no obvious way to use it in SQLObject.

Baruch
[TurboGears] Re: SQLObject rant: unicode support sucks
    ALTER DATABASE `database` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    ALTER TABLE `table` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

I think there might be a command you need to run for the columns as well, I'm not sure. I always use InnoDB, which doesn't support column character sets (I think). I think it's sufficient to set the character set on the database; all tables created within it will then inherit that.

I believe in my.cnf you can do:

    [mysqld]
    default_character_set = utf8

I'm not sure though. I like to keep my servers as close to the reference as possible to avoid obscure bugs when moving to third-party servers.

-Rob
[TurboGears] Re: SQLObject rant: unicode support sucks
I tried changing the database to utf8 with your commands and nothing changed with regard to my problem: the test program still fails for me. I am using MyISAM, if only because that was the default.

Can you try the test program and see if it works for you on MySQL? I'd like to know if it's my problem or a MySQL problem.

Baruch
[TurboGears] Re: SQLObject rant: unicode support sucks
I can confirm that on MySQL 4.1.11-Debian_4sarge2-log, with the slightly modified test case:

    #!/usr/bin/env python2.4
    # -*- coding: utf-8 -*-
    import sqlobject

    connection = sqlobject.connectionForURI('mysql://*:[EMAIL PROTECTED]/*')
    sqlobject.sqlhub.processConnection = connection

    class MyTestClass(sqlobject.SQLObject):
        class sqlmeta:
            table = 'test_unicode_table'
        test_column = sqlobject.UnicodeCol(length = 5)

    #MyTestClass.dropTable()
    #MyTestClass.createTable()

    test123 = MyTestClass(test_column = u'\xe1\xe9\xed\xf3\xfa')
    print test123
    print test123.test_column

I get:

    [EMAIL PROTECTED]:/www/tg$ ./test.py
    Traceback (most recent call last):
      File "./test.py", line 18, in ?
        test123 = MyTestClass(test_column = u'\xe1\xe9\xed\xf3\xfa')
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/declarative.py", line 92, in _wrapper
        return_value = fn(self, *args, **kwargs)
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1197, in __init__
        self._create(id, **kw)
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1224, in _create
        self._SO_finishCreate(id)
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1251, in _SO_finishCreate
        self._init(id)
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 958, in _init
        self._SO_selectInit(selectResults)
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1149, in _SO_selectInit
        colValue = col.to_python(colValue, self._SO_validatorState)
      File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/col.py", line 538, in to_python
        return unicode(value, self.db_encoding)
      File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 4: unexpected end of data
    [EMAIL PROTECTED]:/www/tg$

However, if I change the column length (using MySQL) to something even, I get:

    [EMAIL PROTECTED]:/www/tg$ ./test.py
    MyTestClass 2L test_column=u'\xe1\xe9\xed\xf3'
    áéíó

(truncated), or the full version at column length 10. This was with my tables running as latin1 and as utf8_unicode_ci.

So at least with this version, it's a MySQL problem, which isn't surprising if I'm honest. MySQL is a great database, but this is one of those areas where people bitch about it being shit. I think if you want to run reliably with Unicode, at least in MySQL 4.1, you should use TEXT columns, or be prepared not to take in data more than 127 characters in length for varchar(255). I got this with both InnoDB and MyISAM.

MySQL 5 seems to make changes to varchar (max length is now 65,535 bytes), so this may be fixed, but the manuals for both major versions mention nothing about character sets. This should be filed as a bug against MySQL, but you almost certainly won't get it fixed in 4.1 (as MySQL can only use 1 byte to store the length of a varchar column), and I bet you stand a slim chance of getting it fixed in 5.x. I think internal functions make assumptions about character offsets that don't adhere to character sets.

This is why MySQL columns have a collation, but not a character set. A collation means "this is what I will use when I want to do case-insensitive sorting", not "this *is* the format my data is in". MySQL is a high-performance DB that makes sacrifices for speed, and this is one of those sacrifices.

I suggest you use PGSQL if you really need this, or use TEXT columns instead (drop the length from your model definition).

Best of luck

-Rob
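The traceback above is consistent with a 10-byte UTF-8 string being cut to 5 bytes before it is read back. Below is a minimal sketch of that failure without any database, written for modern Python 3 rather than the Python 2.4 used in the thread. `truncate_bytes` is a hypothetical stand-in for a column that limits its content by bytes; it is an assumption for illustration, not MySQL's actual behaviour.

```python
# Hypothetical stand-in for a column that truncates by bytes, not characters.
def truncate_bytes(data: bytes, limit: int) -> bytes:
    return data[:limit]

text = "\xe1\xe9\xed\xf3\xfa"           # the test string from the thread, 5 characters
encoded = text.encode("utf-8")           # each accented letter takes 2 bytes in UTF-8

stored_5 = truncate_bytes(encoded, 5)    # ends on a lone lead byte 0xc3
stored_4 = truncate_bytes(encoded, 4)    # ends on a character boundary

try:
    stored_5.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(len(encoded))                      # byte length of the full string
print(error)                             # same kind of error as in the traceback
print(stored_4.decode("utf-8"))          # decodes, but only part of the text survives
```

An even byte limit happens to work for this particular string only because every character in it is exactly 2 bytes long; text that mixes ASCII with accented letters could still be split mid-character.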
[TurboGears] Re: SQLObject rant: unicode support sucks
On Friday 28 April 2006 14:07, Baruch wrote:
> I do wonder how PostgreSQL handles this case? Does it not truncate the string, or does it truncate it smartly?

It does truncate if the string is bigger than the column. In this case, as I've shown you, PostgreSQL does the right thing. The thing is that the database should know which encoding to use. I haven't created an ISO-8859-1 database to store this kind of data to see what happens... But, definitely, if you specified a Unicode database you should have no problems with your Unicode data up to the length of the columns you specified.

It is, to me, another of those really annoying and stupid bugs with MySQL. Another reason for me to keep it away from my projects.

--
Jorge Godoy [EMAIL PROTECTED]
[TurboGears] Re: SQLObject rant: unicode support sucks
Your example above works OK if you add:

    connection.query('set names utf8')

just after connection = ...

I've also tried adding init_command=set names utf8 as a keyword argument to

    connection = sqlobject.connectionForURI('mysql://*:[EMAIL PROTECTED]/*')

which supposedly should have been passed on to MySQLdb.connect, but that didn't work; SQLObject seems to mangle it to set+names+utf8 for some reason.

I also tried:

    #!/usr/bin/env python2.4
    # -*- coding: utf-8 -*-
    import sqlobject

    connection = sqlobject.connectionForURI('mysql://[EMAIL PROTECTED]/test')
    connection.query('set names utf8')
    sqlobject.sqlhub.processConnection = connection

    class MyTestClass(sqlobject.SQLObject):
        class sqlmeta:
            table = 'test_unicode_table'
        test_column = sqlobject.UnicodeCol(length = 5)

    MyTestClass.dropTable()
    MyTestClass.createTable()

    test123 = MyTestClass(test_column = u'\u0434\u0430\u043c\u0458\u0430\u043d')
    print test123
    print test123.test_column

(that's my name in Cyrillic and it has 6 letters), but at the end I got only 5 letters:

    u'\u0434\u0430\u043c\u0458\u0430'

... which is good in a way, since the name is actually 12 bytes, so at least length=5 is handled properly as a character count.
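The difference between counting characters and counting bytes in the result above can be checked without a database. This is a small sketch for modern Python 3, assuming nothing about how MySQL or SQLObject truncate internally; the slicing below only imitates the two possible behaviours.

```python
name = "\u0434\u0430\u043c\u0458\u0430\u043d"    # the 6-letter Cyrillic name from the message

char_count = len(name)
byte_count = len(name.encode("utf-8"))           # every letter here needs 2 bytes

# Truncating by characters keeps five whole letters.
by_chars = name[:5]

# Truncating by bytes can split a letter; errors="ignore" drops the partial one.
by_bytes = name.encode("utf-8")[:5].decode("utf-8", errors="ignore")

print(char_count, byte_count)
print(by_chars == "\u0434\u0430\u043c\u0458\u0430")
print(len(by_bytes))
```

Getting five intact letters back, as reported above, matches the character-based slice rather than the byte-based one.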
[TurboGears] Re: SQLObject rant: unicode support sucks
> It is, to me, another of those really annoying and stupid bugs with MySQL. Another reason for me to keep it away from my projects.

Having to handle different charsets certainly complicates MySQL a bit, and a lot of programs/libraries aren't yet fully aware of this MySQL feature. But this is not a bug in MySQL, just something you need to be aware of... I bet there are a lot of similar situations with PGSQL too.

The thing is, you need to learn the tools you are working with.
[TurboGears] Re: SQLObject rant: unicode support sucks
BTW, this is what is in my /etc/my.cnf:

    [client]
    default-character-set = utf8

    [mysqld]
    character-set-server=utf8
    collation-server=utf8_unicode_ci

So my MySQL defaults to Unicode databases and tables, and to the more correct utf8_unicode_ci collation.
[TurboGears] Re: SQLObject rant: unicode support sucks
Ah, your example is very interesting. Can you make a page on trac.turbogears.org including one of the tracebacks, so people can find the answer in the future?

-Rob
[TurboGears] Re: SQLObject rant: unicode support sucks
On Friday 28 April 2006 15:41, Robin Haswell wrote:
> IMO doing the right thing would be an SQL error. The Zen of Python, errors shouldn't go unnoticed?

Except that the error is in MySQL, right? ;-) Truncating in SQL columns should either be explicit or the server should raise an exception:

    neo=# create table testest (testcol varchar(10));
    CREATE TABLE
    neo=# insert into testest (testcol) values ('12345678910');
    ERRO: valor muito longo para tipo character varying(10)
    neo=#

(ERROR: value too long for type character varying(10))

> Nah I disagree, MySQL is a great database as long as you are aware of the implications. MySQL tries to be an enterprise-class RDBMS, which it is not, however people who are familiar with it know the real truth: MySQL is blazingly fast BECAUSE of things like this. If you need 100% data integrity and advanced (read: rarely used) functionality then you use PG.

I use PG ;-) I need to trust that all data inside the database *is* valid.

> If you want as fast as Oracle you use MySQL. If you want both, you use Oracle or DB2.

I'd disagree... There are root servers (.org and .info) using PostgreSQL; there are none using MySQL. Speed, in this case, is very important.

http://www.postgresql.org/about/users
and for the Afilias article: http://www.computerworld.com.au/index.php?id=760310963

I couldn't find the degradation graph of PostgreSQL and MySQL for millions of rows, but...

http://wiki.astrogrid.org/bin/view/Astrogrid/?topic=CrossMatchingReportsortcol=1table=1up=0
http://feedlounge.com/blog/2005/11/20/switched-to-postgresql/

Of course, we can keep discussing that, and for each article I point you to, you can send me double that number in favor of MySQL. And then I can double you. And ... :-)

Anyway, we're way off-topic here. If you want, I don't mind discussing this in private.

> MySQL is the PHP of RDBMS, it's great for 99% of uses, but there are fringe cases where another database will be more appropriate.

How did you guess I don't like PHP as well? ;-)

> For me, this problem will not stop me using MySQL, I will just use TEXT columns when I want strings longer than 127 characters.

If it isn't bad for you, then I don't see any problem with that. In fact, even for PG there are people who advocate using text columns anywhere you need something bigger than 1 char (some even for the 1-char column ;-)).

> It should be important to note, by the way, that this poses a mild risk to applications: Someone could insert corrupt data into the database, which would result in the application crashing with a UnicodeDecodeError every time it tries to select that data.

This is also one of the reasons why integrity inside the database is crucial and required. You have to trust that your data is safe and correct. If the RDBMS doesn't provide you that, I'm sorry, but it is not a good choice.

> I think, a solution to the crashing would be for SQLObject to encode UnicodeCol characters in to UTF-16, which is a 2-byte character set? So if you had even column lengths, then there is no chance of MySQL truncating the data in the middle of a character.

Why penalize everyone who uses a database server that works by requiring twice the amount of space for a string like this? UTF-8 is a very good choice since it only uses 2-byte chars when needed. You save disk space, memory and more.

I am of the opinion that what is broken is what should be fixed, no matter how many workarounds you can put in place to circumvent those failures.

I'm sorry for expressing my opinion and taking this debate to an off-topic thread. I'll stop here and, if you want, we can talk in private.

--
Jorge Godoy [EMAIL PROTECTED]
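The space argument above can be put in numbers. A minimal sketch for modern Python 3; the sample strings and the `sizes` helper are chosen here purely for illustration and are not from the thread.

```python
samples = {
    "ascii": "hello",
    "accented": "\xe1\xe9\xed\xf3\xfa",   # the test string used earlier in the thread
}

def sizes(text):
    # utf-16-le is used so the 2-byte BOM that plain "utf-16" prepends is not counted
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for label, text in samples.items():
    utf8_size, utf16_size = sizes(text)
    print(label, utf8_size, utf16_size)
```

For pure ASCII text UTF-16 doubles the storage, while for text made up entirely of 2-byte characters the two encodings take the same space, so the cost of switching depends on the data.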
[TurboGears] Re: SQLObject rant: unicode support sucks
Hi,

jvanasco wrote:
> wrong? didn't we just say the exact same thing?

No, we didn't. You said

| unicode doesn't store every character with the same required space

and I replied

| Unicode doesn't store characters at all.

In fact, there are many Unicode *encodings* that use the same amount of space for every character.

Cheers,
--Jan Niklas
[TurboGears] Re: SQLObject rant: unicode support sucks
jvanasco wrote:
> Are you sure that truncation is done in sqlobject? what database are you using? if its mysql, chances are the truncation happened there ( you can set it to use 'traditional sql' to give you warnings on that if you're running mysql5+) mysql is awful about that.
>
> Aside from that, I think the issue is more of a problem with Unicode -- unicode doesn't store every character with the same required space ( http://en.wikipedia.org/wiki/UTF-8 ). The ascii strings aren't re-encoded for legacy support -- so if you wanted a UnicodeCol of 100 characters, you'd really have to set the schema to be ( 100 x Max Storage Requirement for UTF Version )

Most probably a useless reply, but I've just found this link today regarding Python and Unicode:

http://dalchemy.com/opensource/unicodedoc/

Ciao
Michele
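The "100 x max storage requirement" rule quoted above can be worked out for UTF-8. A small sketch for modern Python 3; `worst_case_bytes` is a hypothetical helper named here for illustration, and a particular database may cap its own character set at fewer bytes per character than full UTF-8 does.

```python
# Largest code point that fits in each UTF-8 length: 1, 2, 3 and 4 bytes.
boundaries = ["\u007f", "\u07ff", "\uffff", "\U0010ffff"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in boundaries]

def worst_case_bytes(max_chars, max_bytes_per_char):
    # Byte budget needed so that no string of max_chars characters is ever cut.
    return max_chars * max_bytes_per_char

budget = worst_case_bytes(100, max(utf8_lengths))
print(utf8_lengths, budget)
```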
[TurboGears] Re: SQLObject rant: unicode support sucks
Hi,

jvanasco wrote:
> Aside from that, I think the issue is more of a problem with Unicode -- unicode doesn't store every character with the same required space ( http://en.wikipedia.org/wiki/UTF-8 ).

Wrong. Unicode doesn't store characters at all. It's a UTF-8 issue, but UTF-8 is just one *encoding* for Unicode characters.

Under those circumstances where the byte length has to be calculated in advance without knowing the content, one should use an encoding that uses the same number of bytes for every Unicode character. UTF-16 would be a good choice. On the other hand, UTF-8 is better for legacy support: ASCII strings aren't any different in UTF-8.

What I don't know is how to tell SQLObject (if at all possible) to use UTF-16 instead of UTF-8...

Cheers,
--Jan Niklas
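One caveat to the UTF-16 suggestion is worth checking: UTF-16 uses 2 bytes only for characters in the Basic Multilingual Plane and 4 bytes (a surrogate pair) for the rest, whereas UTF-32 is fixed at 4 bytes. A short sketch for modern Python 3; the two sample characters and the `width` helper are picked here for illustration.

```python
bmp_char = "\xe1"              # á, inside the Basic Multilingual Plane
astral_char = "\U0001d11e"     # musical G clef, outside the BMP

def width(ch, encoding):
    # Number of bytes one character occupies in the given encoding.
    return len(ch.encode(encoding))

utf16 = (width(bmp_char, "utf-16-le"), width(astral_char, "utf-16-le"))
utf32 = (width(bmp_char, "utf-32-le"), width(astral_char, "utf-32-le"))
print(utf16, utf32)
```

So a byte budget of "2 x characters" is safe for UTF-16 only as long as the data stays within the BMP.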
[TurboGears] Re: SQLObject rant: unicode support sucks
On 4/27/06, Jan Niklas Fingerle [EMAIL PROTECTED] wrote:
> What I don't know is how to tell SQLObject (if at all possible) to use UTF-16 instead of UTF-8...

I haven't used it myself, but UnicodeCol accepts a dbEncoding parameter, which defaults to UTF-8.

--
Tim Lesher [EMAIL PROTECTED]
[TurboGears] Re: SQLObject rant: unicode support sucks
Wrong? Didn't we just say the exact same thing?
[TurboGears] Re: SQLObject rant: unicode support sucks
>> What I don't know is how to tell SQLObject (if at all possible) to use UTF-16 instead of UTF-8...
>
> I haven't used it myself, but UnicodeCol accepts a dbEncoding parameter, which defaults to UTF-8.

Seems like that should be the solution. Looking at the source, there is indeed a dbEncoding attribute on UnicodeCol, and using UTF-16 will make more sense than UTF-8 for me. The data I have is always kept in Unicode inside my app anyway, and I don't care much about non-Python code reading the data.

I still feel that SQLObject is not doing the obvious thing here. It should use the UTF-16 encoding from the start, especially when it has this comment for UnicodeCol:

    Note: parameters in queries will not be automatically encoded, so if you do a query matching a UnicodeCol column you must apply the encoding yourself.

This will bite the ass of anyone who naively uses UnicodeCol and thinks that his queries will just work properly.

I'm pretty sure the Identity system uses UnicodeCol throughout. Did anyone really check that it works for non-ASCII characters? Especially those cases where the Unicode string is encoded into a too-long username string?

Baruch
[TurboGears] Re: SQLObject rant: unicode support sucks
On 4/27/06, Baruch [EMAIL PROTECTED] wrote:
> I still feel that SQLObject is not doing the obvious thing here, it should use the UTF-16 encoding from the start especially when it has this comment for UnicodeCol:
>
>     Note: parameters in queries will not be automatically encoded, so if you do a query matching a UnicodeCol column you must apply the encoding yourself.
>
> This will bite the ass of anyone who naively uses UnicodeCol's and thinks that his queries will just work properly.

This is just conjecture, but on most Python distributions, the internal representation for a u"foo" is UTF-8, so that's probably why SQLObject defaults to it.

I'm guessing that if you use the default UTF-8 encoding for a column and pass a Python Unicode string in a select, the comparison will work; if the SQLObject encoding is different from the internal Python encoding, it won't.

Sounds like an important unit test to write, though.

--
Tim Lesher [EMAIL PROTECTED]
[TurboGears] Re: SQLObject rant: unicode support sucks
Still, whoever uses UnicodeCol should be wary of the case when the string is truncated. It leads to exceptions you don't normally expect.

Baruch
[TurboGears] Re: SQLObject rant: unicode support sucks
On Thursday 27 April 2006 19:23, Baruch wrote:
> Still, whoever uses UnicodeCol should be wary of the case when the string is truncated. It leads to exceptions you don't normally expect.

I'm using UnicodeCol here with PostgreSQL and I've just written a test case. Let's see what happens here:

===
#!/bin/python
# -*- coding: utf-8 -*-
import sqlobject

conn_str = 'postgres://godoy:[EMAIL PROTECTED]/test_unicode'
connection = sqlobject.connectionForURI(conn_str)
sqlobject.sqlhub.processConnection = connection

class MyTestClass(sqlobject.SQLObject):
    class sqlmeta:
        table = 'test_unicode_table'
    test_column = sqlobject.UnicodeCol(length = 5)

test123 = MyTestClass(test_column = u'áéíóú')
print test123
print test123.test_column
===

And here's the output:

===
[EMAIL PROTECTED] ~/tmp/tempo % python tempo.py
MyTestClass 1L test_column=u'\\xe1\\xe9\\xed\\xf...'
áéíóú
[EMAIL PROTECTED] ~/tmp/tempo %
===

Just to be sure, here's what I get in PostgreSQL after a few runs:

===
test_unicode=# select * from test_unicode_table;
 id | test_column
----+-------------
  1 | áéíóú
  2 | áéíóú
  3 | áéíóú
  4 | áéíóú
  5 | áéíóú
  6 | áéíóú
(6 registros)

test_unicode=#
===

And here's the table structure to corroborate that the size was declared correctly:

===
test_unicode=# \d test_unicode_table
Tabela public.test_unicode_table
   Coluna    |         Tipo         | Modificadores
-------------+----------------------+-----------------------------------------------------------------
 id          | integer              | not null default nextval('test_unicode_table_id_seq'::regclass)
 test_column | character varying(5) |
Índices:
    test_unicode_table_pkey PRIMARY KEY, btree (id)

test_unicode=#
===

So, your problem is not with SQLObject but with your database messing with things. Of course, I have set mine up to work with Unicode:

===
test_unicode=# \set
VERSION = 'PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)'
AUTOCOMMIT = 'on'
VERBOSITY = 'default'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = ' '
DBNAME = 'test_unicode'
USER = 'godoy'
HOST = 'internet'
PORT = '5432'
ENCODING = 'UTF8'
HISTSIZE = '500'
LASTOID = '0'
test_unicode=#
===

--
Jorge Godoy [EMAIL PROTECTED]
[TurboGears] Re: SQLObject rant: unicode support sucks
On Apr 27, 2006, at 2:45 PM, Tim Lesher wrote:
> On 4/27/06, Baruch [EMAIL PROTECTED] wrote:
>> I still feel that SQLObject is not doing the obvious thing here, it should use the UTF-16 encoding from the start especially when it has this comment for UnicodeCol:
>>
>>     Note: parameters in queries will not be automatically encoded, so if you do a query matching a UnicodeCol column you must apply the encoding yourself.
>>
>> This will bite the ass of anyone who naively uses UnicodeCol's and thinks that his queries will just work properly.
>
> This is just conjecture, but on most Python distributions, the internal representation for a u"foo" is UTF-8, so that's probably why SQLObject defaults to it.

Uh, no. Totally wrong. Exactly zero Python representations use UTF-8 internally. They either use UCS-2 or UCS-4.

The reason for the SQLObject default is that UTF-8 is the most likely Unicode codec for Unicode data coming to and from databases (and also network traffic). I know PostgreSQL deals in UTF-8, and I'd guess that MySQL is the same.

-bob
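For anyone wanting to check which internal representation a given interpreter uses, `sys.maxunicode` is the usual probe. A minimal sketch: on the Python 2 builds discussed here it reports 0xffff for a narrow (UCS-2) build and 0x10ffff for a wide (UCS-4) build. On Python 3.3 and later it is always 0x10ffff, because the interpreter picks 1, 2 or 4 bytes per character for each string individually.

```python
import sys

# 0xffff on old narrow (UCS-2) builds; 0x10ffff on wide builds and on Python 3.3+.
max_code_point = sys.maxunicode
is_wide = max_code_point == 0x10FFFF

print(hex(max_code_point), is_wide)
```

Either way, the internal representation is separate from the encoding used when the string is written to a database.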
[TurboGears] Re: SQLObject rant: unicode support sucks
> I'm using UnicodeCol here with PostgreSQL and I've just written a test case. Lets see what happens here:

I, too, am having no trouble with SQLObject and Unicode text, over MySQL 4.x, even. See the following two screenshots:
[TurboGears] Re: SQLObject rant: unicode support sucks
> I'm using UnicodeCol here with PostgreSQL and I've just written a test case. Lets see what happens here:

I, too, am having no trouble with SQLObject and Unicode text, over MySQL 4.x, even. See the following two screenshots:

Accented Language Test: http://flickr.com/photos/gothcandy/135991899/
Japanese Language Test: http://flickr.com/photos/gothcandy/135991900/

(Sorry for the broken first post... wrong key! XD )
[TurboGears] Re: SQLObject rant: unicode support sucks
On Thursday 27 April 2006 20:40, Matthew Bevan wrote:
>> I'm using UnicodeCol here with PostgreSQL and I've just written a test case. Lets see what happens here:
>
> I, too, am having no trouble with SQLObject and Unicode text, over MySQL 4.x, even. See the following two screenshots:

So his problem is either with the configuration of his server or with the database server he's using (if it is not PostgreSQL or MySQL... I believe SQLite works too).

--
Jorge Godoy [EMAIL PROTECTED]
[TurboGears] Re: SQLObject rant: unicode support sucks
On 4/27/06, Bob Ippolito [EMAIL PROTECTED] wrote:
> Uh, no. Totally wrong. Exactly zero Python representations use UTF-8 internally. They either use UCS-2 or UCS-4.

I'm absolutely sure that's not correct. Check PEPs 100 and 261...

Hmm. Looks like you're absolutely correct. :-)

For some reason I was thinking that it was originally UCS-2, but that PEP 261 added UTF-8 (rather than UCS-4). Thanks for the heads-up.

--
Tim Lesher [EMAIL PROTECTED]
[TurboGears] Re: SQLObject rant: unicode support sucks
On 4/27/06, Jorge Godoy [EMAIL PROTECTED] wrote:
> test_unicode=# select * from test_unicode_table;
>  id | test_column
> ----+-------------
>   1 | áéíóú
>   2 | áéíóú
>   3 | áéíóú
>   4 | áéíóú
>   5 | áéíóú
>   6 | áéíóú
> (6 registros)

Do those characters require more than 1 byte for UTF-8 encoding?

Kevin
[TurboGears] Re: SQLObject rant: unicode support sucks
On Apr 27, 2006, at 7:09 PM, Kevin Dangoor wrote:
> On 4/27/06, Jorge Godoy [EMAIL PROTECTED] wrote:
>> test_unicode=# select * from test_unicode_table;
>>  id | test_column
>> ----+-------------
>>   1 | áéíóú
>>   2 | áéíóú
>>   3 | áéíóú
>>   4 | áéíóú
>>   5 | áéíóú
>>   6 | áéíóú
>> (6 registros)
>
> Do those characters require more than 1 byte for UTF-8 encoding?

Yeah...

    >>> import unicodedata
    >>> len(unicodedata.normalize('NFC', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
    10
    >>> len(unicodedata.normalize('NFD', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
    15

-bob
[TurboGears] Re: SQLObject rant: unicode support sucks
On Thursday 27 April 2006 23:26, Bob Ippolito wrote:
> On Apr 27, 2006, at 7:09 PM, Kevin Dangoor wrote:
>> Do those characters require more than 1 byte for UTF-8 encoding?
>
> Yeah...
>
>     >>> len(unicodedata.normalize('NFC', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
>     10
>     >>> len(unicodedata.normalize('NFD', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
>     15

Thanks Bob! I was testing and trying to solve some problems with the new decorator syntax and hadn't seen this before.

--
Jorge Godoy [EMAIL PROTECTED]