[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Baruch

I tried the test program and it failed for me on MySQL 4.1.15-1 from
the Debian package in testing. The config of MySQL is largely unchanged
from the defaults it comes with in Debian.

Matthew, did you try the test program, and did it work for you? Can you
send me your MySQL config?

I have no problem with Unicode other than this truncation of text; the app
displays all Unicode data perfectly, which is what your screenshots
show.

Baruch


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
TurboGears group.
To post to this group, send email to turbogears@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/turbogears
-~--~~~~--~~--~--~---



[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Robin Haswell

Do your table columns show as having the utf8_unicode_ci collation?

-Rob





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Baruch

No. The default appears to be latin1.

I can't seem to find how to make MySQL default to UTF-8. There is
the SET NAMES command, but there is no obvious way to use it in
SQLObject.

Baruch





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Robin Haswell

ALTER DATABASE `database` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci
ALTER TABLE `table` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci

I think there might be a command you need to run for the columns as
well, I'm not sure. I always use InnoDB which doesn't support column
character sets (I think).

I think it is sufficient to set the character set on the
database; all tables created within it will then inherit that. I believe
in my.cnf you can do:

[mysqld]
default_character_set = utf8

I'm not sure though, I like to keep my servers as close to the reference
as possible to avoid obscure bugs when moving to third party servers.

-Rob





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Baruch

Tried to change the database to utf8 with your commands and there is no
change with regard to my problems, the test program still fails for me.

I am using MyISAM, if only because that was the default. Can you try
the test program and see if it works for you on MySQL? I'd like to know
if it's my problem or a MySQL problem.

Baruch





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Robin Haswell

I can confirm that on:

MySQL 4.1.11-Debian_4sarge2-log

With the slightly modified test case:



#!/usr/bin/env python2.4
# -*- coding: utf-8 -*-

import sqlobject
connection = sqlobject.connectionForURI("mysql://*:[EMAIL PROTECTED]/*")
sqlobject.sqlhub.processConnection = connection

class MyTestClass(sqlobject.SQLObject):
    class sqlmeta:
        table = 'test_unicode_table'


    test_column = sqlobject.UnicodeCol(length = 5)

#MyTestClass.dropTable()
#MyTestClass.createTable()

test123 = MyTestClass(test_column = u'\xe1\xe9\xed\xf3\xfa')
print test123
print test123.test_column



I get:

[EMAIL PROTECTED]:/www/tg$ ./test.py
Traceback (most recent call last):
  File "./test.py", line 18, in ?
    test123 = MyTestClass(test_column = u'\xe1\xe9\xed\xf3\xfa')
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/declarative.py", line 92, in _wrapper
    return_value = fn(self, *args, **kwargs)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1197, in __init__
    self._create(id, **kw)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1224, in _create
    self._SO_finishCreate(id)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1251, in _SO_finishCreate
    self._init(id)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 958, in _init
    self._SO_selectInit(selectResults)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/main.py", line 1149, in _SO_selectInit
    colValue = col.to_python(colValue, self._SO_validatorState)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1588-py2.4.egg/sqlobject/col.py", line 538, in to_python
    return unicode(value, self.db_encoding)
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 4: unexpected end of data
[EMAIL PROTECTED]:/www/tg$

However, if I change the column length (using MySQL) to an even number, I get:

[EMAIL PROTECTED]:/www/tg$ ./test.py
MyTestClass 2L test_column=u'\xe1\xe9\xed\xf3'
áéíó

(truncated), or the full version at column length 10.

This was with my tables running as latin1 and utf8_unicode_ci
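
The traceback above is consistent with the stored bytes being cut at a byte count rather than a character count. A minimal sketch in plain Python 3, with no database involved; the 5-byte and 8-byte cuts are assumptions chosen to mirror the column lengths mentioned above, and decode_truncated is a hypothetical helper:

```python
# Sketch: what happens when UTF-8 bytes are cut at a byte count
# rather than a character count. No database is involved here.
text = u'\xe1\xe9\xed\xf3\xfa'      # the 5-character test string
encoded = text.encode('utf-8')      # each of these characters takes 2 bytes

def decode_truncated(data, byte_limit):
    """Cut the encoded bytes at byte_limit and try to decode the result."""
    return data[:byte_limit].decode('utf-8')

# An odd cut splits the third character in half and cannot be decoded.
try:
    decode_truncated(encoded, 5)
    odd_cut_error = None
except UnicodeDecodeError as exc:
    odd_cut_error = exc

# An even cut lands on a character boundary and gives a shorter string.
even_cut = decode_truncated(encoded, 8)

print(len(encoded), repr(even_cut), odd_cut_error)
```

The odd cut fails at byte offset 4, the same position reported in the traceback, while the even cut silently loses the last character.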

So at least with this version, it's a MySQL problem, which isn't
surprising if I'm honest. MySQL is a great database, but this is one of
those areas where people bitch about it being shit.

I think if you want to reliably run with Unicode, at least in MySQL 4.1,
you should use TEXT columns, or be prepared not to take in data more
than 127 characters in length for varchar(255).

I got this with InnoDB and MyISAM. MySQL 5 seems to make changes to
varchar (max length is now 2^32), so this may be fixed, but the manuals
for both major versions mention nothing about character sets.

This should be filed as a bug against MySQL, but you almost certainly
won't get it fixed in 4.1 (as MySQL can only use 1 byte to store the
length of a varchar column), and I bet you stand a slim chance of
getting it fixed in 5.x. I think internal functions make assumptions
about character offsets that don't adhere to character sets. This is why
MySQL columns have a collation, but not a character set. A collation
means "this is what I will use when I want to do case insensitive
sorting", not "this *is* what format my data is in". MySQL is a high
performance DB that makes sacrifices for speed - this is one of those
sacrifices. I suggest you use PGSQL if you really need this, or use TEXT
columns instead (drop the length from your model definition).

Best of luck

-Rob





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Jorge Godoy

On Friday 28 April 2006 14:07, Baruch wrote:

> I do wonder how PostgreSQL handles this case? Does it not truncate the
> string, or does it truncate it smartly?

It does truncate if the string is bigger than the column.  In this case, as 
I've shown you, PostgreSQL does the right thing.

The thing is that the database should know which encoding to use.  I haven't 
created an ISO-8859-1 database to store this kind of data to see what happens...

But, definitely, if you specified a Unicode database you should have no 
problems with your Unicode data up to the length of the columns you 
specified. 

It is, to me, another of those really annoying and stupid bugs with MySQL.  
Another reason for me to keep it away from my projects.

-- 
Jorge Godoy  [EMAIL PROTECTED]





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Damjan

Your example above works OK if you add:
connection.query('set names utf8')
just after  connection = ...

I've also tried to add
init_command="set names utf8" as a keyword argument to
  connection = sqlobject.connectionForURI("mysql://*:[EMAIL PROTECTED]/*")
which supposedly should've been passed to MySQLdb.connect, but that
didn't work; SQLObject seems to mangle it to set+names+utf8 for some
reason.
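
The set+names+utf8 form looks like ordinary form-style URL encoding, where a space becomes a plus sign. The sketch below uses only Python 3's urllib.parse to show that transformation; whether SQLObject applies it internally, or merely fails to reverse it, is an assumption that this thread does not settle.

```python
from urllib.parse import urlencode, unquote, unquote_plus

# Form-style encoding of a query parameter turns spaces into '+'.
query = urlencode({'init_command': 'set names utf8'})
print(query)

# Plain percent-decoding leaves '+' alone; only the form-style
# decoder turns it back into a space.
decoded_plain = unquote('set+names+utf8')
decoded_plus = unquote_plus('set+names+utf8')
print(repr(decoded_plain), repr(decoded_plus))
```

If the value reaches the database driver in the decoded_plain form, the server receives a statement it cannot parse.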

I also tried
#!/usr/bin/env python2.4
# -*- coding: utf-8 -*-

import sqlobject
connection = sqlobject.connectionForURI("mysql://[EMAIL PROTECTED]/test")
connection.query('set names utf8')

sqlobject.sqlhub.processConnection = connection

class MyTestClass(sqlobject.SQLObject):
  class sqlmeta:
    table = 'test_unicode_table'
  test_column = sqlobject.UnicodeCol(length = 5)

MyTestClass.dropTable()
MyTestClass.createTable()
test123 = MyTestClass(test_column =
    u'\u0434\u0430\u043c\u0458\u0430\u043d')
print test123
print test123.test_column

(that's my name in cyrillic and it has 6 letters) but at the end I got
only 5 letters:
u'\u0434\u0430\u043c\u0458\u0430'

... which is good in a way since it's actually 12 bytes, so at least
the length=5 is handled properly.
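
The character and byte counts in that result can be checked without a database. A small Python 3 sketch using the same string as the test above:

```python
# The six-letter Cyrillic name from the test above.
name = u'\u0434\u0430\u043c\u0458\u0430\u043d'

char_count = len(name)                      # length in characters
byte_count = len(name.encode('utf-8'))      # length in UTF-8 bytes

# Keeping five characters still needs ten bytes of storage, so a result
# of five letters means the limit was applied per character, not per byte.
kept = name[:5]
kept_bytes = len(kept.encode('utf-8'))

print(char_count, byte_count, repr(kept), kept_bytes)
```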





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Damjan

> It is, to me, another of those really annoying and stupid bugs with MySQL.
> Another reason for me to keep it away from my projects.

Having to handle different charsets certainly complicated MySQL a bit,
and a lot of programs/libraries aren't yet fully aware of this MySQL
feature.

But this is not a bug in MySQL, but just something you need to be aware
of... I bet there are a lot of similar situations with PGSQL too. The
thing is, you need to learn the tools you are working with.





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Damjan

BTW
This is what is in my /etc/my.cnf

[client]
default-character-set = utf8

[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci

So my MySQL defaults to Unicode databases and tables, and to the more
correct utf8_unicode_ci collation.





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Robin Haswell

Ah, your example is very interesting. Can you make a page on
trac.turbogears.org including one of the tracebacks, so people can find
the answer in the future?

-Rob





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Jorge Godoy

On Friday 28 April 2006 15:41, Robin Haswell wrote:

> IMO doing the right thing would be an SQL error. The Zen of Python,
> errors shouldn't go unnoticed?

Except that the error is in MySQL, right? ;-)  Truncating in SQL columns 
should either be explicit or the server should raise an exception:

neo=# create table testest (testcol varchar(10));
CREATE TABLE
neo=# insert into testest (testcol) values ('12345678910');
ERRO:  valor muito longo para tipo character varying(10)
neo=# 

(ERROR: value too large for type character varying(10))

> Nah I disagree, MySQL is a great database as long as you are aware of the
> implications. MySQL tries to be an enterprise-class RDBMS, which it is not,
> however people who are familiar with it know the real truth: MySQL is
> blazingly fast BECAUSE of things like this. If you need 100% data integrity
> and advanced (read: rarely used) functionality then you use PG. If you want

I use PG ;-)  I need to trust that all data inside the database *is* valid.

> as fast as Oracle you use MySQL. If you want both, you use Oracle or DB2.

I'd disagree...  There are root servers (.org and .info) using PostgreSQL, 
there are none using MySQL.  Speed, in this case, is very important.

http://www.postgresql.org/about/users
and for Afilias article: 
http://www.computerworld.com.au/index.php?id=760310963

I couldn't find the degradation graphic of PostgreSQL and MySQL for millions 
of rows, but...

http://wiki.astrogrid.org/bin/view/Astrogrid/?topic=CrossMatchingReportsortcol=1table=1up=0
http://feedlounge.com/blog/2005/11/20/switched-to-postgresql/


Of course, we can keep discussing that and for each article I point you to, 
you can send me the double in favor of MySQL.  And then, I can double you.  
And ... :-)

Anyway, we're way off-topic here.  If you want, I don't mind discussing this 
in private.

> MySQL is the PHP of RDBMS, it's great for 99% of uses, but there are fringe
> cases where another database will be more appropriate. For me, this problem

How did you guess I don't like PHP as well? ;-)

> will not stop me using MySQL, I will just use TEXT columns when I want
> strings longer than 127 characters.

If it isn't bad for you, then I don't see any problem with that.  In fact, 
even for PG there are people that advocate the use of text columns anywhere 
you need something bigger than 1 char (some even for the 1 char column ;-)). 

> It should be important to note, by the way, that this poses a mild risk to
> applications: Someone could insert corrupt data into the database, which
> would result in the application crashing with a UnicodeDecodeError every
> time it tries to select that data.

This is also one of the reasons why integrity inside the database is crucial 
and required.  You have to trust that your data is safe and is correct, if 
the RDBMS doesn't provide you that, I'm sorry, but it is not a good choice.
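
For data that has already been cut mid-character, an application can at least avoid crashing on every read. The sketch below is plain Python 3 and is only a workaround for the symptom, not a fix for the stored data; read_text is a hypothetical helper, not part of SQLObject.

```python
# Bytes as they might come back after a byte-level truncation:
# the third character has lost its second byte.
damaged = u'\xe1\xe9\xed'.encode('utf-8')[:5]

def read_text(raw, encoding='utf-8'):
    """Hypothetical helper: decode strictly, and fall back to
    replacement characters if the bytes are not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode(encoding, errors='replace')

recovered = read_text(damaged)
print(repr(recovered))
```

The damaged tail comes back as U+FFFD, which makes the corruption visible instead of raising an exception on each select.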

> I think, a solution to the crashing would be for SQLObject to encode
> UnicodeCol characters in to UTF-16, which is a 2-byte character set? So if
> you had even column lengths, then there is no chance of MySQL truncating
> the data in the middle of a character.

Why penalize everyone that uses a database server that works by requiring 
twice the amount of space for a string like this?  UTF-8 is a very good 
choice since it only uses 2-byte chars when needed.  You save disk space, 
memory and other.
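
The storage trade-off can be measured directly. A short Python 3 sketch comparing encoded sizes; the sample strings are illustrative assumptions, and encoded_sizes is a hypothetical helper:

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text.
    utf-16-le is used so that no byte order mark is counted."""
    return (len(text.encode('utf-8')), len(text.encode('utf-16-le')))

ascii_sizes = encoded_sizes(u'hello')
accented_sizes = encoded_sizes(u'\xe1\xe9\xed\xf3\xfa')
cjk_sizes = encoded_sizes(u'\u65e5\u672c\u8a9e')

print(ascii_sizes, accented_sizes, cjk_sizes)
```

UTF-8 halves the cost of ASCII text and ties on the accented sample, but it needs three bytes per character for the CJK sample, so which encoding is smaller depends on the data.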

I am of the opinion that what is broken is what should be fixed, no matter how 
many workarounds you can put in place to circumvent those failures.

I'm sorry for expressing my opinion and taking this debate to an off-topic 
thread.  I'll stop here and if you want, we can talk in private.

-- 
Jorge Godoy  [EMAIL PROTECTED]





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-28 Thread Jan Niklas Fingerle

Hi,

jvanasco wrote:
> wrong?  didn't we just say the exact same thing?

no, we didn't. You said

| unicode doesn't store every charcter with the same required space

and I replied

| Unicode doesn't store characters at all.

In fact, there are many unicode *encodings* that use the same amount
of space for every character.

Cheers,
  --Jan Niklas




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Michele Cella

jvanasco wrote:
> Are you sure that truncation is done in sqlobject?  what database are
> you using?  if it's mysql, chances are the truncation happened there (
> you can set it to use 'traditional sql' to give you warnings on that if
> you're running mysql5+)  mysql is awful about that.
>
> Aside from that, I think the issue is more of a problem with Unicode --
> unicode doesn't store every character with the same required space (
> http://en.wikipedia.org/wiki/UTF-8 ).  The ascii strings aren't
> re-encoded for legacy support -- so if you wanted a UnicodeCol of 100
> characters , you'd really have to set the schema to be ( 100 x Max
> Storage Requirement for UTF Version )

Most probably a useless reply, but I've just found this link today
regarding Python and Unicode:

http://dalchemy.com/opensource/unicodedoc/

Ciao
Michele





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Jan Niklas Fingerle

Hi,

jvanasco wrote:
> Aside from that, I think the issue is more of a problem with Unicode
> -- unicode doesn't store every character with the same required space
> ( http://en.wikipedia.org/wiki/UTF-8 ).

Wrong. Unicode doesn't store characters at all. It's a UTF-8 issue, but
UTF-8 is just one *encoding* for Unicode characters.

Under those circumstances where the byte length has to be calculated in
advance without knowing the content, one should use an encoding that uses
the same number of bytes for every Unicode character. UTF-16 would be a
good choice. On the other hand UTF-8 is better for legacy support: ASCII
strings aren't any different in UTF-8.
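
Whether an encoding really uses the same number of bytes for every character can be checked per code point. A Python 3 sketch; the sample characters are assumptions picked to cover ASCII, Latin-1, CJK, and one character outside the Basic Multilingual Plane, and bytes_per_char is a hypothetical helper:

```python
def bytes_per_char(text, encoding):
    """Return the encoded size of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]

# 'a', a-acute, a CJK ideograph, and U+10400 (outside the BMP).
sample = u'a\xe1\u65e5\U00010400'

utf8_widths = bytes_per_char(sample, 'utf-8')
utf16_widths = bytes_per_char(sample, 'utf-16-le')
utf32_widths = bytes_per_char(sample, 'utf-32-le')

print(utf8_widths, utf16_widths, utf32_widths)
```

UTF-16 is fixed at two bytes only inside the Basic Multilingual Plane; characters beyond it take a four-byte surrogate pair. Of the three, only UTF-32 is fixed-width for every code point.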

What I don't know is how to tell SQLObject (if at all possible) to use
UTF-16 instead of UTF-8...

Cheers,
  --Jan Niklas




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Tim Lesher

On 4/27/06, Jan Niklas Fingerle [EMAIL PROTECTED] wrote:
> What I don't know is how to tell SQLObject (if at all possible) to use
> UTF-16 instead of UTF-8...

I haven't used it myself, but UnicodeCol accepts a dbEncoding
parameter, which defaults to UTF-8.

--
Tim Lesher [EMAIL PROTECTED]




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread jvanasco

wrong?  didn't we just say the exact same thing?





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Baruch

>> What I don't know is how to tell SQLObject (if at all possible) to use
>> UTF-16 instead of UTF-8...
>
> I haven't used it myself, but UnicodeCol accepts a dbEncoding
> parameter, which defaults to UTF-8.

Seems like that should be the solution, looking at the source there is
indeed a dbEncoding attribute to UnicodeCol and using UTF-16 will make
more sense than UTF-8 for me. The data I have is always kept in Unicode
inside my app anyway and I don't care much about non-Python code
reading the data.

I still feel that SQLObject is not doing the obvious thing here. It
should use the UTF-16 encoding from the start, especially when it has
this comment for UnicodeCol: "Note: parameters in queries will not be
automatically encoded, so if you do a query matching a UnicodeCol
column you must apply the encoding yourself." This will bite the ass of
anyone who naively uses UnicodeCols and thinks that his queries will
just work properly.

I'm pretty sure the Identity system uses UnicodeCol throughout. Did
anyone really check whether it works for non-ASCII characters, especially
in those cases where the Unicode string is encoded into a username
string that is too long?

Baruch





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Tim Lesher

On 4/27/06, Baruch [EMAIL PROTECTED] wrote:
> I still feel that SQLObject is not doing the obvious thing here, it
> should use the UTF-16 encoding from the start especially when it has
> this comment for UnicodeCol: "Note: parameters in queries will not be
> automatically encoded, so if you do a query matching a UnicodeCol
> column you must apply the encoding yourself." This will bite the ass of
> anyone who naively uses UnicodeCol's and thinks that his queries will
> just work properly.

This is just conjecture, but on most Python distributions, the
internal representation for a u"foo" is UTF-8, so that's probably why
SQLObject defaults to it.

I'm guessing that if you use the default UTF-8 encoding for a column
and pass a Python Unicode string in a select, the comparison will
work; if the SQLObject encoding is different from the internal Python
encoding, it won't.

Sounds like an important unit test to write, though.
--
Tim Lesher [EMAIL PROTECTED]




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Baruch

Still, whoever uses UnicodeCol should be wary of the case when the
string is truncated. It leads to exceptions you don't normally expect.
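
One way to be wary is to reject oversized values in the application before they reach the database. The following Python 3 sketch is an assumption about how such a guard could look; check_fits and its parameters are hypothetical and not part of SQLObject.

```python
def check_fits(text, max_chars, max_bytes=None, encoding='utf-8'):
    """Hypothetical guard: raise ValueError instead of letting the
    database truncate silently. Returns the encoded bytes on success."""
    if len(text) > max_chars:
        raise ValueError('%d characters exceeds the limit of %d'
                         % (len(text), max_chars))
    encoded = text.encode(encoding)
    # Only relevant when the column length is counted in bytes.
    if max_bytes is not None and len(encoded) > max_bytes:
        raise ValueError('%d bytes exceeds the limit of %d'
                         % (len(encoded), max_bytes))
    return encoded

value = u'\xe1\xe9\xed\xf3\xfa'
ok_bytes = check_fits(value, 5)      # fits when counted in characters

try:
    check_fits(value, 5, max_bytes=5)
    byte_limit_rejected = False
except ValueError:
    byte_limit_rejected = True       # 10 bytes do not fit in 5

print(len(ok_bytes), byte_limit_rejected)
```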

Baruch





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Jorge Godoy

On Thursday 27 April 2006 19:23, Baruch wrote:
> Still, whoever uses UnicodeCol should be wary of the case when the
> string is truncated. It leads to exceptions you don't normally expect.

I'm using UnicodeCol here with PostgreSQL and I've just written a test case.  
Let's see what happens here:

===
#!/bin/python
# -*- coding: utf-8 -*-

import sqlobject

conn_str = 'postgres://godoy:[EMAIL PROTECTED]/test_unicode'
connection = sqlobject.connectionForURI(conn_str)
sqlobject.sqlhub.processConnection = connection

class MyTestClass(sqlobject.SQLObject):
    class sqlmeta:
        table = 'test_unicode_table'

    test_column = sqlobject.UnicodeCol(length = 5)


test123 = MyTestClass(test_column = u'áéíóú')
print test123
print test123.test_column
===

And here's the output

===
[EMAIL PROTECTED] ~/tmp/tempo % python tempo.py
MyTestClass 1L test_column=u'\\xe1\\xe9\\xed\\xf...'
áéíóú
[EMAIL PROTECTED] ~/tmp/tempo % 
===

Just to be sure, here's what I get in PostgreSQL after a few runs:

===
test_unicode=# select * from test_unicode_table;
 id | test_column 
----+-------------
  1 | áéíóú
  2 | áéíóú
  3 | áéíóú
  4 | áéíóú
  5 | áéíóú
  6 | áéíóú
(6 registros)

test_unicode=# 
===

And here's the table structure to corroborate that the size was declared 
correctly:

===
test_unicode=# \d test_unicode_table
                 Tabela "public.test_unicode_table"
   Coluna    |         Tipo         |                          Modificadores
-------------+----------------------+-----------------------------------------------------------------
 id          | integer              | not null default nextval('test_unicode_table_id_seq'::regclass)
 test_column | character varying(5) | 
Índices:
    "test_unicode_table_pkey" PRIMARY KEY, btree (id)

test_unicode=# 
===


So, your problem is not with SQLObject but with your database messing with 
things.

Of course, I have set up mine to work with Unicode:

===
test_unicode=# \set
VERSION = 'PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)'
AUTOCOMMIT = 'on'
VERBOSITY = 'default'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = ' '
DBNAME = 'test_unicode'
USER = 'godoy'
HOST = 'internet'
PORT = '5432'
ENCODING = 'UTF8'
HISTSIZE = '500'
LASTOID = '0'
test_unicode=# 
===



-- 
Jorge Godoy  [EMAIL PROTECTED]




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Bob Ippolito


On Apr 27, 2006, at 2:45 PM, Tim Lesher wrote:


> On 4/27/06, Baruch [EMAIL PROTECTED] wrote:
>> I still feel that SQLObject is not doing the obvious thing here, it
>> should use the UTF-16 encoding from the start especially when it has
>> this comment for UnicodeCol: "Note: parameters in queries will not be
>> automatically encoded, so if you do a query matching a UnicodeCol
>> column you must apply the encoding yourself." This will bite the ass of
>> anyone who naively uses UnicodeCol's and thinks that his queries will
>> just work properly.
>
> This is just conjecture, but on most Python distributions, the
> internal representation for a u"foo" is UTF-8, so that's probably why
> SQLObject defaults to it.

Uh, no.  Totally wrong.  Exactly zero Python representations use  
UTF-8 internally.  They either use UCS-2 or UCS-4.
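
The width of the interpreter's internal representation can be inspected through sys.maxunicode. On the Python 2 builds discussed here, the value distinguishes narrow (UCS-2) builds from wide (UCS-4) ones; on Python 3.3 and later it is always 0x10FFFF. A minimal sketch:

```python
import sys

# 0xFFFF on a narrow (UCS-2) build, 0x10FFFF on a wide (UCS-4) build.
# Python 3.3 and later always report 0x10FFFF.
widest_code_point = sys.maxunicode
internal_is_wide = widest_code_point > 0xFFFF

print(hex(widest_code_point), internal_is_wide)
```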

The reason for the SQLObject default is that UTF-8 is the most likely  
unicode codec for unicode data coming to and from databases (and also  
network traffic).  I know PostgreSQL deals in UTF-8, and I'd guess  
that MySQL is the same.

-bob





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Matthew Bevan

> I'm using UnicodeCol here with PostgreSQL and I've just written a test
> case. Let's see what happens here:

I, too, am having no trouble with SQLObject and Unicode text, over MySQL 4.x, 
even.  See the following two screenshots:




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Matthew Bevan

 I'm using UnicodeCol here with PostgreSQL and I've just written a test
 case. Let's see what happens here:

I, too, am having no trouble with SQLObject and Unicode text, over MySQL 4.x, 
even.  See the following two screenshots:

Accented Language Test: http://flickr.com/photos/gothcandy/135991899/
Japanese Language Test: http://flickr.com/photos/gothcandy/135991900/

(Sorry for the broken first post... wrong key! XD )




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Jorge Godoy

On Thursday 27 April 2006 20:40, Matthew Bevan wrote:
  I'm using UnicodeCol here with PostgreSQL and I've just written a test
  case. Let's see what happens here:

 I, too, am having no trouble with SQLObject and Unicode text, over MySQL
 4.x, even.  See the following two screenshots:

So his problem is either with the configuration of his server or with the 
database server he's using (if it is not PostgreSQL or MySQL; I believe 
SQLite works too).

-- 
Jorge Godoy  [EMAIL PROTECTED]





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Tim Lesher

On 4/27/06, Bob Ippolito [EMAIL PROTECTED] wrote:
 Uh, no.  Totally wrong.  Exactly zero Python representations use
 UTF-8 internally.  They either use UCS-2 or UCS-4.

I'm absolutely sure that's not correct.  Check PEPs 100 and 261...

Hmm.

Looks like you're absolutely correct. :-)

For some reason I was thinking that it was originally UCS-2, but
PEP261 added UTF-8 (rather than UCS-4).  Thanks for the heads-up.
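
A quick way to check which internal width a given interpreter uses is sys.maxunicode, sketched below. The labels in WIDTHS are my own wording, not anything Python prints. On the Python 2 builds discussed here the value is 0xffff for a UCS-2 build and 0x10ffff for a UCS-4 build; from Python 3.3 on it is always 0x10ffff, so the check is only informative on older interpreters.

```python
import sys

# sys.maxunicode is the largest code point a single string element can hold.
# The descriptions are illustrative labels, not official terminology.
WIDTHS = {
    0xFFFF: "narrow build (UCS-2)",
    0x10FFFF: "wide build (UCS-4), or any Python 3.3+",
}


def build_width():
    return WIDTHS.get(sys.maxunicode, "unknown")


print(hex(sys.maxunicode), build_width())
```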
--
Tim Lesher [EMAIL PROTECTED]




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Kevin Dangoor

On 4/27/06, Jorge Godoy [EMAIL PROTECTED] wrote:
 test_unicode=# select * from test_unicode_table;
  id | test_column
 ----+-------------
   1 | áéíóú
   2 | áéíóú
   3 | áéíóú
   4 | áéíóú
   5 | áéíóú
   6 | áéíóú
 (6 rows)

Do those characters require more than 1 byte for UTF-8 encoding?

Kevin




[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Bob Ippolito


On Apr 27, 2006, at 7:09 PM, Kevin Dangoor wrote:


 On 4/27/06, Jorge Godoy [EMAIL PROTECTED] wrote:
 test_unicode=# select * from test_unicode_table;
  id | test_column
 ----+-------------
   1 | áéíóú
   2 | áéíóú
   3 | áéíóú
   4 | áéíóú
   5 | áéíóú
   6 | áéíóú
 (6 rows)

 Do those characters require more than 1 byte for UTF-8 encoding?

Yeah...

>>> import unicodedata
>>> len(unicodedata.normalize('NFC', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
10
>>> len(unicodedata.normalize('NFD', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
15
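
The same comparison can be spelled out with the code point count next to the byte count, which shows where the extra bytes come from. This is only a sketch; utf8_sizes is a made-up helper name. In NFC each accented vowel is one code point taking two bytes in UTF-8. In NFD each becomes a base letter (one byte) plus a combining acute accent (two bytes).

```python
import unicodedata


def utf8_sizes(text, form):
    """Return (code points, UTF-8 bytes) for text normalized to `form`."""
    normalized = unicodedata.normalize(form, text)
    return len(normalized), len(normalized.encode("utf-8"))


text = u"\xe1\xe9\xed\xf3\xfa"
for form in ("NFC", "NFD"):
    points, size = utf8_sizes(text, form)
    print(form, points, "code points,", size, "bytes")
```

So a column sized by counting five characters can still receive 10 or 15 bytes, depending on the normalization form.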

-bob





[TurboGears] Re: SQLObject rant: unicode support sucks

2006-04-27 Thread Jorge Godoy

On Thursday 27 April 2006 23:26, Bob Ippolito wrote:
 On Apr 27, 2006, at 7:09 PM, Kevin Dangoor wrote:
  Do those characters require more than 1 byte for UTF-8 encoding?

 Yeah...

 >>> len(unicodedata.normalize('NFC', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
 10
 >>> len(unicodedata.normalize('NFD', u'\xe1\xe9\xed\xf3\xfa').encode('utf-8'))
 15

Thanks Bob!


I was testing and trying to solve some problems with the new decorator syntax 
and hadn't seen this before.

-- 
Jorge Godoy  [EMAIL PROTECTED]

