[sqlalchemy] Re: convert_unicode=True results in double encoding

Shannon -jj Behrens Fri, 03 Nov 2006 18:21:35 -0800

On 11/3/06, Shannon -jj Behrens <[EMAIL PROTECTED]> wrote:
> I'm using convert_unicode=True.  Everything is fine as long as I'm the
> one reading and writing the data.  However, if I look at what's
> actually being stored in the database, it looks like the data has been
> encoded twice.  If I switch to use_unicode=True, which I believe is
> MySQL specific, things work just fine and what's being stored in the
> database looks correct.
>
> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards.  It says, "If it's not a unicode object,
> return it.  Otherwise, encode it."  Later, "If it is a unicode object,
> return it.  Otherwise, decode it."
>
> Am I correct that this is backwards?  If so, this is going to be
> *painful* to update all the databases out there!
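To check how the two quoted converters behave on their own, here is a minimal Python 3 re-creation. It assumes Python 3 `str`/`bytes` stand in for Python 2 `unicode`/`str`, and the `SimpleNamespace` object is a hypothetical stand-in for a dialect, not SQLAlchemy's real dialect class.

```python
from types import SimpleNamespace


def convert_bind_param(value, dialect):
    # Outgoing value: only text (Python 3 str, i.e. Python 2 unicode) is
    # encoded; bytes and None pass through untouched.
    if not dialect.convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(dialect.encoding)


def convert_result_value(value, dialect):
    # Incoming value: text is returned unchanged; bytes are decoded once.
    if not dialect.convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(dialect.encoding)


# Hypothetical dialect stand-in carrying only the two attributes read above.
dialect = SimpleNamespace(convert_unicode=True, encoding="utf-8")

stored = convert_bind_param("test \u00e7", dialect)
print(stored)                                                  # b'test \xc3\xa7'
print(convert_result_value(stored, dialect) == "test \u00e7")  # True
```

Read this way, each guard skips values that are already in the target form, so in isolation the pair encodes once and decodes once; any second encoding would have to come from somewhere else in the stack.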


OK, MySQLdb doesn't have a mailing list, so I can't ask there.  Here
are some things I've learned:

Changing from convert_unicode=True to use_unicode=True doesn't do what
you'd expect.  SQLAlchemy passes keyword arguments all over the
place, and use_unicode is silently ignored.  <minor rant>I
personally think you should be strict *somewhere* when you're
passing keyword arguments around.  I've been bitten this way too
many times.  Unknown keyword arguments should raise
exceptions.</minor rant>
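One way to be strict when a function must accept `**kwargs` and forward them is to check the names against a whitelist up front. This is a minimal sketch: `connect_options` and `KNOWN_OPTIONS` are hypothetical names for illustration, not part of SQLAlchemy or MySQLdb.

```python
# Hypothetical whitelist of options this wrapper understands (illustrative only).
KNOWN_OPTIONS = {"host", "user", "passwd", "db", "charset", "use_unicode"}


def connect_options(**kwargs):
    """Validate keyword arguments before forwarding them further down.

    Raises TypeError for any name not in KNOWN_OPTIONS instead of
    silently dropping it.
    """
    unknown = set(kwargs) - KNOWN_OPTIONS
    if unknown:
        raise TypeError("unexpected keyword argument(s): " + ", ".join(sorted(unknown)))
    return dict(kwargs)


print(connect_options(host="localhost", use_unicode=True))
try:
    connect_options(host="localhost", use_unicde=True)  # note the typo
except TypeError as exc:
    print(exc)  # unexpected keyword argument(s): use_unicde
```

Python already raises `TypeError` for unknown keyword arguments when a function doesn't take `**kwargs`; the explicit check matters only for pass-through layers that do.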

Anyway, as I said, I'm still a bit worried about the code above.
But here's something even scarier.  If I run the following code:

import MySQLdb


# Fetch the same row twice: once with use_unicode on, once with it off.
for use_unicode in (True, False):
    connection = MySQLdb.connect(host="localhost", user="user",
                                 passwd="dataase", db="users",
                                 use_unicode=use_unicode)
    cursor = connection.cursor()
    cursor.execute("select firstName from users where username='test'")
    row = cursor.fetchone()
    # %r shows the repr, so unicode and byte strings can be told apart
    print("use_unicode:%s %r" % (use_unicode, row))
    connection.close()

I get

use_unicode:True (u'test \xc3\xa7',)
use_unicode:False ('test \xc3\xa7',)

Notice that the data is the same in both cases; the only difference is
that one is a unicode object and the other is a byte string.  Notice
also that it's \xc3\xa7 both times.  For the unicode object, it
shouldn't be.  Consider:

>>> s = 'test \xc3\xa7'
>>> s.decode('utf-8')
u'test \xe7'

*It's creating a unicode object without actually doing any decoding!*
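Python's codecs suggest why the result can look undecoded. The sketch below (Python 3) assumes, without confirmation from this thread, that the driver decodes with Latin-1 instead of UTF-8, for example if the connection character set is latin1. Latin-1 maps each byte to the code point with the same number, so the resulting unicode object mirrors the raw bytes exactly.

```python
# Bytes as stored on the server: UTF-8 for "test ç".
raw = b"test \xc3\xa7"

# Correct: decode with the charset the bytes were written in.
right = raw.decode("utf-8")      # one character for ç

# Assumed failure mode: decode as Latin-1. Every byte becomes the code
# point with the same value, so it "looks" like no decoding happened.
wrong = raw.decode("latin-1")    # two characters: U+00C3, U+00A7

# Encoding the mis-decoded text as UTF-8 again is what double encoding looks like.
double = wrong.encode("utf-8")

# Mis-decoded text can be repaired by reversing the wrong step.
repaired = wrong.encode("latin-1").decode("utf-8")

print(len(right), len(wrong))    # 6 7
print(double)                    # b'test \xc3\x83\xc2\xa7'
print(repaired == right)         # True
```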

This is happening somewhere below SQLAlchemy, but I don't have
anywhere else to turn.

SQLAlchemy: 0.2.8
MySQLdb: 1.36.2.4
mysql client and server: 5.0.22
Ubuntu: 6.06

Help!
-jj

-- 
http://jjinux.blogspot.com/

 You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
