[sqlalchemy] Re: convert_unicode=True results in double encoding

Andrija Zarić Sat, 04 Nov 2006 06:07:57 -0800

The logic is fine. I guess you overlooked the first condition, "not
dialect.convert_unicode": when conversion is switched off, the value
is returned unchanged.
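
To spell out the short-circuit in the quoted methods: the value is
returned untouched as soon as any one of the three conditions holds.
Below is a minimal standalone sketch in Python 3 notation, where str
plays the role of Python 2's unicode and bytes the role of Python 2's
str. FakeDialect is a hypothetical stand-in for illustration, not
SQLAlchemy's real dialect class.

```python
class FakeDialect:
    """Hypothetical stand-in for a dialect; not SQLAlchemy's real class."""

    def __init__(self, convert_unicode, encoding="utf-8"):
        self.convert_unicode = convert_unicode
        self.encoding = encoding


def convert_bind_param(value, dialect):
    # Pass the value through when conversion is off, the value is NULL,
    # or the value is not text (so there is nothing to encode).
    if not dialect.convert_unicode or value is None or not isinstance(value, str):
        return value
    # Only text reaches this point: encode it for the database.
    return value.encode(dialect.encoding)


def convert_result_value(value, dialect):
    # Pass the value through when conversion is off, the value is NULL,
    # or the value is already text (so there is nothing to decode).
    if not dialect.convert_unicode or value is None or isinstance(value, str):
        return value
    # Only encoded bytes are expected here: decode them back to text.
    return value.decode(dialect.encoding)
```

So "if it's not a unicode object, return it" is the pass-through
branch for values that need no encoding, which is the intended
behaviour rather than an inverted test.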


On a side note, the doc string
(http://www.sqlalchemy.org/docs/docstrings.myt#docstrings_sqlalchemy.engine)
says that:

convert_unicode=False : True if unicode conversion should be applied
to all str types

which is a little bit misleading -- the conversion is applied to
String (or similar) column types, but not to _all_ str values. E.g.
both assert tests below will fail:

db.engine.dialect.convert_unicode = True
rawdata = ('Alors vous imaginez ma surprise, au lever du jour, quand '
           'une dr\xc3\xb4le de petit voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. '
           'Elle disait: \xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 '
           'dessine-moi un mouton! \xc2\xbb\n')
unicodedata = rawdata.decode('utf-8')
unicode_table.insert().execute(unicode_data=unicodedata, plain_data=rawdata)
conn = db.engine.connect()
x = conn.execute("select * from unicode_table").fetchone()
self.assert_(isinstance(x['unicode_data'], unicode) and
             x['unicode_data'] == unicodedata)
self.assert_(isinstance(x['plain_data'], unicode) and
             x['plain_data'] == unicodedata)

so if you are using engine/connection directly, you must "manually" do
the conversion if you need one.
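
As a rough illustration of what "manually" could look like, here is a
small sketch in Python 3 notation (bytes stands in for Python 2's str,
str for unicode). decode_row is a hypothetical helper, not part of
SQLAlchemy, and it assumes the row can be treated as a mapping from
column names to values.

```python
def decode_row(row, encoding="utf-8"):
    """Return a dict copy of `row` with every bytes value decoded to text.

    Hypothetical helper for illustration; values that are not bytes
    (numbers, None, text that is already decoded) are left untouched.
    """
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in row.items()
    }


# Example with a plain dict standing in for a fetched row.
raw_row = {"id": 1, "plain_data": b"dr\xc3\xb4le de petit voix"}
clean_row = decode_row(raw_row)
```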

For the MySQL problem, consider this:

>>> s = u'test \xc3\xa7'
>>> s.decode('utf-8')
u'test \xc3\xa7'

so the question is, how did you put the data into database?
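
One way such a value can end up in the table is sketched below in
Python 3 notation (bytes for Python 2's str, str for unicode). It
assumes, purely for illustration, that already-encoded UTF-8 bytes
were treated as Latin-1 text somewhere on the way in and encoded a
second time; whether that is what happened in your setup is a guess.

```python
original = "test \u00e7"  # the text the application meant to store

# First encoding: text -> UTF-8 bytes.
encoded_once = original.encode("utf-8")

# Assumed misstep: the bytes are taken for Latin-1 text and encoded again.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

# Decoding the stored bytes once removes only the outer layer, leaving
# a text object whose characters are the UTF-8 byte values.
read_back = encoded_twice.decode("utf-8")

# Undoing both layers recovers the original text.
repaired = read_back.encode("latin-1").decode("utf-8")
```

Here read_back is the text 'test \xc3\xa7', the same shape as the
value MySQLdb returned with use_unicode=True.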



On 04/11/06, Shannon -jj Behrens <[EMAIL PROTECTED]> wrote:
>
> On 11/3/06, Shannon -jj Behrens <[EMAIL PROTECTED]> wrote:
> > I'm using convert_unicode=True.  Everything is fine as long as I'm the
> > one reading and writing the data.  However, if I look at what's
> > actually being stored in the database, it's like the data has been
> > encoded twice.  If I switch to use_unicode=True, which I believe is
> > MySQL specific, things work just fine and what's being stored in the
> > database looks correct.
> >
> > I started looking through the SQLAlchemy code, and I came across this:
> >
> >     def convert_bind_param(self, value, dialect):
> >         if (not dialect.convert_unicode or value is None
> >                 or not isinstance(value, unicode)):
> >             return value
> >         else:
> >             return value.encode(dialect.encoding)
> >     def convert_result_value(self, value, dialect):
> >         if (not dialect.convert_unicode or value is None
> >                 or isinstance(value, unicode)):
> >             return value
> >         else:
> >             return value.decode(dialect.encoding)
> >
> > The logic looks backwards.  It says, "If it's not a unicode object,
> > return it.  Otherwise, encode it."  Later, "If it is a unicode object,
> > return it.  Otherwise decode it."
> >
> > Am I correct that this is backwards?  If so, this is going to be
> > *painful* to update all the databases out there!
>
> Ok, MySQLdb doesn't have a mailing list, so I can't ask there.  Here
> are some things I've learned:
>
> Changing from convert_unicode=True to use_unicode=True doesn't do what
> you'd expect.  SQLAlchemy is passing keyword arguments all over the
> place, and use_unicode actually gets ignored.  <minor rant>I
> personally think that you should be strict *somewhere* when you're
> passing around keyword arguments.  I've been bitten in this way too
> many times.  Unknown keyword arguments should result in
> exceptions.</minor rant>
>
> Anyway, I'm still a bit worried about that code above like I said.
> However, here's what's even scarier.  If I use the following code:
>
> import MySQLdb
>
>
> for use_unicode in (True, False):
>     connection = MySQLdb.connect(host="localhost", user="user",
>                                  passwd='dataase', db="users",
>                                  use_unicode=use_unicode)
>     cursor = connection.cursor()
>     cursor.execute("select firstName from users where username='test'")
>     row = cursor.fetchone()
>     print "use_unicode:%s %r" % (use_unicode, row)
>
> I get
>
> use_unicode:True (u'test \xc3\xa7',)
> use_unicode:False ('test \xc3\xa7',)
>
> Notice the result is the same, but one has a unicode object and the
> other doesn't.  Notice that it's \xc3\xa7 each time?  It shouldn't be.
>  Consider:
>
> >>> s = 'test \xc3\xa7'
> >>> s.decode('utf-8')
> u'test \xe7'
>
> *It's creating a unicode object without actually doing any decoding!*
>
> This is somewhere low level.  Like I said, this is lower level than
> SQLAlchemy, but I don't have anywhere else to turn.
>
> SQLAlchemy: 0.2.8
> MySQLdb: 1.36.2.4
> mysql client and server: 5.0.22
> Ubuntu: 6.0.6
>
> Help!
> -jj
>
> --
> http://jjinux.blogspot.com/
>
> >
>

