[sqlalchemy] Re: convert_unicode=True results in double encoding
On 11/12/06, Michael Bayer [EMAIL PROTECTED] wrote:

> since create_engine deals with class constructors, i went with this approach:
>
>     def get_cls_kwargs(cls):
>         """return the full set of legal kwargs for the given cls"""
>         kw = []
>         for c in cls.__mro__:
>             cons = c.__init__
>             if hasattr(cons, 'func_code'):
>                 for vn in cons.func_code.co_varnames:
>                     if vn != 'self':
>                         kw.append(vn)
>         return kw
>
> so now you get these luxurious TypeErrors if you send any combination of invalid arguments:

Thanks! You'll probably never hear about this again, but I bet this will save many people hours of frustration. :)

Best Regards,
-jj

--
http://jjinux.blogspot.com/

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
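[Editor's note: the helper quoted above is Python 2 (`func_code`). A minimal sketch of the same idea in current Python, using `inspect.signature` and restricting to declared parameters, might look like this; the `Pool`/`QueuePool` classes are illustrative stand-ins, not SQLAlchemy's real ones.]

```python
import inspect

def get_cls_kwargs(cls):
    """Collect the legal keyword-argument names for cls by walking the
    __init__ of every class in its MRO (Python 3 sketch of the approach
    described in the message above)."""
    kw = set()
    for c in cls.__mro__:
        init = c.__dict__.get('__init__')
        if init is None:
            continue
        try:
            sig = inspect.signature(init)
        except (TypeError, ValueError):
            continue  # e.g. builtins without introspectable signatures
        for name, param in sig.parameters.items():
            # Only named parameters count; skip self, *args, **kwargs.
            if name != 'self' and param.kind in (
                    param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY):
                kw.add(name)
    return kw

# Illustrative stand-ins for the kinds of classes create_engine composes:
class Pool:
    def __init__(self, pool_size=5, echo=False):
        pass

class QueuePool(Pool):
    def __init__(self, timeout=30, **kwargs):
        super().__init__(**kwargs)

legal = get_cls_kwargs(QueuePool)
# 'pool_size' and 'echo' come from Pool, 'timeout' from QueuePool;
# anything not in this set can be rejected with a TypeError.
```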
[sqlalchemy] Re: convert_unicode=True results in double encoding
The following results in correct data going into and coming out of the database, but the data in the database itself looks double encoded:

    import MySQLdb

    connection = MySQLdb.connect(host='fmapp03', user='foxmarks',
                                 passwd='ChunkyBacon', db='users')
    cursor = connection.cursor()
    cursor.execute("""INSERT INTO users
                      VALUES (12345678, 'jjtest1234', '[EMAIL PROTECTED]',
                              'pass', %s, 'asdf', 'N/A', 'N/A', 0,
                              NOW(), NOW())""",
                   ('\xc3\xa7',))
    cursor.execute("SELECT * FROM users WHERE id = 12345678")
    row = cursor.fetchone()
    print `row`
    connection.commit()

The following results in correct data going into and out of the database, and does not result in the data in the database itself being double encoded:

    import MySQLdb

    connection = MySQLdb.connect(host='fmapp03', user='foxmarks',
                                 passwd='ChunkyBacon', db='users',
                                 charset='utf8')
    cursor = connection.cursor()
    cursor.execute("""INSERT INTO users
                      VALUES (12345678, 'jjtest1234', '[EMAIL PROTECTED]',
                              'pass', %s, 'asdf', 'N/A', 'N/A', 0,
                              NOW(), NOW())""",
                   (u'\xe7',))
    cursor.execute("SELECT * FROM users WHERE id = 12345678")
    row = cursor.fetchone()
    print `row`
    connection.commit()

It looks like a lot of this stuff has changed in the version of MySQLdb I'm using, 1.2.1p2. If you don't let MySQLdb take care of encoding and decoding, it ends up double encoding things in the database. This must be a bug in MySQLdb. The clear way to work around the bug is to let the driver take care of encoding and decoding instead of SQLAlchemy.

Yuck,
-jj
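[Editor's note: the double-encoding mechanism described above can be reproduced in isolation. A sketch in Python 3 syntax, assuming the second encoder misreads already-encoded UTF-8 bytes as latin-1 text (the usual cause of this symptom); the value ç matches the thread's example.]

```python
original = '\xe7'  # the character ç as a text string

# First (correct) encoding, as the application or SQLAlchemy would do:
encoded_once = original.encode('utf-8')    # b'\xc3\xa7'

# A second encoder that treats those bytes as latin-1 *text* and
# encodes them to UTF-8 again mangles the stored value:
misread = encoded_once.decode('latin-1')   # the two characters 'Ã§'
encoded_twice = misread.encode('utf-8')    # b'\xc3\x83\xc2\xa7'

# Decoding the stored bytes once yields the mojibake, not ç;
# only undoing both layers recovers the original character:
assert encoded_twice.decode('utf-8') == misread
assert encoded_twice.decode('utf-8').encode('latin-1').decode('utf-8') == original
```

This is why the data "looks double encoded" when inspected directly even though a symmetric read path happens to round-trip it correctly.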
[sqlalchemy] Re: convert_unicode=True results in double encoding
On 11/7/06, Michael Bayer [EMAIL PROTECTED] wrote:

> yeah, or use introspection to consume the args, i thought of that too. i guess we can do that. of course i dont like having to go there but i guess not a big deal. as far as kwargs collisions, yeah, thats a potential issue too. but the number of dialects/pools is not *that* varied, theyre generally pretty conservative with the kwargs. if we broke out create_engine() to take in dialect, pool, etc., well id probably just create a new function for that first off so create_engine() can just remain... im not sure if i want to force users to be that exposed to the details, as people might get a little intimidated by all that.

By the way, as if to prove my point, I had another problem. I was trying to pass a create_args keyword argument per the example at http://www.sqlalchemy.org/docs/dbengine.myt#dbengine_establishing_custom. SQLAlchemy didn't complain. My code didn't even complain when I used a cedilla (รง); it only crashed when I used Japanese. It turns out that MySQLdb was defaulting to Latin-1 or something like that, and SQLAlchemy was silently ignoring my create_args keyword argument. That example is wrong (should I file a bug?). The documentation above it is correct; the correct keyword argument is named connect_args.

Best Regards,
-jj

--
http://jjinux.blogspot.com/
[sqlalchemy] Re: convert_unicode=True results in double encoding
ugh, well i fixed that one. random doc bugs you can file tickets for, or ill just give you commit access if you want to knock some out yourself.
[sqlalchemy] Re: convert_unicode=True results in double encoding
Shannon -jj Behrens wrote:

> I'm using convert_unicode=True. Everything is fine as long as I'm the one reading and writing the data. However, if I look at what's actually being stored in the database, it's like the data has been encoded twice. If I switch to use_unicode=True, which I believe is MySQL specific, things work just fine and what's being stored in the database looks correct.

yes, if the mysql client lib is encoding, and SA is also encoding, the data will get encoded twice. im not familiar with how i could look at the encoded data to tell if it was already encoded (and not sure if i should be... the unicode encoding option should only be enabled in one place, not two).

> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards. It says, "If it's not a unicode object, return it. Otherwise, encode it." Later, "If it is a unicode object, return it. Otherwise decode it."

sending unicode values to databases whose client APIs dont handle unicode involves taking a python unicode object from the application, encoding it into an encoded series of bytes, and sending it to the database. receiving a result value involves taking the encoded series of bytes and decoding it into a unicode object. so you have *non* unicode instances going into the DB, and *non* unicode coming out - the DBAPI is assumed to not have any idea what a python unicode object is (such as psycopg's). We've been doing the unicode thing for a while now, and you should notice that we have unit tests for just about every function in SA, especially important ones like this. the unicode unit test runs unicode and raw encoded values in and out in numerous ways, which pass for at least mysql, sqlite, postgres, oracle, and ms-sql. we have had some people having issues with MySQL specifically, which seems to be because some folks have a mysql config that is stuck in "convert unicode" mode and experience the double-encoding issue. the one improvement that could be made here is for MySQL to provide a subclassed unicode type that disables conversion if the dialect is known to have convert_unicode=True already... then again i sort of like that this forces people to understand their database config.
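[Editor's note: the direction of the converters quoted above can be checked in isolation. A Python 3 sketch follows, where str/bytes play the roles unicode/str played in Python 2; FakeDialect is a stand-in for the dialect object, not SQLAlchemy's real class.]

```python
class FakeDialect:
    # Stand-in for the two dialect attributes the converters consult.
    convert_unicode = True
    encoding = 'utf-8'

def convert_bind_param(value, dialect):
    # Going IN: a text (unicode) object is encoded to bytes for a DBAPI
    # that doesn't understand unicode; non-text values pass through.
    if not dialect.convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(dialect.encoding)

def convert_result_value(value, dialect):
    # Coming OUT: bytes from the DBAPI are decoded back to text;
    # values that are already text pass through.
    if not dialect.convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(dialect.encoding)

dialect = FakeDialect()
stored = convert_bind_param('\xe7', dialect)     # text in -> bytes toward the DB
fetched = convert_result_value(stored, dialect)  # bytes out -> text again
```

So the logic is not backwards: encoded bytes go toward the database and the decode happens on the way out, exactly as described in the reply.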
[sqlalchemy] Re: convert_unicode=True results in double encoding
Shannon -jj Behrens wrote:

> Changing from convert_unicode=True to use_unicode=True doesn't do what you'd expect. SQLAlchemy is passing keyword arguments all over the place, and use_unicode actually gets ignored. <minor rant>I personally think that you should be strict *somewhere* when you're passing around keyword arguments. I've been bitten this way too many times. Unknown keyword arguments should result in exceptions.</minor rant>

uhh... where is use_unicode=True documented as a create_engine keyword? if used as DBAPI-specific keywords are documented (in the FAQ, in the docs) it works just fine. We do try to be strict about kwargs as much as possible, but this is a case where its difficult since create_engine() is passing along kwargs to any number of dialect implementations. ill add a ticket to put in more kwargs.pop() and remove **kwargs from the base dialect class.

    from sqlalchemy import *
    import MySQLdb as mysql

    class FakeMySQLDBAPI(object):
        def __init__(self):
            self.paramstyle = mysql.paramstyle

        def connect(self, *args, **kwargs):
            print "Args:" + repr(args) + "Kwargs:" + repr(kwargs)
            return mysql.connect(*args, **kwargs)

    e = create_engine('mysql://scott:[EMAIL PROTECTED]/test',
                      connect_args={'use_unicode': True},
                      module=FakeMySQLDBAPI())
    c = e.connect()

which prints:

    Args:()Kwargs:{'passwd': 'tiger', 'host': 'localhost', 'db': 'test', 'user': 'scott', 'use_unicode': True}

> If I use the following code:
>
>     for use_unicode in (True, False):
>         connection = MySQLdb.connect(host='localhost', user='user',
>                                      passwd='dataase', db='users',
>                                      use_unicode=use_unicode)
>         cursor = connection.cursor()
>         cursor.execute("select firstName from users where username='test'")
>         row = cursor.fetchone()
>         print "use_unicode:%s %r" % (use_unicode, row)
>
> I get:
>
>     use_unicode:True (u'test \xc3\xa7',)
>     use_unicode:False ('test \xc3\xa7',)
>
> Notice the result is the same, but one has a unicode object and the other doesn't. Notice that it's \xc3\xa7 each time? It shouldn't be.

yeah id say screw using use_unicode, make sure your MySQL isnt doing any unicode conversion (check out all your my.cnf files etc), and use SA's convert_unicode or Unicode type instead.
[sqlalchemy] Re: convert_unicode=True results in double encoding
Shannon - im trying to figure a way to make create_engine() behave the way you want. but let me show you a highly simplified version of how create_engine() works. how would you change it?

    def create_engine(url, **kwargs):
        pool = pool.Pool(**kwargs)
        dialect = module.Dialect(url, **kwargs)
        engine = Engine(dialect, pool, **kwargs)
        return engine

As you can see, the returned Engine is composed of itself, a connection pool, and a Dialect. the **kwargs you pass to create_engine() can be used by any of those three objects. Also, there are many different types of Pool, Dialect, and Engine which take different arguments... so placing explicit awareness of every argument within create_engine itself is excessively brittle. since the constructors of all the classes declare their keyword arguments as normal inline arguments, we need to pass them around using ** and therefore the kwargs dictionary cannot be consumed. if it could be consumed, we could then check at the end that the kwargs dict is in fact empty, so we know that no unknown arguments were sent. but as it is, the full dict gets passed to all three types of objects, where they have to just ignore the kwargs they dont know about. this is not unlike WSGI, which passes around an environ dictionary in which many erroneous keys could be present as well. the only way i can see to fix this is to modify the constructors of all the objects to take a single dictionary argument from which they consume their arguments, and not pass the arguments with the ** operator anymore. which means the nice constructors we have, which clearly lay out the arguments they take, would just have one non-self-documenting arguments dictionary. is that the only option?
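[Editor's note: a sketch of the "consumable dict" alternative described above, with illustrative class names rather than SQLAlchemy's real ones. Each component pops the arguments it owns from one shared dict, so create_engine() can reject anything left over; the cost, as the message says, is constructors that no longer self-document their arguments.]

```python
class Pool:
    def __init__(self, opts):
        # Consume this component's arguments from the shared dict.
        self.size = opts.pop('pool_size', 5)

class Dialect:
    def __init__(self, url, opts):
        self.url = url
        self.convert_unicode = opts.pop('convert_unicode', False)

class Engine:
    def __init__(self, dialect, pool, opts):
        self.dialect, self.pool = dialect, pool
        self.echo = opts.pop('echo', False)

def create_engine(url, **kwargs):
    pool = Pool(kwargs)
    dialect = Dialect(url, kwargs)
    engine = Engine(dialect, pool, kwargs)
    if kwargs:  # anything not consumed is an unknown argument
        raise TypeError("invalid argument(s): %s" % ', '.join(sorted(kwargs)))
    return engine

e = create_engine('mysql://localhost/test', convert_unicode=True)
# A misspelled keyword such as use_unicode now raises a TypeError
# instead of being silently ignored.
```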
[sqlalchemy] Re: convert_unicode=True results in double encoding
On 11/3/06, Shannon -jj Behrens [EMAIL PROTECTED] wrote:

> I'm using convert_unicode=True. Everything is fine as long as I'm the one reading and writing the data. However, if I look at what's actually being stored in the database, it's like the data has been encoded twice. If I switch to use_unicode=True, which I believe is MySQL specific, things work just fine and what's being stored in the database looks correct.
>
> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards. It says, "If it's not a unicode object, return it. Otherwise, encode it." Later, "If it is a unicode object, return it. Otherwise decode it." Am I correct that this is backwards? If so, this is going to be *painful* to update all the databases out there!

Ok, MySQLdb doesn't have a mailing list, so I can't ask there. Here are some things I've learned:

Changing from convert_unicode=True to use_unicode=True doesn't do what you'd expect. SQLAlchemy is passing keyword arguments all over the place, and use_unicode actually gets ignored. <minor rant>I personally think that you should be strict *somewhere* when you're passing around keyword arguments. I've been bitten this way too many times. Unknown keyword arguments should result in exceptions.</minor rant>

Anyway, I'm still a bit worried about that code above like I said. However, here's what's even scarier. If I use the following code:

    import MySQLdb

    for use_unicode in (True, False):
        connection = MySQLdb.connect(host='localhost', user='user',
                                     passwd='dataase', db='users',
                                     use_unicode=use_unicode)
        cursor = connection.cursor()
        cursor.execute("select firstName from users where username='test'")
        row = cursor.fetchone()
        print "use_unicode:%s %r" % (use_unicode, row)

I get:

    use_unicode:True (u'test \xc3\xa7',)
    use_unicode:False ('test \xc3\xa7',)

Notice the result is the same, but one has a unicode object and the other doesn't. Notice that it's \xc3\xa7 each time? It shouldn't be. Consider:

    >>> s = 'test \xc3\xa7'
    >>> s.decode('utf-8')
    u'test \xe7'

*It's creating a unicode object without actually doing any decoding!* This is somewhere low level. Like I said, this is lower level than SQLAlchemy, but I don't have anywhere else to turn.

SQLAlchemy: 0.2.8
MySQLdb: 1.36.2.4
mysql client and server: 5.0.22
Ubuntu: 6.0.6

Help!
-jj

--
http://jjinux.blogspot.com/
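[Editor's note: the symptom above can be reproduced without a database. A sketch in Python 3 syntax, using the thread's own value: a driver that merely widens each byte to a code point (equivalent to a latin-1 decode) produces u'test \xc3\xa7' instead of the properly decoded u'test \xe7'.]

```python
raw = b'test \xc3\xa7'  # the UTF-8 bytes for 'test ç', as stored in the column

# A real UTF-8 decode recovers the intended text:
correct = raw.decode('utf-8')    # 'test \xe7'

# What the driver appeared to do: map each byte straight to a code
# point, yielding a unicode object whose characters are still the raw
# UTF-8 byte values -- "a unicode object without any decoding".
widened = raw.decode('latin-1')  # 'test \xc3\xa7'
```

The two results are different unicode strings, which is exactly the discrepancy the `use_unicode=True` output shows.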