[sqlalchemy] Re: convert_unicode=True results in double encoding
On 11/12/06, Michael Bayer [EMAIL PROTECTED] wrote:

> since create_engine deals with class constructors, i went with this approach:
>
>     def get_cls_kwargs(cls):
>         """return the full set of legal kwargs for the given cls"""
>         kw = []
>         for c in cls.__mro__:
>             cons = c.__init__
>             if hasattr(cons, 'func_code'):
>                 for vn in cons.func_code.co_varnames:
>                     if vn != 'self':
>                         kw.append(vn)
>         return kw
>
> so now you get these luxurious TypeErrors if you send any combination of invalid arguments:

Thanks! You'll probably never hear about this again, but I bet this will save many people hours of frustration. :)

Best Regards,
-jj

--
http://jjinux.blogspot.com/

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
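[Editor's note: the helper quoted above is Python 2 (`func_code`). A minimal sketch of the same idea in current Python, using `inspect.signature` and restricting to declared parameters, might look like this; the `Pool`/`QueuePool` classes are illustrative stand-ins, not SQLAlchemy's real ones.]

```python
import inspect

def get_cls_kwargs(cls):
    """Collect the legal keyword-argument names for cls by walking the
    __init__ of every class in its MRO (Python 3 sketch of the approach
    described in the message above)."""
    kw = set()
    for c in cls.__mro__:
        init = c.__dict__.get('__init__')
        if init is None:
            continue
        try:
            sig = inspect.signature(init)
        except (TypeError, ValueError):
            continue  # e.g. builtins without introspectable signatures
        for name, param in sig.parameters.items():
            # Only named parameters count; skip self, *args, **kwargs.
            if name != 'self' and param.kind in (
                    param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY):
                kw.add(name)
    return kw

# Illustrative stand-ins for the kinds of classes create_engine composes:
class Pool:
    def __init__(self, pool_size=5, echo=False):
        pass

class QueuePool(Pool):
    def __init__(self, timeout=30, **kwargs):
        super().__init__(**kwargs)

legal = get_cls_kwargs(QueuePool)
# 'pool_size' and 'echo' come from Pool, 'timeout' from QueuePool;
# anything not in this set can be rejected with a TypeError.
```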
[sqlalchemy] Re: convert_unicode=True results in double encoding
The following results in correct data going into and coming out of the database, but the data in the database itself looks double encoded:

    import MySQLdb

    connection = MySQLdb.connect(host='fmapp03', user='foxmarks',
                                 passwd='ChunkyBacon', db='users')
    cursor = connection.cursor()
    cursor.execute("""INSERT INTO users
                      VALUES (12345678, 'jjtest1234', '[EMAIL PROTECTED]',
                              'pass', %s, 'asdf', 'N/A', 'N/A', 0,
                              NOW(), NOW())""",
                   ('\xc3\xa7',))
    cursor.execute("SELECT * FROM users WHERE id = 12345678")
    row = cursor.fetchone()
    print `row`
    connection.commit()

The following results in correct data going into and out of the database, and does not result in the data in the database itself being double encoded:

    import MySQLdb

    connection = MySQLdb.connect(host='fmapp03', user='foxmarks',
                                 passwd='ChunkyBacon', db='users',
                                 charset='utf8')
    cursor = connection.cursor()
    cursor.execute("""INSERT INTO users
                      VALUES (12345678, 'jjtest1234', '[EMAIL PROTECTED]',
                              'pass', %s, 'asdf', 'N/A', 'N/A', 0,
                              NOW(), NOW())""",
                   (u'\xe7',))
    cursor.execute("SELECT * FROM users WHERE id = 12345678")
    row = cursor.fetchone()
    print `row`
    connection.commit()

It looks like a lot of this stuff has changed in the version of MySQLdb I'm using, 1.2.1p2. If you don't let MySQLdb take care of encoding and decoding, it ends up double encoding things in the database. This must be a bug in MySQLdb. The clear way to work around the bug is to let the driver take care of encoding and decoding instead of SQLAlchemy.

Yuck,
-jj
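[Editor's note: the double-encoding mechanism described above can be reproduced in isolation. A sketch in Python 3 syntax, assuming the second encoder misreads already-encoded UTF-8 bytes as latin-1 text (the usual cause of this symptom); the value ç matches the thread's example.]

```python
original = '\xe7'  # the character ç as a text string

# First (correct) encoding, as the application or SQLAlchemy would do:
encoded_once = original.encode('utf-8')    # b'\xc3\xa7'

# A second encoder that treats those bytes as latin-1 *text* and
# encodes them to UTF-8 again mangles the stored value:
misread = encoded_once.decode('latin-1')   # the two characters 'Ã§'
encoded_twice = misread.encode('utf-8')    # b'\xc3\x83\xc2\xa7'

# Decoding the stored bytes once yields the mojibake, not ç;
# only undoing both layers recovers the original character:
assert encoded_twice.decode('utf-8') == misread
assert encoded_twice.decode('utf-8').encode('latin-1').decode('utf-8') == original
```

This is why the data "looks double encoded" when inspected directly even though a symmetric read path happens to round-trip it correctly.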
[sqlalchemy] Re: convert_unicode=True results in double encoding
On 11/7/06, Michael Bayer [EMAIL PROTECTED] wrote:

> yeah, or use introspection to consume the args, i thought of that too. i guess we can do that. of course i dont like having to go there but i guess not a big deal. as far as kwargs collisions, yeah, thats a potential issue too. but the number of dialects/pools is not *that* varied, theyre generally pretty conservative with the kwargs. if we broke out create_engine() to take in dialect, pool, etc., well id probably just create a new function for that first off so create_engine() can just remain... im not sure if i want to force users to be that exposed to the details, as people might get a little intimidated by all that.

By the way, as if to prove my point, I had another problem. I was trying to pass a create_args keyword argument per the example at http://www.sqlalchemy.org/docs/dbengine.myt#dbengine_establishing_custom. SQLAlchemy didn't complain. My code didn't even complain when I used a cedilla (รง); it only crashed when I used Japanese. It turns out that MySQLdb was defaulting to Latin-1 or something like that, and SQLAlchemy was silently ignoring my create_args keyword argument. That example is wrong (should I file a bug?). The documentation above it is correct; the correct keyword argument is named connect_args.

Best Regards,
-jj

--
http://jjinux.blogspot.com/
[sqlalchemy] Re: convert_unicode=True results in double encoding
ugh, well i fixed that one. random doc bugs you can file tickets for, or ill just give you commit access if you want to knock some out yourself.
[sqlalchemy] Re: convert_unicode=True results in double encoding
Shannon -jj Behrens wrote:

> I'm using convert_unicode=True. Everything is fine as long as I'm the one reading and writing the data. However, if I look at what's actually being stored in the database, it's like the data has been encoded twice. If I switch to use_unicode=True, which I believe is MySQL specific, things work just fine and what's being stored in the database looks correct.

yes, if the mysql client lib is encoding, and SA is also encoding, the data will get encoded twice. im not familiar with how i could look at the encoded data to tell if it was already encoded (and not sure if i should be... the unicode encoding option should only be enabled in one place, not two).

> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards. It says, "If it's not a unicode object, return it. Otherwise, encode it." Later, "If it is a unicode object, return it. Otherwise decode it."

sending unicode values to databases whose client APIs dont handle unicode involves taking a python unicode object from the application, encoding it into an encoded series of bytes, and sending it to the database. receiving a result value involves taking the encoded series of bytes and decoding it into a unicode object. so you have *non* unicode instances going into the DB, and *non* unicode coming out - the DBAPI is assumed to not have any idea what a python unicode object is (such as psycopg's). We've been doing the unicode thing for a while now, and you should notice that we have unit tests for just about every function in SA, especially important ones like this. the unicode unit test runs unicode and raw encoded values in and out in numerous ways, which pass for at least mysql, sqlite, postgres, oracle, and ms-sql. we have had some people having issues with MySQL specifically, which seems to be because some folks have a mysql config that is stuck in "convert unicode" mode and experience the double-encoding issue. the one improvement that could be made here is for MySQL to provide a subclassed unicode type that disables conversion if the dialect is known to have convert_unicode=True already... then again i sort of like that this forces people to understand their database config.
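[Editor's note: the direction of the converters quoted above can be checked in isolation. A Python 3 sketch follows, where str/bytes play the roles unicode/str played in Python 2; FakeDialect is a stand-in for the dialect object, not SQLAlchemy's real class.]

```python
class FakeDialect:
    # Stand-in for the two dialect attributes the converters consult.
    convert_unicode = True
    encoding = 'utf-8'

def convert_bind_param(value, dialect):
    # Going IN: a text (unicode) object is encoded to bytes for a DBAPI
    # that doesn't understand unicode; non-text values pass through.
    if not dialect.convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(dialect.encoding)

def convert_result_value(value, dialect):
    # Coming OUT: bytes from the DBAPI are decoded back to text;
    # values that are already text pass through.
    if not dialect.convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(dialect.encoding)

dialect = FakeDialect()
stored = convert_bind_param('\xe7', dialect)     # text in -> bytes toward the DB
fetched = convert_result_value(stored, dialect)  # bytes out -> text again
```

So the logic is not backwards: encoded bytes go toward the database and the decode happens on the way out, exactly as described in the reply.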
[sqlalchemy] Re: convert_unicode=True results in double encoding
Shannon -jj Behrens wrote:

> Changing from convert_unicode=True to use_unicode=True doesn't do what you'd expect. SQLAlchemy is passing keyword arguments all over the place, and use_unicode actually gets ignored. <minor rant>I personally think that you should be strict *somewhere* when you're passing around keyword arguments. I've been bitten this way too many times. Unknown keyword arguments should result in exceptions.</minor rant>

uhh... where is use_unicode=True documented as a create_engine keyword? if used as DBAPI-specific keywords are documented (in the FAQ, in the docs) it works just fine. We do try to be strict about kwargs as much as possible, but this is a case where its difficult since create_engine() is passing along kwargs to any number of dialect implementations. ill add a ticket to put in more kwargs.pop() and remove **kwargs from the base dialect class.

    from sqlalchemy import *
    import MySQLdb as mysql

    class FakeMySQLDBAPI(object):
        def __init__(self):
            self.paramstyle = mysql.paramstyle

        def connect(self, *args, **kwargs):
            print "Args:" + repr(args) + "Kwargs:" + repr(kwargs)
            return mysql.connect(*args, **kwargs)

    e = create_engine('mysql://scott:[EMAIL PROTECTED]/test',
                      connect_args={'use_unicode': True},
                      module=FakeMySQLDBAPI())
    c = e.connect()

which prints:

    Args:()Kwargs:{'passwd': 'tiger', 'host': 'localhost', 'db': 'test', 'user': 'scott', 'use_unicode': True}

> If I use the following code:
>
>     for use_unicode in (True, False):
>         connection = MySQLdb.connect(host='localhost', user='user',
>                                      passwd='dataase', db='users',
>                                      use_unicode=use_unicode)
>         cursor = connection.cursor()
>         cursor.execute("select firstName from users where username='test'")
>         row = cursor.fetchone()
>         print "use_unicode:%s %r" % (use_unicode, row)
>
> I get:
>
>     use_unicode:True (u'test \xc3\xa7',)
>     use_unicode:False ('test \xc3\xa7',)
>
> Notice the result is the same, but one has a unicode object and the other doesn't. Notice that it's \xc3\xa7 each time? It shouldn't be.

yeah id say screw using use_unicode, make sure your MySQL isnt doing any unicode conversion (check out all your my.cnf files etc), and use SA's convert_unicode or Unicode type instead.
[sqlalchemy] Re: convert_unicode=True results in double encoding
Shannon - im trying to figure a way to make create_engine() behave the way you want. but let me show you a highly simplified version of how create_engine() works. how would you change it?

    def create_engine(url, **kwargs):
        pool = pool.Pool(**kwargs)
        dialect = module.Dialect(url, **kwargs)
        engine = Engine(dialect, pool, **kwargs)
        return engine

As you can see, the returned Engine is composed of itself, a connection pool, and a Dialect. the **kwargs you pass to create_engine() can be used by any of those three objects. Also, there are many different types of Pool, Dialect, and Engine which take different arguments... so placing explicit awareness of every argument within create_engine itself is excessively brittle. since the constructors of all the classes declare their keyword arguments as normal inline arguments, we need to pass them around using ** and therefore the kwargs dictionary cannot be consumed. if it could be consumed, we could then check at the end that the kwargs dict is in fact empty, so we know that no unknown arguments were sent. but as it is, the full dict gets passed to all three types of objects, where they have to just ignore the kwargs they dont know about. this is not unlike WSGI, which passes around an environ dictionary in which many erroneous keys could be present as well. the only way i can see to fix this is to modify the constructors of all the objects to take a single dictionary argument from which they consume their arguments, and not pass the arguments with the ** operator anymore. which means the nice constructors we have, which clearly lay out the arguments they take, would just have one non-self-documenting arguments dictionary. is that the only option?
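[Editor's note: a sketch of the "consumable dict" alternative described above, with illustrative class names rather than SQLAlchemy's real ones. Each component pops the arguments it owns from one shared dict, so create_engine() can reject anything left over; the cost, as the message says, is constructors that no longer self-document their arguments.]

```python
class Pool:
    def __init__(self, opts):
        # Consume this component's arguments from the shared dict.
        self.size = opts.pop('pool_size', 5)

class Dialect:
    def __init__(self, url, opts):
        self.url = url
        self.convert_unicode = opts.pop('convert_unicode', False)

class Engine:
    def __init__(self, dialect, pool, opts):
        self.dialect, self.pool = dialect, pool
        self.echo = opts.pop('echo', False)

def create_engine(url, **kwargs):
    pool = Pool(kwargs)
    dialect = Dialect(url, kwargs)
    engine = Engine(dialect, pool, kwargs)
    if kwargs:  # anything not consumed is an unknown argument
        raise TypeError("invalid argument(s): %s" % ', '.join(sorted(kwargs)))
    return engine

e = create_engine('mysql://localhost/test', convert_unicode=True)
# A misspelled keyword such as use_unicode now raises a TypeError
# instead of being silently ignored.
```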
[sqlalchemy] Re: convert_unicode=True results in double encoding
On 11/3/06, Shannon -jj Behrens [EMAIL PROTECTED] wrote:

> I'm using convert_unicode=True. Everything is fine as long as I'm the one reading and writing the data. However, if I look at what's actually being stored in the database, it's like the data has been encoded twice. If I switch to use_unicode=True, which I believe is MySQL specific, things work just fine and what's being stored in the database looks correct.
>
> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or \
>                 isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards. It says, "If it's not a unicode object, return it. Otherwise, encode it." Later, "If it is a unicode object, return it. Otherwise decode it." Am I correct that this is backwards? If so, this is going to be *painful* to update all the databases out there!

Ok, MySQLdb doesn't have a mailing list, so I can't ask there. Here are some things I've learned:

Changing from convert_unicode=True to use_unicode=True doesn't do what you'd expect. SQLAlchemy is passing keyword arguments all over the place, and use_unicode actually gets ignored. <minor rant>I personally think that you should be strict *somewhere* when you're passing around keyword arguments. I've been bitten this way too many times. Unknown keyword arguments should result in exceptions.</minor rant>

Anyway, I'm still a bit worried about that code above like I said. However, here's what's even scarier. If I use the following code:

    import MySQLdb

    for use_unicode in (True, False):
        connection = MySQLdb.connect(host='localhost', user='user',
                                     passwd='dataase', db='users',
                                     use_unicode=use_unicode)
        cursor = connection.cursor()
        cursor.execute("select firstName from users where username='test'")
        row = cursor.fetchone()
        print "use_unicode:%s %r" % (use_unicode, row)

I get:

    use_unicode:True (u'test \xc3\xa7',)
    use_unicode:False ('test \xc3\xa7',)

Notice the result is the same, but one has a unicode object and the other doesn't. Notice that it's \xc3\xa7 each time? It shouldn't be. Consider:

    >>> s = 'test \xc3\xa7'
    >>> s.decode('utf-8')
    u'test \xe7'

*It's creating a unicode object without actually doing any decoding!* This is somewhere low level. Like I said, this is lower level than SQLAlchemy, but I don't have anywhere else to turn.

SQLAlchemy: 0.2.8
MySQLdb: 1.36.2.4
mysql client and server: 5.0.22
Ubuntu: 6.0.6

Help!
-jj

--
http://jjinux.blogspot.com/
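[Editor's note: the symptom above can be reproduced without a database. A sketch in Python 3 syntax, using the thread's own value: a driver that merely widens each byte to a code point (equivalent to a latin-1 decode) produces u'test \xc3\xa7' instead of the properly decoded u'test \xe7'.]

```python
raw = b'test \xc3\xa7'  # the UTF-8 bytes for 'test ç', as stored in the column

# A real UTF-8 decode recovers the intended text:
correct = raw.decode('utf-8')    # 'test \xe7'

# What the driver appeared to do: map each byte straight to a code
# point, yielding a unicode object whose characters are still the raw
# UTF-8 byte values -- "a unicode object without any decoding".
widened = raw.decode('latin-1')  # 'test \xc3\xa7'
```

The two results are different unicode strings, which is exactly the discrepancy the `use_unicode=True` output shows.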