On 11/27/2015 01:33 PM, Dominik Ruf wrote:

> Great.
> BTW I made another test and it seems the key thing is charset=utf8.


TLDR: Lars is right that a default Kallithea installation on MySQL stores utf-8 in the database instead of storing unicode and letting the database deal with the encoding. I was also right that it generally works fine anyway. ;-)

I also tested (with Fedora, MariaDB and mysql-python). I tested by creating a new database, changing the admin user's name to blåbærgrød, creating a blåbærgrød repository, and inspecting database and file system content.

Everything worked flawlessly with the default mysql url, with the one caveat that it stores utf-8 in the database. SQLAlchemy will however encode and decode it consistently so everything just works ... but I guess collation order and other "details" might be wrong and direct database hacking will be tricky - as Lars found out the hard way in the initial post.
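To make "stores utf-8 in the database" concrete, here is a minimal Python sketch of the double encoding. It assumes the connection charset is latin1 (the historical MySQL default, not something I verified in this test), and the variable names are made up for the illustration.

```python
# Sketch of the double encoding, ASSUMING the connection charset is latin1.
name = "blåbærgrød"

# The application encodes to UTF-8 before handing the value to the driver.
utf8_bytes = name.encode("utf-8")

# The server treats each byte as one latin1 character, so this is roughly
# what a direct look at the table would show.
seen_in_database = utf8_bytes.decode("latin-1")

# Reading back through the same path reverses both steps consistently,
# which is why the application itself never notices.
round_trip = seen_in_database.encode("latin-1").decode("utf-8")

print(repr(seen_in_database))
print(len(name), len(seen_in_database))
print(round_trip == name)
```

The stored value is longer than the real name and sorts differently, which would explain wrong collation and confusing results when querying the database directly.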

I agree that

    sqlalchemy.db1.url = mysql://kallithea:foobar@localhost/kallithea?charset=utf8

seems to be the right "solution". It works and the database content is as expected. (Except that MySQL's utf8 apparently is not fully unicode compliant, so it would be better to use utf8mb4 ...)
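On the utf8 versus utf8mb4 remark: MySQL's utf8 character set holds at most 3 bytes per character, so anything outside the Basic Multilingual Plane does not fit. A small sketch to check whether a given string is affected (the function name is hypothetical, just for illustration):

```python
def fits_mysql_utf8(text):
    """Return True if every character needs at most 3 UTF-8 bytes.

    Hypothetical helper: 3 bytes is the per-character limit of MySQL's
    "utf8" charset; 4-byte characters need utf8mb4.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_mysql_utf8("blåbærgrød"))   # Danish letters need 2 bytes each
print(fits_mysql_utf8("\U0001F600"))   # an emoji needs 4 bytes
```

So the test name used here is fine with plain utf8, but arbitrary user input might not be.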

I don't know the root cause of the weirdness. It might be some (old and fixed?) MySQL deficiencies and workarounds in SQLAlchemy ... or something in Kallithea that triggers it. I guess it could be the combination of mysql not being unicode compliant by default and convert_unicode thus triggering the unnecessary utf8 encoding. (http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine.params.encoding could also play a role ... but that is probably only relevant for understanding the problem, not for fixing it.)

I guess we should change the default mysql uri in the .ini files to use charset=utf8?
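For existing .ini files, adding the parameter could be done along these lines. This is only a sketch using the Python standard library; with_charset is a hypothetical helper, not something that exists in Kallithea, and it deliberately leaves a url alone if it already specifies a charset.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(db_url, charset="utf8"):
    """Return db_url with ?charset=... appended unless one is already set.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(db_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "charset" for key, _ in query):
        query.append(("charset", charset))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_charset("mysql://kallithea:foobar@localhost/kallithea"))
```

Note that this only changes the url. It does nothing for data that has already been stored with double encoding.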

Each table already specifies mysql_charset utf8 ... but that is apparently for something else?

We should probably also improve the documentation to give some advice on which "DBAPI" to use. Any recommendations?

I guess we should also get rid of all the explicit convert_unicode in db.py and .ini and just use Unicode and UnicodeText fields.

Changes in this area could however cause pain for installations that are happily using mysql with double encoding.

/Mads
_______________________________________________
kallithea-general mailing list
[email protected]
http://lists.sfconservancy.org/mailman/listinfo/kallithea-general
