#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
Reporter: EmilStenstrom | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: 1.4
Severity: Normal | Resolution:
Keywords: utf8mb4 mysql | Triage Stage: Design decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by rogeliorv):
* keywords: hack utf8mb4 mysql => utf8mb4 mysql
* needs_better_patch: 0 => 1
* has_patch: 0 => 1
* needs_tests: 0 => 1
Comment:
Replying to [comment:8 EmilStenstrom]:
> Replying to [comment:7 rogeliorv]:
> > As a way to test it. The hack consists in adding self.query('SET NAMES
utf8mb4') in MySQLdb.connections in Connection.set_character_set function
as shown here: http://pastebin.com/MW5BgRgP
> >
> > Of course the correct way would be to change this in django when
setting up the cursor connection.
>
> Did your hack remove the exception? What was the rationale behind the
hack? What's the next step?
Yes, the hack removed the exception. The rationale was to force the MySQL
client connection to use a specific encoding (utf8mb4).
The next step is to make Django's MySQL connections use utf8mb4 by
default, or otherwise make the charset more configurable. Since utf8mb4 is
utf8 compatible, there should be no extra changes needed in that regard.
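To illustrate the compatibility point, here is a minimal Python 3 sketch.
The helper names max_utf8_bytes and needs_utf8mb4 are mine and are not part
of Django or MySQLdb. It shows that characters in the Basic Multilingual
Plane need at most 3 bytes in UTF-8, which is all MySQL's utf8 charset can
store, while characters beyond it need 4 bytes and therefore utf8mb4:

```python
def max_utf8_bytes(text):
    """Return the largest number of UTF-8 bytes used by any character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text):
    """True if text has a character that MySQL's 3-byte utf8 charset cannot store."""
    return max_utf8_bytes(text) > 3


if __name__ == "__main__":
    print(needs_utf8mb4("caf\u00e9 \u20ac"))    # characters inside the BMP
    print(needs_utf8mb4("smile \U0001F600"))    # character outside the BMP
```

Text that passes this check is encoded byte for byte the same way under
both charsets, which is why no other changes should be needed for it.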
To achieve this, the _cursor method of class DatabaseWrapper in
django.db.backends.mysql.base should be changed (complete function
definition here: http://pastebin.com/A6dMEMd4):
    kwargs = {
        "conv": django_conversions,
        "charset": "utf8mb4",
        "use_unicode": True,
    }
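As a rough sketch of the "more configurable" option, the charset could
default to utf8mb4 while still letting per-database options override it.
Everything named here (build_connection_kwargs, DEFAULT_CHARSET, the options
argument) is hypothetical and only stands in for the real backend code:

```python
DEFAULT_CHARSET = "utf8mb4"  # assumption: the default proposed in this ticket


def build_connection_kwargs(options=None, conversions=None):
    """Build keyword arguments for a MySQL connection (hypothetical helper).

    `options` stands in for a per-database options dict from settings;
    any key it contains overrides the defaults below.
    """
    kwargs = {
        "conv": conversions if conversions is not None else {},
        "charset": DEFAULT_CHARSET,
        "use_unicode": True,
    }
    kwargs.update(options or {})
    return kwargs


if __name__ == "__main__":
    print(build_connection_kwargs())
    print(build_connection_kwargs({"charset": "latin1"}))
```

Applying the overrides last keeps utf8mb4 as the default without taking the
choice away from projects that need a different charset.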
Unfortunately, this won't work unless we also change the set_character_set
method of class Connection in MySQLdb.connections. Change its two bottom
lines to the following (complete function definition here:
http://pastebin.com/AMN1B8za):
    # Hack so data can be decoded/encoded using Python's utf8 codec, since
    # Python does not know about MySQL's utf8mb4
    if charset == 'utf8mb4':
        charset = 'utf8'
    self.string_decoder.charset = charset
    self.unicode_literal.charset = charset
This will guarantee you can use special characters like πππβΊππππ
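The charset-name part of that hack can be tried on its own, outside MySQLdb.
This is only a sketch: python_codec_for and MYSQL_TO_PYTHON_CODEC are
hypothetical names, and the only mapping taken from this ticket is
utf8mb4 => utf8:

```python
import codecs

# MySQL charset names whose Python codec has a different name.
# Only the utf8mb4 entry comes from this ticket; the table is illustrative.
MYSQL_TO_PYTHON_CODEC = {"utf8mb4": "utf8"}


def python_codec_for(mysql_charset):
    """Return the Python codec name to use for a MySQL charset name."""
    name = MYSQL_TO_PYTHON_CODEC.get(mysql_charset, mysql_charset)
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name


if __name__ == "__main__":
    codec = python_codec_for("utf8mb4")
    raw = "\U0001F600".encode(codec)   # a character outside the BMP
    print(codec, len(raw), raw.decode(codec) == "\U0001F600")
```

Python's utf8 codec already handles 4-byte sequences, so only the name
needs translating; the bytes exchanged with the server stay the same.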
Unlike the previous hack, which worked for both reading and writing data,
this patch only allows me to read data in utf8mb4 format. On
insertion/creation I now get the error "'Cursor' object has no attribute
'_last_executed'". I will report more details on this error as I find
them. Any help with this error is appreciated.
You can reach me via twitter, @rogeliorv
--
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:9>