#8102: get_or_create is unable to handle unicode characters
---------------------------------------+------------------------------------
Reporter: davenaff | Owner: mtredinnick
Status: closed | Milestone: 1.0 beta
Component: Database wrapper | Version: SVN
Resolution: fixed | Keywords:
Stage: Accepted | Has_patch: 0
Needs_docs: 0 | Needs_tests: 0
Needs_better_patch: 1 |
---------------------------------------+------------------------------------
Comment (by kmtracey):
Just a followup in case anyone ever considers trying to actually support
both case-sensitive & case-insensitive lookups on Django/MySQL by
specifying COLLATE in the SQL generated by Django:
I experimented with this a bit and there seem to be nasty performance
implications to specifying a collation that differs from a column's
default collation. My DB is stored with latin1 charset and default case-
insensitive collation. I have a view that issues quite a few queries
(hundreds) which with current Django code takes a couple seconds maybe to
complete. Changing the Django code to specify one of the UTF collations
caused that view to start taking a minute or so. Problem seems to be if
you explicitly specify a collation that does not match the column's
default, MySQL cannot use any indexes (indices?) it has for that column.
Here's the output from EXPLAIN on Django's current SQL for the kinds of
queries used in this view:
{{{
mysql> explain SELECT COUNT(*) FROM `Clues` INNER JOIN `Entries` ON
(`Clues`.`Entry ID` = `Entries`.`Entry ID`) WHERE (NOT (`Clues`.`Derived`
= 1 ) AND `Entries`.`Entry` = 'ERIC' AND `Clues`.`Clue` = 'Idle Monty
Python member?' );
+----+-------------+---------+-------+-----------------+------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+---------+-------+-----------------+------------+---------+-------+------+-------------+
| 1 | SIMPLE | Entries | const | PRIMARY,Entry | Entry | 52
| const | 1 | |
| 1 | SIMPLE | Clues | ref | EntryIndex,Clue | EntryIndex | 4
| const | 208 | Using where |
+----+-------------+---------+-------+-----------------+------------+---------+-------+------+-------------+
2 rows in set (0.00 sec)
}}}
MySQL uses the Entry index to speed up the query. But if Django tries to
specify a collation:
{{{
mysql> explain SELECT COUNT(*) FROM `Clues` INNER JOIN `Entries` ON
(`Clues`.`Entry ID` = `Entries`.`Entry ID`) WHERE (NOT (`Clues`.`Derived`
= 1 ) AND `Entries`.`Entry` = 'ERIC' COLLATE UTF8_BIN AND `Clues`.`Clue`
= 'Idle Monty Python member?' COLLATE UTF8_BIN );
+----+-------------+---------+------+---------------+------------+---------+------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len
| ref | rows | Extra |
+----+-------------+---------+------+---------------+------------+---------+------------------------------+--------+-------------+
| 1 | SIMPLE | Entries | ALL | PRIMARY | NULL | NULL
| NULL | 138731 | Using where |
| 1 | SIMPLE | Clues | ref | EntryIndex | EntryIndex | 4
| crosswordDB.Entries.Entry ID | 7 | Using where |
+----+-------------+---------+------+---------------+------------+---------+------------------------------+--------+-------------+
2 rows in set (0.00 sec)
}}}
The Entry index is not eligible, key=NULL and resulting rows=138731 on the
first line is bad, resulting query takes about 25 times longer than the
non-collate version. (It doesn't matter if the actual collation specified
is BIN or case-insensitive, the fact that it is for UTF8 instead of latin1
seems to prevent use of the index.)
So while specifying collate might be a way to get correct case-sensitive +
case-insensitive results out of the Django/MySQL combo, it looks to have a
pretty serious performance impact for anyone not using a utf8 coding in
their database. Not sure it would be worth it unless someone knows a way
to fix that performance hit. [I don't know that anyone (besides maybe me,
just out of curiosity to see if I could get it to work) was going to
pursue that idea at all, but just in case anyone does I figured I'd
mention this while it was fresh in my mind.]
--
Ticket URL: <http://code.djangoproject.com/ticket/8102#comment:23>
Django Code <http://code.djangoproject.com/>
The web framework for perfectionists with deadlines
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/django-updates?hl=en
-~----------~----~----~----~------~----~------~--~---