#8102: get_or_create is unable to handle unicode characters
---------------------------------------+------------------------------------
          Reporter:  davenaff          |         Owner:  mtredinnick
            Status:  closed            |     Milestone:  1.0 beta   
         Component:  Database wrapper  |       Version:  SVN        
        Resolution:  fixed             |      Keywords:             
             Stage:  Accepted          |     Has_patch:  0          
        Needs_docs:  0                 |   Needs_tests:  0          
Needs_better_patch:  1                 |  
---------------------------------------+------------------------------------
Comment (by kmtracey):

 Just a followup in case anyone ever considers trying to actually support
 both case-sensitive & case-insensitive lookups on Django/MySQL by
 specifying COLLATE in the SQL generated by Django:

 I experimented with this a bit and there seem to be nasty performance
 implications to specifying a collation that differs from a column's
 default collation.  My DB is stored with latin1 charset and default case-
 insensitive collation.  I have a view that issues quite a few queries
 (hundreds) which with current Django code takes a couple seconds maybe to
 complete.  Changing the Django code to specify one of the UTF collations
 caused that view to start taking a minute or so.  Problem seems to be if
 you explicitly specify a collation that does not match the column's
 default, MySQL cannot use any indexes (indices?) it has for that column.
 Here's the output from EXPLAIN on Django's current SQL for the kinds of
 queries used in this view:

 {{{
 mysql> explain  SELECT COUNT(*) FROM `Clues` INNER JOIN `Entries` ON
 (`Clues`.`Entry ID` = `Entries`.`Entry ID`) WHERE (NOT (`Clues`.`Derived`
 = 1 ) AND `Entries`.`Entry` = 'ERIC'  AND `Clues`.`Clue` = 'Idle Monty
 Python member?' );
 
+----+-------------+---------+-------+-----------------+------------+---------+-------+------+-------------+
 | id | select_type | table   | type  | possible_keys   | key        |
 key_len | ref   | rows | Extra       |
 
+----+-------------+---------+-------+-----------------+------------+---------+-------+------+-------------+
 |  1 | SIMPLE      | Entries | const | PRIMARY,Entry   | Entry      | 52
 | const |    1 |             |
 |  1 | SIMPLE      | Clues   | ref   | EntryIndex,Clue | EntryIndex | 4
 | const |  208 | Using where |
 
+----+-------------+---------+-------+-----------------+------------+---------+-------+------+-------------+
 2 rows in set (0.00 sec)

 }}}

 MySQL uses the Entry index to speed up the query.  But if Django tries to
 specify a collation:

 {{{
 mysql> explain SELECT COUNT(*) FROM `Clues` INNER JOIN `Entries` ON
 (`Clues`.`Entry ID` = `Entries`.`Entry ID`) WHERE (NOT (`Clues`.`Derived`
 = 1  ) AND `Entries`.`Entry` = 'ERIC' COLLATE UTF8_BIN  AND `Clues`.`Clue`
 = 'Idle Monty Python member?' COLLATE UTF8_BIN );
 
+----+-------------+---------+------+---------------+------------+---------+------------------------------+--------+-------------+
 | id | select_type | table   | type | possible_keys | key        | key_len
 | ref                          | rows   | Extra       |
 
+----+-------------+---------+------+---------------+------------+---------+------------------------------+--------+-------------+
 |  1 | SIMPLE      | Entries | ALL  | PRIMARY       | NULL       | NULL
 | NULL                         | 138731 | Using where |
 |  1 | SIMPLE      | Clues   | ref  | EntryIndex    | EntryIndex | 4
 | crosswordDB.Entries.Entry ID |      7 | Using where |
 
+----+-------------+---------+------+---------------+------------+---------+------------------------------+--------+-------------+
 2 rows in set (0.00 sec)

 }}}

 The Entry index is not eligible, key=NULL and resulting rows=138731 on the
 first line is bad, resulting query takes about 25 times longer than the
 non-collate version. (It doesn't matter if the actual collation specified
 is BIN or case-insensitive, the fact that it is for UTF8 instead of latin1
 seems to prevent use of the index.)

 So while specifying collate might be a way to get correct case-sensitive +
 case-insensitive results out of the Django/MySQL combo, it looks to have a
 pretty serious performance impact for anyone not using a utf8 coding in
 their database.  Not sure it would be worth it unless someone knows a way
 to fix that performance hit.  [I don't know that anyone (besides maybe me,
 just out of curiosity to see if I could get it to work) was going to
 pursue that idea at all, but just in case anyone does I figured I'd
 mention this while it was fresh in my mind.]

-- 
Ticket URL: <http://code.djangoproject.com/ticket/8102#comment:23>
Django Code <http://code.djangoproject.com/>
The web framework for perfectionists with deadlines
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to