#9682: icontains can be case-sensitive on MySQL
--------------------------------------------+-------------------------------
 Reporter:  [EMAIL PROTECTED]  |       Owner:  nobody    
   Status:  new                             |   Milestone:            
Component:  Database layer (models, ORM)    |     Version:  1.0       
 Keywords:                                  |       Stage:  Unreviewed
Has_patch:  0                               |  
--------------------------------------------+-------------------------------
 Firstly, this is a known problem, see groups.google.com/group/django-
 users/browse_thread/thread/2936ad3387c4b369.

 Secondly, I disagree with the ‘You set the database collation and Django
 respects that’ approach shown there. For icontains and MySQL, Django
 generates ‘x LIKE '%y%'’. So, the programmer has told Django explicitly,
 ‘I want case-insensitive comparison’, and Django tells MySQL, ‘We want
 default comparison’. This is not field_icontains but rather some
 field_''usingdefaultsettings''contains.

 In fact, here’s how Django defines operators for the MySQL backend:
 {{{
     operators = {
         'exact': '= %s',
         'iexact': 'LIKE %s',
         'contains': 'LIKE BINARY %s',
         'icontains': 'LIKE %s',
         'regex': 'REGEXP BINARY %s',
         'iregex': 'REGEXP %s',
         'gt': '> %s',
         'gte': '>= %s',
         'lt': '< %s',
         'lte': '<= %s',
         'startswith': 'LIKE BINARY %s',
         'endswith': 'LIKE BINARY %s',
         'istartswith': 'LIKE %s',
         'iendswith': 'LIKE %s',
     }
 }}}
 So, case-sensitivity is explicitly requested, while case-insensitivity is
 implied.

 The aforementioned thread says that ‘case-sensitive matching with the
 default utf_*_ci collations’ and overall making ‘things like that work
 regardless of collation’ is ‘really, really hard’. That sounds strange,
 because case-sensitive matching is no harder than a simple memcpy, it’s
 case-insensitive matching that is problematic.

 However, MySQL doesn’t seem to implement the Unicode collation trickery
 correctly. It thinks that 'ı' is LIKE 'i' and 'groß' is not LIKE 'GROSS'.
 (It does perform LOWER('COMMITTED' COLLATE utf8_turkish_ci) correctly,
 however.) What about altering the above table, making it 'COLLATE
 utf8_general_ci LIKE %s' for cases where the default collation isn’t case-
 insensitive?

 You’re right Unicode is quite hard to implement properly. However, for
 something_icontains the database does have to determine some collation,
 and if the default collation is case insensitive, some fallback has to be
 derived. Maybe the most reasonable solution would be a setting like
 FORCE_COLLATION_FOR_CI (maybe MySQL-specific, maybe it can be usable for
 other DBMSs), that would determine how to collate values for which no
 collation is specified?

 It would solve a practical problem: I want iexact to differ from exact and
 icontains to differ from contains (please note these completely reasonable
 requirements ''conflict'' on current MySQL under current Django
 implementation ''regardless'' of collation settings). I’ve read what
 Django documentation has to say about collation, and while you might be
 justified in having ‘exact’ return what the database considers exact (for
 example, for purposes of UNIQUE indices), it’s wrong not to provide a
 lookup option that would consistently test for case-sensitive equality
 across DBMSs. Maybe in addition to contains and icontains something like
 scontains is also needed?

 ----

 To summarize, I propose two things:

 1. For those who have case-insensitive collation in MySQL: a lookup option
 that would always invoke case-sensitive matching. Maybe the ‘s’ prefix for
 ‘sensitive’, or ‘b’ for binary: bcontains, bexact, bregex and so on.

 2. For case-sensitive people: an option to set a collation for case-
 insensitive matches, for example in the settings module.

 Both options preserve backwards compatibility and both should be easy to
 implement. (People on KDE bugzilla have told me I should never tell them
 whether I think something is easy to implement, but nevertheless it’s a
 couple of lines for both changes. :-)

-- 
Ticket URL: <http://code.djangoproject.com/ticket/9682>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to django-updates@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to