#9682: icontains can be case-sensitive on MySQL --------------------------------------------+------------------------------- Reporter: [EMAIL PROTECTED] | Owner: nobody Status: new | Milestone: Component: Database layer (models, ORM) | Version: 1.0 Keywords: | Stage: Unreviewed Has_patch: 0 | --------------------------------------------+------------------------------- Firstly, this is a known problem, see groups.google.com/group/django- users/browse_thread/thread/2936ad3387c4b369.
Secondly, I disagree with the ‘You set the database collation and Django respects that’ approach shown there. For icontains and MySQL, Django generates ‘x LIKE '%y%'’. So, the programmer has told Django explicitly, ‘I want case-insensitive comparison’, and Django tells MySQL, ‘We want default comparison’. This is not field_icontains but rather some field_''usingdefaultsettings''contains. In fact, here’s how Django defines operators for the MySQL backend: {{{ operators = { 'exact': '= %s', 'iexact': 'LIKE %s', 'contains': 'LIKE BINARY %s', 'icontains': 'LIKE %s', 'regex': 'REGEXP BINARY %s', 'iregex': 'REGEXP %s', 'gt': '> %s', 'gte': '>= %s', 'lt': '< %s', 'lte': '<= %s', 'startswith': 'LIKE BINARY %s', 'endswith': 'LIKE BINARY %s', 'istartswith': 'LIKE %s', 'iendswith': 'LIKE %s', } }}} So, case-sensitivity is explicitly requested, while case-insensitivity is implied. The aforementioned thread says that ‘case-sensitive matching with the default utf_*_ci collations’ and overall making ‘things like that work regardless of collation’ is ‘really, really hard’. That sounds strange, because case-sensitive matching is no harder than a simple memcpy, it’s case-insensitive matching that is problematic. However, MySQL doesn’t seem to implement the Unicode collation trickery correctly. It thinks that 'ı' is LIKE 'i' and 'groß' is not LIKE 'GROSS'. (It does perform LOWER('COMMITTED' COLLATE utf8_turkish_ci) correctly, however.) What about altering the above table, making it 'COLLATE utf8_general_ci LIKE %s' for cases where the default collation isn’t case- insensitive? You’re right Unicode is quite hard to implement properly. However, for something_icontains the database does have to determine some collation, and if the default collation is case insensitive, some fallback has to be derived. Maybe the most reasonable solution would be a setting like FORCE_COLLATION_FOR_CI (maybe MySQL-specific, maybe it can be usable for other DBMSs), that would determine how to collate values for which no collation is specified? It would solve a practical problem: I want iexact to differ from exact and icontains to differ from contains (please note these completely reasonable requirements ''conflict'' on current MySQL under current Django implementation ''regardless'' of collation settings). I’ve read what Django documentation has to say about collation, and while you might be justified in having ‘exact’ return what the database considers exact (for example, for purposes of UNIQUE indices), it’s wrong not to provide a lookup option that would consistently test for case-sensitive equality across DBMSs. Maybe in addition to contains and icontains something like scontains is also needed? ---- To summarize, I propose two things: 1. For those who have case-insensitive collation in MySQL: a lookup option that would always invoke case-sensitive matching. Maybe the ‘s’ prefix for ‘sensitive’, or ‘b’ for binary: bcontains, bexact, bregex and so on. 2. For case-sensitive people: an option to set a collation for case- insensitive matches, for example in the settings module. Both options preserve backwards compatibility and both should be easy to implement. (People on KDE bugzilla have told me I should never tell them whether I think something is easy to implement, but nevertheless it’s a couple of lines for both changes. :-) -- Ticket URL: <http://code.djangoproject.com/ticket/9682> Django <http://code.djangoproject.com/> The Web framework for perfectionists with deadlines. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django updates" group. To post to this group, send email to django-updates@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-updates?hl=en -~----------~----~----~----~------~----~------~--~---