On Fri, Jul 25, 2008 at 1:27 AM, Jeremiah <[EMAIL PROTECTED]> wrote:

>
> With a little research today I discovered that it is possible to do a
> case insensitive match on a binary column by using COLLATE.


Yes, specifying COLLATE is what's needed here.  In fact specifying the
BINARY attribute on a character column is just changing the default
collation to the binary one for the column's character set.  Then to get
case-insensitive matching on a column where you've done this you need to
specify a case-insensitive collation to override the column default.  I
actually tried that before sending my note yesterday but ran into errors
with illegal mixtures of collations and ran out of time to look at it so
just posted the problem.  At that point I was afraid there was no good way
for Django code to know the right collation to specify, given that the
table/column charset could be anything.  But trying again this morning it
seems things might not be that dire: apparently if the connection charset is
set to utf8 (which I believe Django does always), then you can specify a
collation of utf8_general_ci regardless of the table/column charset.  At
least, that worked for me for a table that was using latin1 charset.

I still have some concerns with Django specifying a binary column type on
table creation:

First, what about tables that Django did not create (or created before the
change was made)?   They likely have a case-insensitive default collation
since that is the default for MySQL.  Do we just document the kind of
trouble you can run into with get_or_create and unique on columns that don't
have a binary default collation specified?

Second, specifying a binary collation for the column affects ordering as
well.  So ordered results will change from using a case-insensitive ordering
by default to the binary one.  This could be an issue, since I think binary
ordering gives somewhat non-intuitive results whereas case-insensitive more
naturally matches what people expect.  But maybe that's just my impression?
Do the other databases use binary ordering by default?

Third, need to verify that specifying a collation of utf8_general_ci will
really always work regardless of the table/column charset.  It does seem to
work for latin1, but I haven't tested beyond that and again have run out of
time to play with this today...

Karen

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to