Author: mtredinnick
Date: 2008-08-25 20:59:25 -0500 (Mon, 25 Aug 2008)
New Revision: 8568

Modified:
   django/trunk/docs/ref/databases.txt
   django/trunk/docs/ref/models/fields.txt
   django/trunk/docs/ref/models/querysets.txt
Log:
Added documentation to explain the gains and losses when using utf8_bin
collation in MySQL. This should help people to make a reasonably informed
decision. Usually, leaving the MySQL collation alone will be the best solution,
but if you must change it, this gives a start to the information you need and
pointers to the appropriate place in the MySQL docs.

There's a small chance I also got all the necessary Sphinx markup correct, too
(it builds without errors, but I may have missed some chances for glory and
linkage).

Fixed #2335, #8506.


Modified: django/trunk/docs/ref/databases.txt
===================================================================
--- django/trunk/docs/ref/databases.txt 2008-08-26 00:52:55 UTC (rev 8567)
+++ django/trunk/docs/ref/databases.txt 2008-08-26 01:59:25 UTC (rev 8568)
@@ -95,6 +95,65 @@
 
 .. _create your database: http://dev.mysql.com/doc/refman/5.0/en/create-database.html
 
+.. _mysql-collation:
+
+Collation settings
+~~~~~~~~~~~~~~~~~~
+
+The collation setting for a column controls the order in which data is sorted
+as well as what strings compare as equal. It can be set on a database-wide
+level and also per-table and per-column. This is `documented thoroughly`_ in
+the MySQL documentation. In all cases, you set the collation by directly
+manipulating the database tables; Django doesn't provide a way to set this on
+the model definition.
+
+.. _documented thoroughly: http://dev.mysql.com/doc/refman/5.0/en/charset.html
+
+By default, with a UTF-8 database, MySQL will use the
+``utf8_general_ci`` collation. This results in all string equality
+comparisons being done in a *case-insensitive* manner. That is, ``"Fred"`` and
+``"freD"`` are considered equal at the database level. If you have a unique
+constraint on a field, it would be illegal to try to insert both ``"aa"`` and
+``"AA"`` into the same column, since they compare as equal (and, hence,
+non-unique) with the default collation.
+
+In many cases, this default will not be a problem. However, if you really want
+case-sensitive comparisons on a particular column or table, you would change
+the column or table to use the ``utf8_bin`` collation. The main thing to be
+aware of in this case is that if you are using MySQLdb 1.2.2, the database backend in Django will then return
+bytestrings (instead of unicode strings) for any character fields it
+receives from the database. This is a strong variation from Django's normal
+practice of *always* returning unicode strings. It is up to you, the
+developer, to handle the fact that you will receive bytestrings if you
+configure your table(s) to use ``utf8_bin`` collation. Django itself should work
+smoothly with such columns, but your code must be prepared to call
+``django.utils.encoding.smart_unicode()`` at times if it really wants to work
+with consistent data -- Django will not do this for you (the database backend
+layer and the model population layer are separated internally so the database
+layer doesn't know it needs to make this conversion in this one particular
+case).
+
+If you're using MySQLdb 1.2.1p2, Django's standard
+:class:`~django.db.models.CharField` class will return unicode strings even
+with ``utf8_bin`` collation. However, :class:`~django.db.models.TextField`
+fields will be returned as an ``array.array`` instance (from Python's standard
+``array`` module). There isn't a lot Django can do about that, since, again,
+the information needed to make the necessary conversions isn't available when
+the data is read in from the database. This problem was `fixed in MySQLdb
+1.2.2`_, so if you want to use :class:`~django.db.models.TextField` with
+``utf8_bin`` collation, upgrading to version 1.2.2 and then dealing with the
+bytestrings (which shouldn't be too difficult) is the recommended solution.
+
+Should you decide to use ``utf8_bin`` collation for some of your tables with
+MySQLdb 1.2.1p2, you should still use ``utf8_general_ci`` (the
+default) collation for the :class:`django.contrib.sessions.models.Session`
+table (usually called ``django_session``) and the
+:class:`django.contrib.admin.models.LogEntry` table (usually called
+``django_admin_log``). Those are the two standard tables that use
+:class:`~django.db.models.TextField` internally.
+
+.. _fixed in MySQLdb 1.2.2: http://sourceforge.net/tracker/index.php?func=detail&aid=1495765&group_id=22307&atid=374932
+
 Connecting to the database
 --------------------------
 
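A minimal sketch of the bytestring handling that the collation notes above
leave to the developer. It is written for modern Python, where the
bytestring/unicode split corresponds to ``bytes``/``str``, and it does not use
Django at all: ``to_text`` is a hypothetical helper standing in for the kind
of conversion ``django.utils.encoding.smart_unicode()`` is recommended for,
and the UTF-8 default is an assumption about how the column data is encoded.

```python
import array


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: coerce a value read from a utf8_bin column to text.

    Covers the two cases described in the notes: bytestrings (MySQLdb 1.2.2)
    and array.array instances (TextField with MySQLdb 1.2.1p2).
    """
    if isinstance(value, array.array):
        # Recover the raw bytes held by the array before decoding.
        value = value.tobytes()
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    # Already text (or a non-string value such as None or an int): unchanged.
    return value


name = to_text(b"Fr\xc3\xa9d")
body = to_text(array.array("B", b"abc"))
```

Calling the helper on every character value read from an affected table keeps
the rest of the code working with one string type, whichever MySQLdb version
produced the value.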

Modified: django/trunk/docs/ref/models/fields.txt
===================================================================
--- django/trunk/docs/ref/models/fields.txt     2008-08-26 00:52:55 UTC (rev 8567)
+++ django/trunk/docs/ref/models/fields.txt     2008-08-26 01:59:25 UTC (rev 8568)
@@ -340,6 +340,14 @@
     The maximum length (in characters) of the field. The max_length is enforced
     at the database level and in Django's validation.
 
+.. admonition:: MySQL users
+
+    If you are using this field with MySQLdb 1.2.2 and the ``utf8_bin``
+    collation (which is *not* the default), there are some issues to be aware
+    of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
+    details.
+
+
 ``CommaSeparatedIntegerField``
 ------------------------------
 
@@ -689,6 +697,13 @@
 A large text field. The admin represents this as a ``<textarea>`` (a multi-line
 input).
 
+.. admonition:: MySQL users
+
+    If you are using this field with MySQLdb 1.2.1p2 and the ``utf8_bin``
+    collation (which is *not* the default), there are some issues to be aware
+    of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
+    details.
+
 ``TimeField``
 -------------
 

Modified: django/trunk/docs/ref/models/querysets.txt
===================================================================
--- django/trunk/docs/ref/models/querysets.txt  2008-08-26 00:52:55 UTC (rev 8567)
+++ django/trunk/docs/ref/models/querysets.txt  2008-08-26 01:59:25 UTC (rev 8568)
@@ -729,17 +729,14 @@
 
 .. admonition:: MySQL comparisons
 
-    In MySQL, whether or not ``exact`` comparisons are case-sensitive depends
-    upon the collation setting of the table involved. The default is usually
-    ``latin1_swedish_ci`` or ``utf8_swedish_ci``, which results in
-    case-insensitive comparisons. Change the collation to
-    ``latin1_swedish_cs`` or ``utf8_bin`` for case sensitive comparisons.
+    In MySQL, ``exact`` comparisons are case-insensitive by
+    default. This is controlled by the collation setting on the database
+    tables (this is a database setting, *not* a Django setting).  It is
+    possible to configure your MySQL tables to use case-sensitive comparisons,
+    but there are some trade-offs involved. For more information about
+    this, see the :ref:`collation section <mysql-collation>` in the
+    :ref:`databases <ref-databases>` documentation.
 
-    For more details, refer to the MySQL manual section about `character sets
-    and collations`_.
-
-.. _character sets and collations: http://dev.mysql.com/doc/refman/5.0/en/charset.html
-
 iexact
 ~~~~~~
 

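The effect of collation on equality and on unique constraints, as described
in the documentation changes above, can be illustrated without a MySQL
server. The sketch below uses SQLite from Python's standard library as a
stand-in: its ``NOCASE`` and ``BINARY`` collations are only rough analogues
of MySQL's case-insensitive default and ``utf8_bin``, and the ``person``
table and ``probe`` function are made up for the illustration.

```python
import sqlite3


def probe(collation):
    """Return (exact_matches, duplicate_allowed) for a UNIQUE text column.

    `collation` is interpolated into the SQL, so pass only trusted constants.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (name TEXT COLLATE %s UNIQUE)" % collation
    )
    conn.execute("INSERT INTO person (name) VALUES ('Fred')")
    # Equality uses the column's collation, as an exact lookup would.
    exact_matches = conn.execute(
        "SELECT COUNT(*) FROM person WHERE name = 'freD'"
    ).fetchone()[0]
    try:
        # Differs from the stored value only by case.
        conn.execute("INSERT INTO person (name) VALUES ('FRED')")
        duplicate_allowed = True
    except sqlite3.IntegrityError:
        duplicate_allowed = False
    conn.close()
    return exact_matches, duplicate_allowed


case_insensitive = probe("NOCASE")
case_sensitive = probe("BINARY")
```

With the case-insensitive collation the lookup for ``'freD'`` finds the
``'Fred'`` row and the second insert is rejected as a duplicate; with the
binary collation the lookup finds nothing and both rows are stored.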
