#28949: Multibyte table name or column name causes miscalculation of the length of index name.
---------------------------------------+-------------------------------------
               Reporter:  Pak Youngrok |          Owner:  nobody
                   Type:  Bug          |         Status:  new
              Component:  Migrations   |        Version:  2.0
               Severity:  Normal       |       Keywords:  migration multibyte index
           Triage Stage:  Unreviewed   |      Has patch:  0
    Needs documentation:  0            |    Needs tests:  0
Patch needs improvement:  0            |  Easy pickings:  0
                  UI/UX:  0            |
---------------------------------------+-------------------------------------
 When a migration creates an index automatically, Django builds its name
 from the table name, the column names, a hash, and a suffix. If the
 generated name is longer than `self.connection.ops.max_name_length()`,
 Django shortens it. However, the length is measured with `len()` on a
 Python string, which counts characters, so it does not match the length
 as the database sees it. The length should be calculated after encoding
 the name with the database encoding. Because of this issue, migration
 fails under the following conditions:

  * long multibyte model names
  * two multibyte models related by a foreign key
  * the foreign key field is a CharField (or a subclass of it)

 Under these conditions, the migration tries to create two indexes (a
 normal index and a `like` index) whose names are identical except for the
 suffix (the latter ends in `_like`). Both names are shorter than the max
 name length when measured as strings but longer when measured as bytes,
 so a name conflict is raised.
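
 The mismatch can be reproduced without a database. The following is a
 minimal sketch, not Django code: the table and column names are made up,
 the hash is a placeholder, and `truncate_utf8` is a hypothetical helper
 that assumes a backend which silently cuts identifiers at 63 bytes (the
 PostgreSQL default).

```python
# Assumed limit for illustration: 63 bytes, PostgreSQL's default identifier limit.
MAX_NAME_LENGTH = 63


def truncate_utf8(name, max_bytes):
    """Hypothetical helper: mimic a backend that cuts identifiers at a byte limit."""
    # errors='ignore' drops a trailing partial character, if the cut splits one.
    return name.encode('utf-8')[:max_bytes].decode('utf-8', errors='ignore')


# Made-up multibyte names: 12 and 8 Korean characters, 3 bytes each in UTF-8.
table_name = '주문내역' * 3
column_name = '고객식별' * 2
hash_part = '1a2b3c4d'  # placeholder for the generated hash

index_name = '%s_%s_%s' % (table_name, column_name, hash_part)
like_index_name = index_name + '_like'

for name in (index_name, like_index_name):
    # The character count passes the check; the byte count does not.
    print(len(name), len(name.encode('utf-8')))

# After byte-level truncation both names are the same, hence the conflict.
print(truncate_utf8(index_name, MAX_NAME_LENGTH)
      == truncate_utf8(like_index_name, MAX_NAME_LENGTH))
```

 Both names pass a `len(index_name) <= max_length` check, yet neither fits
 in 63 bytes, and the `_like` suffix that distinguishes them is the part
 that gets cut off.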

 This happens with a long multibyte table name and foreign key name.

 Here is the code:
 
https://github.com/django/django/blob/4420761ea9457d386b2000cf9df5b2f6f88f8f91/django/db/backends/base/schema.py#L873
 {{{#!python
         index_name = '%s_%s_%s' % (table_name, '_'.join(column_names), hash_suffix_part)
         if len(index_name) <= max_length:
             return index_name
 }}}

 [https://docs.djangoproject.com/en/2.0/ref/databases/#encoding Django
 assumes that all databases use UTF-8 encoding], so the code should be
 fixed like this:
 {{{#!python
         index_name = '%s_%s_%s' % (table_name, '_'.join(column_names), hash_suffix_part)
         if len(index_name.encode('utf8')) <= max_length:
             return index_name
 }}}

 The code that shortens the name should also be fixed. Taking a third of
 each part and re-joining them is not a good strategy in a multibyte world,
 because it can cause the same miscalculation. I think keeping only a very
 small number of characters from the table and column names, such as 2 or
 3, and joining them with the original hash would be a safe solution.
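
 One way the proposed shortening could look is sketched below. This is an
 illustration under stated assumptions, not a patch: `create_index_name`,
 `utf8_len`, the `keep` parameter, the default limit of 63, and the
 md5-based digest are all hypothetical stand-ins for whatever the schema
 editor actually uses.

```python
import hashlib


def utf8_len(text):
    """Length of text in bytes once encoded as UTF-8."""
    return len(text.encode('utf-8'))


def create_index_name(table_name, column_names, suffix='', max_length=63, keep=3):
    """Hypothetical, simplified index-name builder that measures bytes."""
    # Hash the full, untruncated names so shortened names stay distinct.
    digest = hashlib.md5(
        '_'.join([table_name, *column_names]).encode('utf-8')
    ).hexdigest()[:8]
    hash_suffix_part = digest + suffix

    index_name = '%s_%s_%s' % (table_name, '_'.join(column_names), hash_suffix_part)
    if utf8_len(index_name) <= max_length:
        return index_name

    # Too long in bytes: keep only a few leading characters of each part.
    # Slicing the str (not the bytes) never splits a multibyte character.
    index_name = '%s_%s_%s' % (
        table_name[:keep],
        '_'.join(column_names)[:keep],
        hash_suffix_part,
    )
    if utf8_len(index_name) > max_length:
        raise ValueError('index name does not fit in %d bytes' % max_length)
    return index_name


# Made-up multibyte names, as in a model with long non-ASCII identifiers.
table_name = '주문내역' * 3
column_name = '고객식별' * 2
plain = create_index_name(table_name, [column_name])
like = create_index_name(table_name, [column_name], suffix='_like')
print(plain, like)
```

 With only `keep` characters taken from each part, the shortened name has
 a small, predictable upper bound in bytes (at most 4 bytes per character
 in UTF-8), so the hash and the `_like` suffix are never cut off.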

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28949>
