Re: [Django] #8391: slugify template filter poorly encodes non-English strings

Django Thu, 15 Sep 2011 10:03:54 -0700

#8391: slugify template filter poorly encodes non-English strings
------------------------------------+---------------------------------
               Reporter:  bjornkri  |          Owner:  nobody
                   Type:  Bug       |         Status:  reopened
              Milestone:            |      Component:  Template system
                Version:  SVN       |       Severity:  Normal
             Resolution:            |       Keywords:
           Triage Stage:  Accepted  |      Has patch:  0
    Needs documentation:  0         |    Needs tests:  0
Patch needs improvement:  0         |  Easy pickings:  0
                  UI/UX:  0         |
------------------------------------+---------------------------------


Comment (by yasar11732@…):

 The slugify2 function above won't fix #16853.

 {{{
 # -*- coding: utf-8 -*-
 import sys
 import re

 from django.utils import encoding

 TURKISH_MAP = {
     u'ş':'s', u'Ş':'S', u'ı':'i', u'İ':'I', u'ç':'c', u'Ç':'C', u'ü':'u',
     u'Ü':'U', u'ö':'o', u'Ö':'O', u'ğ':'g', u'Ğ':'G'
 }

 ALL_DOWNCODE_MAPS = [
     TURKISH_MAP,
 ]

 class Downcoder(object):
     map = {}
     regex = None

     def __init__(self):
         self.map = {}
         chars = u''

         for lookup in ALL_DOWNCODE_MAPS:
             for c, l in lookup.items():
                 self.map[c] = l
                 chars += encoding.force_unicode(c)

         self.regex = re.compile(ur'[' + chars + ']|[^' + chars + ']+', re.U)

 downcoder = Downcoder()

 def downcode(value):
     downcoded = u''
     pieces = downcoder.regex.findall(value)

     if pieces:
         for p in pieces:
             mapped = downcoder.map.get(p)
             if mapped:
                 downcoded += mapped
             else:
                 downcoded += p
     else:
         downcoded = value

     return value

 def slugify2(value):
     """
     Normalizes string, converts to lowercase, removes non-alpha characters,
     and converts spaces to hyphens.
     """
     import unicodedata
     value = downcode(value)
     value = unicodedata.normalize('NFD', value).encode('ascii', 'ignore')
     value = unicode(re.sub('[^\w\s-]', '', value).strip().lower())
     return re.sub('[-\s]+', '-', value)

 print(slugify2(u"Işık ılık süt iç"))


 }}}

 This prints "isk-lk-sut-ic", but the expected value is "isik-ilik-sut-ic".
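
 A possible explanation, assuming the listing above is exactly what was run:
 downcode() builds `downcoded` but returns `value`, so the Turkish map is
 never applied. The dotless "ı" (U+0131) has no NFD decomposition, so
 encode('ascii', 'ignore') then drops it. The sketch below is a Python 3
 illustration only, not the ticket's patch; it swaps the regex-based
 Downcoder for a per-character lookup and returns the mapped string.

```python
import re
import unicodedata

# Same Turkish map as in the listing above, written as Python 3 str literals.
TURKISH_MAP = {
    'ş': 's', 'Ş': 'S', 'ı': 'i', 'İ': 'I', 'ç': 'c', 'Ç': 'C',
    'ü': 'u', 'Ü': 'U', 'ö': 'o', 'Ö': 'O', 'ğ': 'g', 'Ğ': 'G',
}


def downcode(value):
    # Simplification (assumption): per-character lookup instead of the regex.
    # The key point is that the *mapped* string is returned, not the input.
    return ''.join(TURKISH_MAP.get(ch, ch) for ch in value)


def slugify2(value):
    value = downcode(value)
    # Drop any remaining non-ASCII characters after decomposition.
    value = unicodedata.normalize('NFD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)


print(slugify2("Işık ılık süt iç"))  # prints "isik-ilik-sut-ic"
```

 With the mapped string returned, the example input yields the expected
 slug. Whether this covers every case in #16853 is not established here.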

-- 
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:34>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

