On 17 Jul 2006, at 8:25, tsuyuki makoto wrote:
> We Japanese know that we can't transliterate Japanese into ASCII.
> So I want to at least do it as follows:
> no letter disappears, and the original can be restored.
> # FileField and ImageField have the same disappearing-letters problem.
>
> def slug_ja(word) :
>     try :
>         unicode(word, 'ASCII')
>         import re
>         slug = re.sub('[^\w\s-]', '', word).strip().lower()
>         slug = re.sub('[-\s]+', '-', slug)
>         return slug
>     except UnicodeDecodeError :
>         from encodings import idna
>         painful_slug = word.strip().lower().decode('utf-8').encode('IDNA')
>         return painful_slug

I’m not convinced by this approach, but if you go this way I would
suggest using the “punycode” encoder instead of “idna”. Its output
omits the leading “xn--” prefix, which is only meaningful in a domain
name, not in a URI path. Also, the “from encodings […]” line appears
to be unnecessary, at least on my Python 2.3.5 and 2.4.1 on OS X.

[[[
 >>> p = u"perché"
 >>> from encodings import idna
 >>> p.encode('idna')
'xn--perch-fsa'
 >>> p.encode('punycode')
'perch-fsa'
 >>> puny = 'perch-fsa'
 >>> puny.decode('punycode')
u'perch\xe9'
 >>> print puny.decode('punycode')
perché
 >>> pu = puny.decode('punycode') # it's reversible
 >>> print pu
perché
]]]

More on Punycode: http://en.wikipedia.org/wiki/Punycode
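
As a rough illustration of the suggestion above, the quoted function could be adapted along these lines. This is only a sketch under a few assumptions: it is written for Python 3 (the session above is Python 2), the names slug_puny and unslug_puny are made up for this example, and the input is assumed to be already-decoded text rather than a UTF-8 byte string.

```python
import re


def slug_puny(word):
    """Return a slug; non-ASCII input is Punycode-encoded, not stripped.

    Hypothetical helper: mirrors the quoted slug_ja(), but uses the
    'punycode' codec so the result carries no 'xn--' prefix.
    """
    text = word.strip().lower()
    try:
        # Pure-ASCII input takes the ordinary slug path.
        text.encode('ascii')
    except UnicodeEncodeError:
        # The codec returns ASCII bytes; turn them back into text.
        return text.encode('punycode').decode('ascii')
    slug = re.sub(r'[^\w\s-]', '', text).strip()
    return re.sub(r'[-\s]+', '-', slug)


def unslug_puny(slug):
    """Reverse the Punycode branch of slug_puny() (hypothetical helper)."""
    return slug.encode('ascii').decode('punycode')


print(slug_puny("perch\u00e9"))                        # perch-fsa
print(unslug_puny("perch-fsa") == "perch\u00e9")       # True
```

Note that without the “xn--” prefix nothing in the slug itself says which branch produced it, so decoding is only meaningful for slugs known to have gone through the Punycode branch. As in the quoted function, whitespace inside non-ASCII input is passed through unchanged.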

Cheers.
-- 
Antonio



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---
