Okay, I think that fixes one fundamental issue. However, I've got a
unit test (a doctest) that still fails for this function:

def normalize(tag):
    """
    >>> normalize(u'cafe')
    u'cafe'
    >>> normalize(u'caf e')
    u'cafe'
    >>> normalize(u' cafe ')
    u'cafe'
    >>> normalize(u' café ')
    u'cafe'
    >>> normalize(u'cAFe')
    u'cafe'
    >>> normalize(u'%sss%s')
    u'ssss'
    """
    try:
        tag = remove_diacritics(tag)
    except Exception:
        # Fall back to the original tag if the diacritics can't be stripped.
        pass

    tag = reTagnormalizer.sub('', tag).lower()
    return tag

It fails on ' café ': the doctest gets cafa back instead of cafe.
This only happens through the unit test framework (doctest); when I
run it from the Django shell it works as intended.

Is this just an issue with doctest?
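
One way to take the encoding question out of the test is to keep the docstring pure ASCII by writing the accented character as an escape. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: it assumes the doctest source is being read with the wrong encoding, it is written in Python 3 syntax (so the expected output has no u prefix), and _NON_ALNUM is a hypothetical stand-in for reTagnormalizer, whose definition is not shown in this thread.

```python
import doctest
import re
import unicodedata

# Same combining-mark ranges as reCombining in the quoted test.py.
_COMBINING = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')
# Hypothetical stand-in for reTagnormalizer: drop everything but ASCII letters/digits.
_NON_ALNUM = re.compile(r'[^a-zA-Z0-9]')


def normalize(tag):
    r"""
    The raw docstring keeps \xe9 as plain ASCII text, so the example
    does not depend on how the source file is decoded.

    >>> normalize(' caf\xe9 ')
    'cafe'
    >>> normalize('cAFe')
    'cafe'
    """
    tag = _COMBINING.sub('', unicodedata.normalize('NFD', tag))
    return _NON_ALNUM.sub('', tag).lower()


def run_examples():
    # Parse and run only this function's docstring examples.
    test = doctest.DocTestParser().get_doctest(
        normalize.__doc__, {'normalize': normalize}, 'normalize', None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


results = run_examples()
```

If the escaped version passes where the literal é fails, that points at how the test source is decoded rather than at normalize itself.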



On Dec 6, 4:30 pm, "Karen Tracey" <[EMAIL PROTECTED]> wrote:
> On Sat, Dec 6, 2008 at 7:23 PM, Dave Dash <[EMAIL PROTECTED]> wrote:
>
> > I'm experiencing some strange behavior, and I think it has to do with
> > how django deals with utf strings:
>
> > When I write a test.py file:
>
> > import re, unicodedata
>
> > reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)
>
> > def remove_diacritics(s):
> >    return reCombining.sub('', unicodedata.normalize('NFD', unicode(s)))
>
> > and then open the python shell, I get:
>
> > Python 2.5.2 (r252:60911, Aug 10 2008, 00:43:40)
> > [GCC 4.0.1 (Apple Inc. build 5484)] on darwin
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> from test import *
> > >>> remove_diacritics(u'café')
> > u'cafe'
>
> > as intended.
>
> > When I do the same thing with the django shell:
> > $ python manage.py shell
> > Python 2.5.2 (r252:60911, Aug 10 2008, 00:43:40)
> > [GCC 4.0.1 (Apple Inc. build 5484)] on darwin
> > Type "help", "copyright", "credits" or "license" for more information.
> > (InteractiveConsole)
> > >>> from test import *
> > >>> remove_diacritics(u'café')
> > u'cafA\xa9'
>
> > Which isn't quite what I expected.
>
> > My questions are:
>
> > 1. How do I properly remove accents from strings in Django
> > 2. What is django (this is using trunk) doing to strings differently
> > than python?
>
> > Even typing u'é' in the shell returns different things.
>
> That's a Python bug: http://bugs.python.org/issue1288615 (manage.py shell
> uses Python's code.interact())
>
> I believe it's fixed in 2.6.
>
> Karen
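
The u'cafA\xa9' result quoted above is what you would expect if the UTF-8 bytes of é reached the function as two separate characters. The following is a minimal sketch of that mechanism in Python 3 syntax; the Latin-1 mis-decoding step is an assumption used to simulate the symptom, not something taken from the bug report.

```python
import re
import unicodedata

# Same combining-mark ranges as reCombining in the quoted test.py.
RE_COMBINING = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')


def remove_diacritics(s):
    # Decompose accented characters, then drop the combining marks.
    return RE_COMBINING.sub('', unicodedata.normalize('NFD', s))


correct = 'caf\u00e9'  # 'café' as a single code point for é

# Assumed failure mode: the two UTF-8 bytes of é (0xC3 0xA9) are each
# read as one Latin-1 character, giving 'Ã' followed by '©'.
misdecoded = correct.encode('utf-8').decode('latin-1')

good = remove_diacritics(correct)     # é -> e
bad = remove_diacritics(misdecoded)   # Ã -> A (tilde stripped), © left alone
```

Here good is 'cafe' and bad is 'cafA\xa9', matching the two shell sessions. A later step that strips non-alphanumerics and lowercases would turn the second into 'cafa'.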
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
