Sorry Karen, my mistake for leaving that out. reTagnormalizer just
filters out everything that isn't alphanumeric; the full code is below.
Also, here's the error from manage.py test:
File "/restaurant/models.py", line 33, in mealadvisor.restaurant.models.normalize
Failed example:
    normalize(u' café ')
Expected:
    u'cafe'
Got:
    u'cafa'
***
import unicodedata, re

reTagnormalizer = re.compile(r'[^a-zA-Z0-9]')
reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)

def remove_diacritics(s):
    " Decomposes string, then removes combining characters "
    return reCombining.sub('', unicodedata.normalize('NFD', unicode(s)))

# tag normalizer
def normalize(tag):
    """
    >>> normalize(u'cafe')
    u'cafe'
    >>> normalize(u'caf e')
    u'cafe'
    >>> normalize(u' cafe ')
    u'cafe'

    For now this is wrong; I think it's an error with doctest, not the
    actual function.
    >>> normalize(u' café ')
    u'cafe'
    >>> normalize(u'cAFe')
    u'cafe'
    >>> normalize(u'%sss%s')
    u'ssss'
    """
    try:
        tag = remove_diacritics(tag)
    except:
        pass
    tag = reTagnormalizer.sub('', tag).lower()
    return tag
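
For what it's worth, here is a minimal sketch of the same decompose-then-strip idea, written for Python 3 rather than the Python 2 code above. The names NON_ALNUM, COMBINING and mangled are made up for illustration. The last part is only a guess at where 'cafa' could come from: it assumes the UTF-8 bytes for 'é' get read back as Latin-1 somewhere on the way into the test, which I haven't confirmed.

```python
import re
import unicodedata

# Same character classes as the two regexes in the post (names are illustrative).
NON_ALNUM = re.compile(r'[^a-zA-Z0-9]')
COMBINING = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')


def remove_diacritics(s):
    """Decompose to NFD, then drop the combining marks."""
    return COMBINING.sub('', unicodedata.normalize('NFD', s))


def normalize(tag):
    """Strip diacritics, drop non-alphanumerics, lowercase."""
    return NON_ALNUM.sub('', remove_diacritics(tag)).lower()


# Assumption for illustration: the UTF-8 bytes of U+00E9 decoded as Latin-1,
# which turns the single character into U+00C3 followed by U+00A9.
mangled = ' caf\u00e9 '.encode('utf-8').decode('latin-1')

print(normalize(' caf\u00e9 '))  # cafe
print(normalize(mangled))        # cafa
```

In the mangled case, U+00C3 decomposes to 'A' plus a combining tilde, the tilde is stripped, and U+00A9 is dropped as non-alphanumeric, leaving 'cafa'. If that is what's happening, it would point at how the docstring bytes are decoded rather than at normalize itself.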
On Dec 6, 9:42 pm, "Karen Tracey" <[EMAIL PROTECTED]> wrote:
> On Sat, Dec 6, 2008 at 9:00 PM, Dave Dash <[EMAIL PROTECTED]> wrote:
>
> > Okay, I think that fixes one fundamental issue... I've got a unittest,
> > however, that fails for a function:
>
> > def normalize(tag):
> >     """
> >     >>> normalize(u'cafe')
> >     u'cafe'
> >     >>> normalize(u'caf e')
> >     u'cafe'
> >     >>> normalize(u' cafe ')
> >     u'cafe'
> >     >>> normalize(u' café ')
> >     u'cafe'
> >     >>> normalize(u'cAFe')
> >     u'cafe'
> >     >>> normalize(u'%sss%s')
> >     u'ssss'
> >     """
> >     try:
> >         tag = remove_diacritics(tag)
> >     except:
> >         pass
>
> >     tag = reTagnormalizer.sub('', tag).lower()
> >     return tag
>
> > It fails on the ' café ' example and translates it to cafa instead of cafe.
> > This happens only through the unittest framework (doctest), since I can run
> > it from the Django shell and it works as intended.
>
> > Is this just an issue with doctest?
>
> If I cut and paste your code and take out reTagnormalizer (since you didn't
> post that) and all the tests that seem to depend on what it does vs.
> remove_diacritics, and just test:
>
> """
> >>> normalize(u'café')
> u'cafe'
> """
> with plain Python doctesting, it works fine, as does 'manage.py test someapp' (if
> I put the code in someapp's models.py file).
>
> So I can't recreate the error you are reporting based on what you have
> posted. What's in reTagnormalizer?
>
> Karen
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Django users" group.
For more options, visit this group at
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---