hi,
would it be a good idea to add support to django to unicode-normalize
incoming get/post-data?
the normalization problem is basically this:
for example my name, 'gábor' can be written in 2 different ways in
unicode: u'g\xe1bor' and u'ga\u0301bor'.
the first one uses the 'LATIN SMALL LETTER A WITH ACUTE' character, and
the second one uses two characters to describe the same info: 'LATIN
SMALL LETTER A' + 'COMBINING ACUTE ACCENT'.
both strings are canonically equivalent from a unicode perspective, and
usually also from an end-user's perspective, but they do not compare
equal as plain strings.
as you can imagine, this can cause problems when, in a web-app, the user
searches using the first form but the data is stored in the second
form.
the issue can be solved by normalizing the text, which means that you
convert all your strings into the same form. there are several
normalization forms; two interesting ones are "NFC" (canonical
composition, the composed and usually shorter representation) and "NFD"
(canonical decomposition, the fully decomposed representation).
(in my name-examples the first one is in NFC, and the second one is in NFD.)
it's easy to normalize in python:
import unicodedata
norm1 = unicodedata.normalize('NFC', text)
norm2 = unicodedata.normalize('NFD', text)
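to make the problem concrete, here is a small self-contained sketch using the two spellings of my name from above. the variable names are just for illustration, and i am assuming python 3 here (where str is a unicode string):

```python
import unicodedata

composed = u'g\xe1bor'       # uses LATIN SMALL LETTER A WITH ACUTE
decomposed = u'ga\u0301bor'  # uses 'a' + COMBINING ACUTE ACCENT

# the raw strings differ, even though they look the same to a user
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 5 6

# after normalizing both to the same form, they compare equal
print(unicodedata.normalize('NFC', decomposed) == composed)    # True
print(unicodedata.normalize('NFD', composed) == decomposed)    # True
```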
so i wanted to implement this with django in a way where i do not have
to normalize the get/post-data manually in every view. unfortunately,
the only way i found involves patching django.
it could be implemented like this:
1. a new setting in settings.py called something like
'NORMALIZE_REQUEST_DATA', defaults to None, and can be set to 'NFC' or
'NFD' or to the other available normalization-forms.
2. do the normalization when the request-data is converted from
binary-strings to unicode-strings. this can be achieved either by adding
a new optional parameter to http.QueryDict, or by creating a
helper-function that converts a QueryDict into a unicode-normalized
QueryDict, and then modifying the mod-python/wsgi handlers to call that code.
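a rough sketch of the helper-function idea, under some assumptions: it works on a plain {key: [values]} dict as a stand-in for a QueryDict (so it runs without django), the function name is made up, the `form` argument plays the role of the proposed NORMALIZE_REQUEST_DATA setting, and only the values are normalized, not the keys. this is not existing django code:

```python
import unicodedata

def normalize_request_data(data, form=None):
    """Return a copy of a {key: [values]} mapping with normalized values.

    `form` stands in for the proposed NORMALIZE_REQUEST_DATA setting:
    None means "do not normalize", otherwise it is 'NFC', 'NFD', etc.
    """
    if form is None:
        # setting is off: copy the data unchanged
        return {key: list(values) for key, values in data.items()}
    return {
        key: [unicodedata.normalize(form, value) for value in values]
        for key, values in data.items()
    }

# hypothetical post-data with the same name sent in both forms
post = {u'name': [u'ga\u0301bor', u'g\xe1bor']}

print(normalize_request_data(post, 'NFC') == {u'name': [u'g\xe1bor', u'g\xe1bor']})  # True
print(normalize_request_data(post) == post)  # True
```

with a real QueryDict the same loop would have to go over all the values of each key, because a key can appear more than once in the request.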
it's pretty simple to implement, but, before i submit an
enhancement-ticket... is there a chance for such a change to be accepted
into django?
p.s: if there is a way to achieve this without touching
django-internals, please tell me :)
p.s.2: the other interesting question of course is normalizing all
writes to the db in the django-orm, but that can be implemented
relatively simply in userland-code (signals and/or save() methods), and
maybe even more simply once model-inheritance arrives.
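for the save() variant, something like the following sketch is what i have in mind. the helper name and the Person class are made up, and Person is a plain python class standing in for a model so the example runs without django (again assuming python 3):

```python
import unicodedata

def normalize_fields(obj, field_names, form='NFC'):
    """Normalize the named text attributes of obj in place."""
    for name in field_names:
        value = getattr(obj, name)
        if isinstance(value, str):
            setattr(obj, name, unicodedata.normalize(form, value))

class Person(object):
    # plain stand-in for a model class
    def __init__(self, name):
        self.name = name

    def save(self):
        # normalize before writing; a real model would then go on
        # to do its usual db write
        normalize_fields(self, ['name'])

person = Person(u'ga\u0301bor')
person.save()
print(person.name == u'g\xe1bor')  # True
```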
thanks,
gabor
You received this message because you are subscribed to the Google Groups
"Django developers" group.