hi,

would it be a good idea to add support to django to unicode-normalize 
incoming get/post-data?

the normalization problem is basically this:

for example my name, 'gábor' can be written in 2 different ways in 
unicode: u'g\xe1bor' and u'ga\u0301bor'.
the first one uses the 'LATIN SMALL LETTER A WITH ACUTE' character, and 
the second one uses two characters to describe the same info: 'LATIN 
SMALL LETTER A' + 'COMBINING ACUTE ACCENT'.

both strings are more or less equal from a unicode perspective (they 
are canonically equivalent), and usually also from an end-user's 
perspective.

as you can imagine, this can cause problems when, in a web-app, the user 
searches using the first format, but the data is stored in the second 
format.

the issue can be solved by normalizing the text, which means that you 
convert all your strings into the same form. there are several 
normalization forms; two interesting ones are "NFC" (composed, usually 
the most compact representation) and "NFD" (fully decomposed).
(in my name-examples the first one is in NFC, and the second one is in NFD)

it's easy to normalize in python:

import unicodedata
norm1 = unicodedata.normalize('NFC', text)
norm2 = unicodedata.normalize('NFD', text)
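
to make the problem concrete, here is a small sketch using the two spellings of my name from above. it is written for python 3 (where every str is unicode); the variable names are just for illustration:

```python
import unicodedata

composed = 'g\xe1bor'        # uses LATIN SMALL LETTER A WITH ACUTE
decomposed = 'ga\u0301bor'   # uses LATIN SMALL LETTER A + COMBINING ACUTE ACCENT

# the raw strings differ, even though they look the same on screen
raw_equal = (composed == decomposed)

# the two spellings even have different lengths
lengths = (len(composed), len(decomposed))

# once both are converted to the same form, they compare equal
nfc_equal = (unicodedata.normalize('NFC', composed)
             == unicodedata.normalize('NFC', decomposed))
nfd_equal = (unicodedata.normalize('NFD', composed)
             == unicodedata.normalize('NFD', decomposed))
```

so a naive equality check or db lookup with one spelling will miss data stored in the other, unless both sides were normalized to the same form first.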

so i wanted to implement this with django in a way where i do not have 
to normalize the get/post-data manually in every view. unfortunately, 
the only way i found involves patching django.

it could be implemented like this:

1. a new setting in settings.py called something like 
'NORMALIZE_REQUEST_DATA', defaults to None, and can be set to 'NFC' or 
'NFD' or to the other available normalization-forms.

2. do the normalization when the request-data is converted from 
binary-strings to unicode-strings. this can be achieved either by adding 
a new optional parameter to http.QueryDict, or by creating a 
helper-function that converts a QueryDict into a unicode-normalized 
QueryDict, and then modifying the mod-python/wsgi handlers to call that 
code.
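
a rough sketch of what the helper-function from step 2 could look like. to keep it runnable without django, it works on a plain dict that maps each key to a list of values (roughly the shape of the data in a QueryDict) instead of a real QueryDict. the function name normalize_request_data and the form parameter are my assumptions, not existing django api:

```python
import unicodedata

def normalize_request_data(data, form='NFC'):
    """return a normalized copy of `data` (a dict of key -> list of strings).

    `form` plays the role of the proposed NORMALIZE_REQUEST_DATA setting:
    None means "leave the data alone", otherwise it is a normalization
    form name such as 'NFC' or 'NFD'.
    """
    result = {}
    for key, values in data.items():
        if form is not None:
            key = unicodedata.normalize(form, key)
            values = [unicodedata.normalize(form, value) for value in values]
        # two keys that differ only in normalization end up merged
        result.setdefault(key, []).extend(values)
    return result

# hypothetical post-data where the browser sent the decomposed spelling
post_data = {'name': ['ga\u0301bor'], 'city': ['budapest']}
normalized = normalize_request_data(post_data, form='NFC')
```

in a patched django, the handlers would call something like this once per request, so the views only ever see normalized strings.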

it's pretty simple to implement, but, before i submit an 
enhancement-ticket... is there a chance for such a change to be accepted 
into django?

p.s: if there is a way to achieve this without touching 
django-internals, please tell me :)

p.s.2: the other interesting question of course is normalizing all 
writes to the db in the django-orm, but that can be implemented 
relatively simply in userland-code (signals and/or save() methods), and 
maybe even simpler when model-inheritance arrives.
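
a minimal sketch of the save() variant, just to show the idea. Person here is a plain python class standing in for a model (not a real django model), and normalize_text_attrs is a hypothetical helper:

```python
import unicodedata

def normalize_text_attrs(obj, attr_names, form='NFC'):
    """normalize the named string attributes of `obj` in place."""
    for name in attr_names:
        value = getattr(obj, name)
        # only touch text values; numbers, None etc. are left alone
        if isinstance(value, str):
            setattr(obj, name, unicodedata.normalize(form, value))

class Person:
    """plain-python stand-in for a model class."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def save(self):
        normalize_text_attrs(self, ['name', 'age'])
        # a real model would call the parent class's save() here

person = Person('ga\u0301bor', 30)
person.save()
```

after save(), person.name holds the NFC spelling, whatever form it arrived in.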

thanks,
gabor
