On 4/14/06, Jeroen Ruigrok van der Werven <[EMAIL PROTECTED]> wrote:
> I would sooner blame your setup or software for not properly
> supporting such links.
The thing is, this *is* "proper support". The URL is still accessed
correctly, and the page is displayed correctly. But the URL string itself
is displayed encoded in some browsers, and that is arguably a useful
thing -- as I'd hope we're all aware, there have been serious security
issues with displaying unencoded non-ASCII URLs in the past.

> With all due respect, the world is much larger than English. Who are
> we to dictate their slugs are to be encoded in ASCII only?

There are really two different issues going on here, and you're arguing
about a different one than I am. While I personally think there are
usability problems with UTF-8 URLs, if people want them then Django
should support them. I'm not trying to argue that everyone should be
forced to use ASCII.

However, the other issue (which was raised in the original post to this
thread) is that the JavaScript "URLify" function, which automatically
generates slugs based on the 'prepopulate_from' attribute, doesn't handle
UTF-8. And I think it ought to stay that way, for one simple reason: it'd
be impossible to make it truly support UTF-8.

To see why, remember that URLify doesn't just lowercase all the words,
kill the punctuation and replace spaces with hyphens. It also
strategically drops common English words that have no place in a slug:
"the", "an", "this", etc. Admittedly we're already in trouble here,
because we only do that for English; we don't drop "le", "un", "cela",
etc. in French, for example. Opening up to anything in UTF-8 would only
make that worse. Should a slug allow Japanese particles? Which ones?
Should pronouns be dropped from Greek slugs, since they can be deduced
from the case, gender and number of the verbs? Who would decide this?
We'd need a whole new i18n system just to make URLify behave properly for
the languages we support (and the existing gettext-like jsi18n machinery
wouldn't work, because the excluded words aren't translations of one
another -- rather than a single set of words with a translation per
language, we'd need a separate set of words for every single language).

And what about scripts with no concept of "lowercase"? As far as I know,
JavaScript's regular-expression system can't match a generic "Unicode
uppercase character" (I'm not sure about Python; I know you can do it in
Perl, though), so that's yet another language-specific switch to
implement.

So I don't see any advantages to making URLify support UTF-8, and only
headaches if we actually try. If you're writing in a non-ASCII script,
prepopulate_from probably isn't going to help you, and you'll have to
fill in slug fields yourself. But given the disadvantages of trying to
auto-populate UTF-8 slugs, I think that's the better trade-off.

--
"May the forces of evil become confused on the way to your house."
  -- George Carlin
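P.S. For anyone who hasn't looked at it in a while, here's roughly what
URLify does -- a sketch from memory rather than the exact code shipped in
the admin media, so details (especially the word list, which is abbreviated
here) may differ, but it shows where the English-only assumptions live:

    function URLify(s, num_chars) {
        // Hard-coded English stop words -- this list is exactly the
        // per-language problem described above.
        var removelist = ["a", "an", "as", "at", "the", "this",
                          "that", "of", "on", "in", "to", "with"];
        var r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
        s = s.replace(r, '');                   // drop stop words
        s = s.replace(/[^-A-Za-z0-9\s]/g, '');  // kill punctuation -- and,
                                                // silently, every non-ASCII letter
        s = s.replace(/^\s+|\s+$/g, '');        // trim leading/trailing space
        s = s.replace(/[-\s]+/g, '-');          // collapse runs into one hyphen
        s = s.toLowerCase();                    // meaningless for uncased scripts
        return s.substring(0, num_chars);
    }

Note that the character class [^-A-Za-z0-9\s] is where non-ASCII input
disappears, and -- per the regex point above -- JavaScript gives us no
\p{Lu}-style escape to widen it sensibly.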