On 4/14/06, Jeroen Ruigrok van der Werven <[EMAIL PROTECTED]> wrote:
> I would sooner blame your setup or software for not properly
> supporting such links.

The thing is, this *is* "proper support". The URL is still accessed
correctly, and the page is displayed correctly. It's only the URL
*string* that some browsers display in encoded form, and that's
arguably a useful thing -- as I'd hope we're all aware, there have been
serious security issues in the past with displaying unencoded
non-ASCII URLs (homograph spoofing of internationalized domain names,
for example).
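For the curious, the encoding in question is just ordinary
percent-encoding of the URL's UTF-8 bytes. A minimal illustration
(the example URL is made up):

```javascript
// Percent-encoding a non-ASCII path: each UTF-8 byte becomes %XX.
// encodeURI leaves URL structure (scheme, slashes, etc.) untouched
// and only escapes the characters that need it.
var encoded = encodeURI("http://example.com/caf\u00e9/");
// "é" (U+00E9) is the UTF-8 byte pair 0xC3 0xA9, hence %C3%A9:
// "http://example.com/caf%C3%A9/"
```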

> With all due respect, the world is much larger than English. Who are
> we to dictate their slugs are to be encoded in ASCII only?

There are really two different issues going on here, and you're
arguing about the other one. While I personally think there are
usability problems with UTF-8 URLs, if people want them then Django
should support them. I'm not trying to argue that everyone should be
forced to use ASCII.

However, the other issue (which was mentioned in the original post to
this thread) is that the JavaScript "URLify" function which
automatically generates slugs based on the 'prepopulate_from'
attribute doesn't handle UTF-8. And I think it ought to stay that way,
for one simple reason: it'd be impossible to make it truly support
UTF-8.

To see why, remember that URLify doesn't just lowercase all the words,
kill the punctuation and replace spaces with hyphens. It also
strategically drops out common English words that don't have any place
in the slug: "the", "an", "this", etc. Admittedly we're already in
trouble because we only do that with English; we don't drop "le",
"un", "cela", etc. in French, for example. Opening up to anything in
UTF-8 would only open a much bigger can of worms. Should a slug allow
Japanese "particles"? Which ones? Should pronouns be dropped from
Greek slugs, since the person and number marked on the verb usually
make them redundant? Who would decide all this?
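The ASCII-only behavior described above is easy to sketch. Roughly
(the function name, stopword list and regexes here are illustrative,
not the actual admin-media code):

```javascript
// Rough sketch of URLify's ASCII-only behavior: drop common English
// words, kill punctuation, collapse whitespace into hyphens, lowercase.
function sketchURLify(s, maxLength) {
    var stopwords = ["a", "an", "as", "at", "of", "the", "this", "with"];
    var re = new RegExp("\\b(" + stopwords.join("|") + ")\\b", "gi");
    s = s.replace(re, "");                  // drop common English words
    s = s.replace(/[^-A-Za-z0-9\s]/g, "");  // kill punctuation -- and,
                                            // note, everything non-ASCII
    s = s.replace(/^\s+|\s+$/g, "");        // trim
    s = s.replace(/[-\s]+/g, "-");          // whitespace runs -> hyphens
    return s.toLowerCase().substring(0, maxLength);
}

// sketchURLify("The Best of an Era!", 50) -> "best-era"
// sketchURLify("Das Café", 50)           -> "das-caf" (é silently lost)
```

The second example shows the current behavior for non-ASCII input:
the character is simply dropped, which is exactly what's at issue.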

We'd need a whole new i18n system just to make the URLify function
behave properly for the languages we support (and the existing
gettext-like jsi18n stuff wouldn't work, because the set of excluded
words would not be the same for each language -- rather than a single
set of words with translations for each language, we'd need a separate
set of words for every single language).
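To make the data-shape point concrete: gettext translates one
canonical message set, but translating "the" into French gives you
"le/la/les", not the full inventory of French function words. What
URLify would actually need is an independent, hand-curated list per
language -- something like this (the word lists below are illustrative
guesses, not vetted stopword lists):

```javascript
// One independent exclusion list per language -- NOT one list with
// translations, which is all a gettext-style catalog can express.
var EXCLUDED_WORDS = {
    "en": ["a", "an", "the", "this", "of"],
    "fr": ["le", "la", "les", "un", "une", "de", "cela"],
    "de": ["der", "die", "das", "ein", "eine"]
    // ...and a separately curated list for every supported language
};

function excludedFor(lang) {
    return EXCLUDED_WORDS[lang] || []; // unknown language: exclude nothing
}
```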

And what about scripts that have no concept of "lowercase"? So far as
I know, JavaScript's regular-expression engine can't match a generic
"Unicode uppercase character" (neither can Python's re module; Perl
can, with \p{Lu}), so that's another language-specific switch to
implement.
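For what it's worth, there is a regex-free workaround in JavaScript:
compare a character against its own case mappings. This is only a
sketch and doesn't settle the larger question:

```javascript
// A character is "uppercase" if lowercasing would change it but
// uppercasing wouldn't. Characters from caseless scripts (CJK,
// digits, punctuation) map to themselves both ways, so they
// correctly come out false.
function isUpperCase(ch) {
    return ch === ch.toUpperCase() && ch !== ch.toLowerCase();
}

// isUpperCase("\u00c4") -> true   (Ä)
// isUpperCase("\u0436") -> false  (Cyrillic lowercase ж)
// isUpperCase("\u5b57")  -> false  (CJK, caseless)
```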

So I don't see any advantage to making the URLify function support
UTF-8, and see only headaches if we actually try. If you're using a
non-ASCII script, prepopulate_from probably isn't going to help you
and you'll have to fill in slug fields yourself. But given the
disadvantages of trying to auto-populate UTF-8 slugs, I think that's
the better trade-off.

--
"May the forces of evil become confused on the way to your house."
  -- George Carlin

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---
