On Tue, 2006-05-16 at 09:00 +0200, Gábor Farkas wrote:
> Jeroen Ruigrok van der Werven wrote:
> > On 5/16/06, Ville Säävuori <[EMAIL PROTECTED]> wrote:
> >> I think that this problem applies in most European languages, too.
> >> Like, say, Swedish, German and French.
> > 
> > The same applies to Dutch, where we use tremas (sort of umlauts) to
> > denote any possible ambiguity in reading. So having the accent
> > stripped would be way better than having the entire letter stripped.
> > The same applies of course to, say, Spanish with the tilde-n, or even
> > some Slavic languages or Romanian.
> > 
> 
> also in Hungarian and Slovak the preferred way is to just strip the accents.
> 
> maybe the best way would be to make this locale-dependent...

At the risk of offending everybody who uses a language requiring
accents, this is one of those "it's harder than it looks" problems in
Unicode. You need a mapping from every accented character (or a
reasonable set of them) to its unadorned equivalent. Many accented
characters are a single Unicode character, not a composition of two
characters, so it's not just a matter of "stripping the accent". So
either we end up carrying around a fairly large mapping table in the
JavaScript or we need a better solution.
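
To make the size of the problem concrete, here is a rough sketch in
Python (not the JavaScript in question, and not a proposed patch). It
assumes the standard unicodedata module; strip_accents and FALLBACK are
names made up for illustration. Decomposition handles letters that
Unicode defines as base letter plus combining mark, but some letters
have no decomposition at all and still need a hand-written table.

```python
import unicodedata

# Hypothetical fallback table for letters with no canonical decomposition.
# The entries are illustrative only and far from exhaustive.
FALLBACK = {
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "æ": "ae", "Æ": "AE",
    "ß": "ss",
}


def strip_accents(text):
    """Replace accented letters with unadorned ones where a rule is known."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            # No decomposition exists, so only the table can help.
            out.append(FALLBACK[ch])
            continue
        # NFD splits e.g. "ä" into "a" followed by a combining diaeresis.
        decomposed = unicodedata.normalize("NFD", ch)
        # Drop the combining marks and keep the base character(s).
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


print(strip_accents("Säävuori"))  # Saavuori
print(strip_accents("Łódź"))      # Lodz
```

Even with decomposition doing most of the work, the fallback table is
the part that grows without bound, and whether "ß" should become "ss"
or "ö" should become "oe" is exactly the locale-dependent question
raised above.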

To put the problem into context: it's only a small generalisation to
attempt to do the same thing for mapping Japanese characters to
ASCII-based URLs. The accent version just sucks you in by making you
think the characters are close because they look the same. Your computer
has a different "vision".

Malcolm


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---
