On Tue, 2006-05-16 at 09:00 +0200, Gábor Farkas wrote:
> Jeroen Ruigrok van der Werven wrote:
> > On 5/16/06, Ville Säävuori <[EMAIL PROTECTED]> wrote:
> >> I think that this problem applies in most european languages, too.
> >> Like, say, Swedish, German and French.
> >
> > The same applies for Dutch where we use trema's (sort of umlauts) to
> > denote any possible ambiguity in reading. So having the accent
> > stripped would be way better than having the entire letter stripped.
> > The same applies of course to say Spanish with the tilde-n, or even
> > some slavic languages or Romanian.
> >
>
> also in Hungarian and Slovak the preferred way is to just strip the accents.
>
> maybe the best way would be to make this locale-dependent...

At the risk of offending everybody who uses a language requiring accents, this is one of those "it's harder than it looks" problems in Unicode. You need a mapping from every accented character (or a reasonable set of them) to its unadorned equivalent. Many accented characters are a single Unicode character, not a composition of two characters, so it's not just a matter of "stripping the accent". So either we end up carrying around a fairly large mapping table in the JavaScript or we need a better solution.

To put the problem into context: it's only a small generalisation to attempt the same thing for mapping Japanese characters to ASCII-based URLs. The accent version just sucks you in by making you think the characters are close because they look the same. Your computer has a different "vision".

Malcolm
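
To make the point about single characters versus compositions concrete, here is a rough sketch, not a proposal for the actual implementation. It assumes a JavaScript engine that provides the built-in String.prototype.normalize (ES2015 and later). The names stripAccents and FALLBACK are hypothetical, and the fallback table is deliberately tiny and incomplete.

```javascript
// Hypothetical fallback table for letters that have no canonical
// decomposition, so normalisation alone cannot "strip" anything from them.
const FALLBACK = new Map([
  ["ø", "o"], ["Ø", "O"],
  ["ł", "l"], ["Ł", "L"],
  ["đ", "d"], ["Đ", "D"],
  ["æ", "ae"], ["Æ", "AE"],
  ["ß", "ss"],
]);

// The Combining Diacritical Marks block, U+0300 to U+036F.
const COMBINING_MARKS = /[\u0300-\u036f]/g;

function stripAccents(text) {
  // NFD splits a precomposed character such as "é" (U+00E9) into
  // "e" (U+0065) followed by a combining acute accent (U+0301),
  // after which the combining mark can be removed.
  const decomposed = text.normalize("NFD").replace(COMBINING_MARKS, "");

  // Whatever is left is looked up in the table; characters that are
  // not in the table pass through unchanged.
  return Array.from(decomposed, (ch) =>
    FALLBACK.has(ch) ? FALLBACK.get(ch) : ch
  ).join("");
}

console.log(stripAccents("Säävuori")); // Saavuori
console.log(stripAccents("Ærø"));      // AEro
```

Decomposition handles the characters that really are "letter plus accent", but letters like ø, ł and ß still need explicit table entries, and something like stripAccents("日本") comes back unchanged. That is the same mapping-table problem again, only larger.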
