A user on one of the sites that I run has managed to create two user accounts for themselves:

    Yoshi
    %EF%BC%B9%EF%BD%8F%EF%BD%93%EF%BD%88%EF%BD%89   (UTF-8, URL-encoded)

When rendered in a web browser they both appear as "Yoshi", but from the point of view of my code and the database they are, of course, different.

I allow unrestricted usernames, rather than limiting them to printable ASCII characters, because this makes sense on a Japanese site. The problem is that this user cannot log in to the non-ASCII account. Or at least, they could if I explained at length what has happened and they understood my explanation, but nobody should have to do that just to use a web site.

Is there a way to solve this? For example, is it feasible to work out whether a general UTF-8 string has a lossless representation in ASCII, and to do this conversion? [Note: in the second string above, it looks as if the Japanese part of Unicode contains a second mapping of the Roman character set, so presumably this is not a straightforward conversion.]

Alternatively (and I don't really want to do this), is it possible to have an HTML form which accepts UTF-8 in most fields, but with one field limited to ASCII only?

Is it a good idea to allow unrestricted usernames in any case?

Rich.

--
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
MONOLITH is an advanced framework for writing web applications in C, easier
than using Perl & Java, much faster and smaller, reusable widget-based arch,
database-backed, discussion, chat, calendaring:
http://www.annexia.org/freeware/monolith/
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
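
As an illustration of the conversion asked about above, here is a minimal sketch in Python. It is not from the original post, and it assumes that Unicode compatibility normalization (NFKC) is an acceptable way to fold the second username onto the first. The bytes in that username decode to characters from the Unicode "Halfwidth and Fullwidth Forms" block, which NFKC maps to their plain Latin equivalents. The function names `canonical_username` and `is_ascii_after_folding` are hypothetical. Note that NFKC is not lossless: it deliberately discards the fullwidth/plain distinction, so it suits comparison and uniqueness checks more than display.

```python
import unicodedata
from urllib.parse import unquote


def canonical_username(name: str) -> str:
    """Fold compatibility variants (e.g. fullwidth Latin letters) using NFKC.

    Hypothetical helper: the result could be stored alongside the name the
    user typed and used for uniqueness checks and login lookups.
    """
    return unicodedata.normalize("NFKC", name)


def is_ascii_after_folding(name: str) -> bool:
    """Report whether the folded form of the name is pure ASCII."""
    return canonical_username(name).isascii()


# The second username from the post, decoded from its URL-encoded UTF-8 form.
fullwidth = unquote("%EF%BC%B9%EF%BD%8F%EF%BD%93%EF%BD%88%EF%BD%89")

# Both spellings fold to the same canonical key.
print(canonical_username(fullwidth) == canonical_username("Yoshi"))
print(is_ascii_after_folding(fullwidth))
```

With a scheme like this, both accounts would collide at registration time, and a login attempt typed in either form would find the same record. Whether further folding (case, halfwidth katakana, and so on) is wanted is a separate policy decision.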
