On 2011-06-14 02:51, David Nadlinger wrote:
> On 6/14/11 11:20 AM, Jonathan M Davis wrote:
> > On 2011-06-14 01:51, David Nadlinger wrote:
> >> But the functions in <ctype.h> do. And there can be some
> >> locale-dependent problems even if you use only ASCII, the most prominent
> >> being the different handling of »i« in the Turkish locale:
> >> http://www.i18nguy.com/unicode/turkish-i18n.html
> >>
> >> This is probably another reason why it shouldn't be called std.ctype…
> >
> > From the looks of it, that affects extended ASCII but not ASCII (since the
> > Turkish uppercase I isn't even in ASCII). It's definitely a great link
> > though. Thanks!
>
> Oh, I was probably a bit unclear – what I meant is that it affects you
> also if you use only ASCII input, since toupper('i') == 221 when your
> locale is tr_TR.ISO-8859-9.

Yes, but the result is extended ASCII, so it doesn't affect anything which deals only with pure ASCII. ctype.h deals with extended ASCII, so locales actually affect what it's doing. std.ctype deals only in pure ASCII, so it wouldn't do anything which would result in a non-ASCII character, and so locales shouldn't matter at all. However, if you _do_ want to bring locales into it, then a locale like tr_TR.ISO-8859-9 is not going to be able to operate purely in ASCII, since the uppercase value of i is 221, which is extended ASCII.

So, yes, I understood. It's just that, as far as I can tell, locales don't matter if you're completely restricting yourself to ASCII like std.ctype does. And std.ctype is not going to try to deal with locales at this point (and likely not ever). I think that is far better left to Unicode. The Turkish locale is a great example of why you _want_ to be dealing with Unicode when dealing with locales.

std.ctype is for when you're specifically restricting yourself to ASCII (which can sometimes be very useful, e.g. with formatting strings or regex strings where all of the special characters are ASCII; using Unicode functions would just make them slower for no benefit and would risk changing behavior based on locale if you brought locales into it). If you're not restricting yourself to ASCII, then std.uni is the way to go.

- Jonathan M Davis
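
P.S. To make the distinction concrete, here is a rough C sketch of the two behaviors discussed above. Both helper names (ascii_toupper, locale_toupper_i) are made up for illustration and are not part of ctype.h or std.ctype. The exact spelling of the Turkish locale name varies between systems, so the second helper reports failure instead of assuming the locale is installed.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: uppercase only 'a'..'z' and never consult the
   locale, so an ASCII input always yields an ASCII result. This is the
   kind of behavior the message attributes to std.ctype. */
static int ascii_toupper(int c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Hypothetical helper: return toupper('i') under the named LC_CTYPE
   locale, or -1 if that locale is not available on this system.
   Resets LC_CTYPE to "C" (the startup default) before returning. */
static int locale_toupper_i(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    int result = toupper((unsigned char)'i');
    setlocale(LC_CTYPE, "C");
    return result;
}
```

With the default "C" locale, both helpers map 'i' to 'I'. Per the quoted message, locale_toupper_i("tr_TR.ISO-8859-9") would be expected to give 221 on a system where that locale exists, whereas ascii_toupper('i') stays 'I' no matter which locale is active.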
