[Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

via Digitalmars-d-bugs Mon, 11 Jan 2016 12:11:47 -0800

https://issues.dlang.org/show_bug.cgi?id=15440


Ali Cehreli <acehr...@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acehr...@yahoo.com

--- Comment #3 from Ali Cehreli <acehr...@yahoo.com> ---
It looks like my knowledge of this issue was outdated: I had never heard of the
0069 0307 sequence before H. S. Teoh brought the following change to my attention:

  https://github.com/D-Programming-Language/phobos/pull/3848

I've learned since then that the two-character sequence should be the default
but that the Turkish (tr) locale should still use just 0069. According to the
following quote, Java 7 behaves differently depending on locale:

  http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/

<quote>
CODE       LOWER   TITLE   UPPER  LANGUAGE
0130;  0069 0307;   0130;   0130;
0130;  0069;        0130;   0130;       tr;
0130;  0069;        0130;   0130;       az;

Entries with a language take precedence over those without, so in my JVM where
the default locale is English, the first row of the mapping is used, which
lines-up with the codepoints that we saw outputted in our Java 7 example.
Therefore to make Java do the right thing here for Turkish, we need to
explicitly specify the Turkish locale (“tr” is the ISO 639 alpha-2 language
code for Turkish) to the toLowerCase method
</quote>
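
For reference, the behavior described in the quote can be reproduced with a
small Java program. This is only a sketch of what the quote says, not a
proposal for std.uni; the class name DottedI and the helper hex() are names I
made up for illustration. It relies only on String.toLowerCase(Locale) and
Locale.forLanguageTag from the JDK.

```java
import java.util.Locale;

public class DottedI {
    // Hypothetical helper: render each UTF-16 code unit as four hex digits,
    // separated by spaces, so that combining marks are visible.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String capitalDottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Language-neutral mapping: first row of the quoted table.
        System.out.println("root: " + hex(capitalDottedI.toLowerCase(Locale.ROOT)));

        // Turkish and Azerbaijani mappings: the rows tagged tr and az.
        System.out.println("tr:   " + hex(capitalDottedI.toLowerCase(Locale.forLanguageTag("tr"))));
        System.out.println("az:   " + hex(capitalDottedI.toLowerCase(Locale.forLanguageTag("az"))));
    }
}
```

On Java 7 or later I would expect the root line to show 0069 0307 and the tr
and az lines to show just 0069, matching the three rows of the table above.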

Should std.uni be locale-aware?
