On Thu, Sep 6, 2018 at 4:45 PM Dukc via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> > // D
> > auto a = "á";
> > auto b = "á";
> > auto c = "\u200B";
> > auto x = a ~ c ~ a;
> > auto y = b ~ c ~ b;
> >
> > writeln(a.length); // 2 wtf
> > writeln(b.length); // 3 wtf
> > writeln(x.length); // 7 wtf
> > writeln(y.length); // 9 wtf
> >
> > writeln(a == b); // false wtf
> > writeln("ááááááá".canFind("á")); // false wtf
> >
>
> I had to copy-paste that because I wondered how the last two can
> be false. They are because á is encoded differently. If you
> replace all occurrences of it with a grapheme that fits in one
> code point, the results are:
>
> 2
> 2
> 7
> 7
> true
> true

import std.stdio;
import std.algorithm : canFind;
import std.uni : normalize;

void main()
{
    // normalize defaults to NFC, so both spellings of á end up
    // as the same precomposed code point.
    auto a = "á".normalize;
    auto b = "á".normalize;
    auto c = "\u200B".normalize;
    auto x = a ~ c ~ a;
    auto y = b ~ c ~ b;

    writeln(a.length); // 2
    writeln(b.length); // 2
    writeln(x.length); // 7
    writeln(y.length); // 7

    writeln(a == b); // true
    writeln("ááááááá".canFind("á".normalize)); // true
}
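
Since the two spellings of á look identical in a mail client, here is a small sketch that writes them with escapes so the difference is visible in the source. The variable names (precomposed, decomposed, haystack) are mine and only for illustration; I am assuming the original a was U+00E1 and b was 'a' followed by U+0301, which is what the reported lengths of 2 and 3 suggest.

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFD;

void main()
{
    // Assumed encodings of the two "á" literals from the quoted post.
    string precomposed = "\u00E1";  // one code point, 2 UTF-8 code units
    string decomposed  = "a\u0301"; // 'a' + combining acute, 1 + 2 code units

    // .length on a string counts UTF-8 code units, not characters.
    assert(precomposed.length == 2);
    assert(decomposed.length == 3);

    // == compares code units, so the two spellings are not equal.
    assert(precomposed != decomposed);

    // normalize defaults to NFC (composed); NFD gives the decomposed form.
    assert(decomposed.normalize == precomposed);
    assert(precomposed.normalize!NFD == decomposed);

    // Counted as graphemes, both spellings are a single character.
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(decomposed.byGrapheme.walkLength == 1);

    // canFind only matches once needle and haystack share a form.
    string haystack = "x\u00E1y";
    assert(!haystack.canFind(decomposed));
    assert(haystack.canFind(decomposed.normalize));
}
```

Normalizing only fixes comparison and searching. If what you want is the number of user-visible characters, byGrapheme with walkLength is probably closer to the intent than .length.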