On Thursday, 6 September 2018 at 16:44:11 UTC, H. S. Teoh wrote:
On Thu, Sep 06, 2018 at 02:42:58PM +0000, Dukc via Digitalmars-d wrote:
On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
> // D
> auto a = "á";
> auto b = "á";
> auto c = "\u200B";
> auto x = a ~ c ~ a;
> auto y = b ~ c ~ b;
>
> writeln(a.length); // 2 wtf
> writeln(b.length); // 3 wtf
> writeln(x.length); // 7 wtf
> writeln(y.length); // 9 wtf
[...]
This is an unfair comparison. In the Swift version you used
.count, but here you used .length, which is the length of the
array, NOT the number of characters or whatever you expect it
to be. You should rather use .count and specify exactly what
you want to count, e.g., byCodePoint or byGrapheme.
I suspect the Swift version will give you unexpected results if
you did something like compare "á" to "a\u301", for example
(which, in case it isn't obvious, are visually identical to
each other, and as far as an end user is concerned, should only
count as 1 grapheme).
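
Concretely, the distinction looks roughly like this in D (just a sketch, assuming the two literals above are the precomposed U+00E1 and the decomposed "a" plus combining U+0301, which is what the 2- and 3-byte lengths suggest; byGrapheme comes from std.uni, walkLength from std.range):

import std.stdio : writeln;
import std.uni : byGrapheme;
import std.range : walkLength;

void main()
{
    string a = "\u00E1";  // precomposed á: one code point, two UTF-8 bytes
    string b = "a\u0301"; // 'a' + combining acute accent: two code points, three bytes

    writeln(a.length, " ", b.length);         // 2 3 -- UTF-8 code units (what .length measures)
    writeln(a.walkLength, " ", b.walkLength); // 1 2 -- code points
    writeln(a.byGrapheme.walkLength, " ", b.byGrapheme.walkLength); // 1 1 -- grapheme clusters
}

Only the grapheme counts agree, and that is the number an end user would call "one character".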
Not even normalization will help you if you have a string like
"a\u301\u302": in that case, the *only* correct way to count
the number of visual characters is byGrapheme, and I highly
doubt Swift's .count will give you the correct answer in that
case. (I expect that Swift's .count will count code points, as
is the usual default in many languages, which is unfortunately
wrong when you're thinking about visual characters, which are
called graphemes in Unicode parlance.)
No, Swift counts grapheme clusters by default, so it gives 1. I
suggest you read the Swift chapter linked above. I think it's the
wrong choice from a performance standpoint, but they chose to
emphasize intuitiveness for the common case.
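
For comparison, here is a quick (untested) D sketch of your "a\u301\u302" case, using byGrapheme and std.uni's normalize (which defaults to NFC, if I remember right):

import std.stdio : writeln;
import std.uni : byGrapheme, normalize;
import std.range : walkLength;

void main()
{
    string s = "a\u0301\u0302"; // 'a' + combining acute + combining circumflex

    writeln(s.walkLength);            // 3 -- code points
    writeln(s.normalize.walkLength);  // still more than 1: no precomposed form covers this combination
    writeln(s.byGrapheme.walkLength); // 1 -- one grapheme cluster, i.e. one visual character
}

Only the grapheme count comes out as 1, which matches your point that normalization alone is not enough.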
I agree with most of the rest of what you wrote: programmers have no
silver bullet that lets them avoid the complexity of Unicode and of
human languages.