I answered a random C# Stack Overflow question about why
string.Length returns the value it does, with some rationale
defending code units instead of "characters" - basically, I typed
up a defense of D's string-as-array behavior.
To my surprise, my answer got an enormous number of votes*, so I
decided to post it to reddit too.
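To make the point concrete, here's a small sketch of what D's string-as-array behavior actually looks like: .length counts UTF-8 code units (bytes), while walking the string as a range decodes code points.

```d
import std.stdio;
import std.range : walkLength;

void main()
{
    string s = "héllo"; // 'é' is two UTF-8 code units
    writeln(s.length);     // 6 - code units, what the array really holds
    writeln(s.walkLength); // 5 - code points, via the decoding range interface
}
```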
It's really encouraging to me that there's been such a positive
response. The question comes up here every so often too - people
saying string.length should give the number of characters - and,
of course, there's the automatic UTF decoding done in Phobos,
which comes up from time to time.
It looks like D, the language, made the right decisions here.
This reddit comment applies to the phobos thing though:
"Most people like to pick on surrogate pairs here, and decry
languages which don't handle them "properly", but I think it's
important to point out that handling surrogate pairs as a single
character doesn't in any way fix the underlying issue -- many
multiple-codepoint sequences are a single logical glyph even if
you use 32 bit wide chars."
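That commenter's point is easy to demonstrate in D: 'é' written as 'e' plus a combining acute accent is one logical glyph but two code points, so decoding to 32-bit chars still doesn't give you "characters". Phobos's std.uni.byGrapheme is what actually groups the glyph:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' + U+0301 combining acute accent: one glyph, two code points
    string s = "e\u0301";
    assert(s.length == 3);                // UTF-8 code units
    assert(s.walkLength == 2);            // code points - 32-bit chars don't help
    assert(s.byGrapheme.walkLength == 1); // graphemes: the single logical glyph
}
```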
I know this has been said a lot of times... but I think the auto
decoding in Phobos was, and is, a mistake. The bigger question is
what I posited on Stack Overflow: "Moreover, what's the point? Why
do these metrics matter?" Similarly with std.algorithm on
strings: why would you ever want to call sort on a string? Well,
I can think of a few reasons, like checking the frequency of
letters, but I think we should see what happens if Phobos changed
auto decoding into a compile error wherever it would occur. Then we
could fix each site by casting with .representation to work with
code units, or by manually adding a .utfDecode to work with
dchars, making the decision explicit.
That'd offer a way forward, and I suspect it would break less
code than we might think.
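The .representation side of that already exists in std.string today. A sketch of the letter-frequency case done explicitly on code units, with no auto decoding involved:

```d
import std.algorithm : sort, group;
import std.stdio;
import std.string : representation;

void main()
{
    string s = "banana";
    // Make the code-unit view explicit instead of relying on auto decoding:
    auto units = s.dup.representation; // mutable ubyte[], so it's sortable
    units.sort();
    foreach (g; units.group) // (code unit, count) pairs
        writefln("%c: %s", cast(char) g[0], g[1]);
}
```

For ASCII input like this, code units and characters coincide, so sorting bytes is exactly what you want - and when they don't coincide, the cast to .representation documents that you made that call deliberately.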
* Stack Overflow votes are a silly thing: a somewhat easy answer
like this gets a bazillion, whereas difficult questions with
difficult answers get me one, maybe two votes. Oh well.