On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
http://jackstouffer.com/blog/d_auto_decoding_and_you.html


Thanks for writing this. Great article.

Some remarks:

   static assert(is(typeof(s.front()) == dchar));

I believe .front is a property (so some ranges can implement it as a field, not a @property function). Hence, no parens.

So, why is typeof(s.front) == dchar.

Question mark?

In plain English, this means when iterating over strings in D, D will look ahead in the string and combine any code units that make up a single code point.

Perhaps clarify that this only applies to ranges. `foreach` on a string will iterate over chars, but you can iterate over code points if you specify the dchar type explicitly.
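To make the distinction concrete, here is a minimal sketch, assuming a recent Phobos. The helper names `countCodeUnits` and `countCodePoints` are mine and are not from the article.

```d
import std.range.primitives : front;

// Range primitives auto decode narrow strings: front yields a dchar.
static assert(is(typeof(string.init.front) == dchar));

/// Hypothetical helper: a plain foreach infers char and visits code units.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s)
    {
        static assert(c.sizeof == 1); // c is a single UTF-8 code unit
        ++n;
    }
    return n;
}

/// Hypothetical helper: asking for dchar makes foreach decode code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

unittest
{
    string s = "a\u00DFb"; // U+00DF takes two UTF-8 code units (C3 9F)
    assert(s.length == 4);
    assert(countCodeUnits(s) == 4);
    assert(countCodePoints(s) == 3);
}
```

It can be run with something like `dmd -unittest -main -run example.d`.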

There is more confusing text on the same issue further down, and in the intro:

Iterating a char array with C style for loops produces different results than foreach loops due to auto decoding.

One feature of D that is confusing to a lot of new comers is the behavior of strings in relation to range based features like the foreach statement and range algorithms.

---

E.g. for ë the code units C3 AB (for UTF-8) would turn into a single code point.

Perhaps choose a character that cannot also be expressed with combining characters, to avoid potential confusion.
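A short sketch of why ë is an ambiguous example, assuming `std.uni` from a recent Phobos: the same visible character can be one code point or two, so "code units combine into a single code point" only holds for the precomposed form.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme, normalize, NFC;

unittest
{
    string precomposed = "\u00EB";  // ë as one code point, UTF-8 C3 AB
    string decomposed  = "e\u0308"; // e followed by a combining diaeresis

    // Code units
    assert(precomposed.length == 2);
    assert(decomposed.length == 3);

    // Code points (what auto decoding sees)
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    // Graphemes (what the reader sees)
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(decomposed.byGrapheme.walkLength == 1);

    // Normalizing to NFC composes the second form into the first.
    assert(normalize!NFC(decomposed) == precomposed);
}
```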

string s = "cassé";

Ditto (unless the goal was to complement the example from my .d file below)

These glaring inconsistencies are the cause of a lot of confusion for new comers.

(Opinion) I would say that they also cause issues in generic code.

Every time one wants a generic algorithm to work with both strings and ranges, you wind up special casing via static if-ing narrow strings to defeat the auto decoding, or to decode the ranges. Case in point.

Link to the exact SHA to prevent the link from getting outdated. On GitHub, just hit 'y' on your keyboard to go to the "permalink" version.
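For readers who have not seen the special-casing the quoted paragraph describes, it tends to look roughly like this. This is a sketch only: `countAscii` is a hypothetical helper, not code from Phobos or the article, and it assumes the needle is an ASCII character so that comparing code units is safe.

```d
import std.algorithm.searching : count;
import std.range.primitives : isInputRange;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

/// Hypothetical helper: counts occurrences of an ASCII `needle`.
/// Narrow strings are walked by code unit to sidestep auto decoding;
/// every other range is used as-is.
size_t countAscii(R)(R r, char needle)
if (isInputRange!R)
{
    static if (isNarrowString!R)
        return r.byCodeUnit.count(needle); // no decoding
    else
        return r.count(needle);
}

unittest
{
    assert(countAscii("a,\u00DF,c", ',') == 2);  // string: code-unit path
    assert(countAscii("a,\u00DF,c"d, ',') == 2); // dstring: generic path
}
```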

Auto decoding has two choices when encountering invalid code units: throw, or produce an error dchar like std.utf.byUTF does.

(Aside) This was an interesting discussion on the subject: https://issues.dlang.org/show_bug.cgi?id=14519
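The two behaviours in the quoted sentence can be seen side by side in a small sketch, assuming a Phobos version that has `std.utf.byUTF` and `replacementDchar`. The input bytes are made up for illustration: a stray continuation byte (0x80) between two ASCII letters.

```d
import std.algorithm.searching : canFind;
import std.exception : assertThrown;
import std.range.primitives : front;
import std.utf : byUTF, replacementDchar, UTFException;

unittest
{
    // 'a', an invalid stray continuation byte, then 'b'
    immutable(ubyte)[] bytes = [0x61, 0x80, 0x62];
    string s = cast(string) bytes;

    // Auto decoding: front throws when it hits the invalid code unit.
    string tail = s[1 .. $];
    assertThrown!UTFException(tail.front);

    // byUTF: no exception, the bad unit is replaced with U+FFFD.
    auto decoded = s.byUTF!dchar;
    assert(decoded.front == 'a');
    assert(decoded.canFind(replacementDchar));
}
```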

However, in my opinion D is too far along to to suddenly ask people

"to to"

---

Some more info / links on the subject I collected a few years ago:

http://wiki.dlang.org/Language_issues#Unicode_and_ranges
