On Monday, 10 March 2014 at 19:48:34 UTC, H. S. Teoh wrote:
On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote:
On Mon, 10 Mar 2014 11:30:07 -0700, Walter Bright wrote:

> On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
> > An idea to fix the whole problems I see with char[] being treated
> > specially by phobos: introduce an actual string type, with char[]
> > as backing, that is a dchar range, that actually dictates the
> > rules we want. Then, make the compiler use this type for
> > literals.
>
> Proposals to make a string class for D have come up many times. I
> have a kneejerk dislike for it. It's a really strong feature for D
> to have strings be an array type, and I'll go to great lengths to
> keep it that way.

I'm on the fence about this one. The nice thing about strings being an array type is that it is a familiar concept to C coders, and it allows array slicing for extracting substrings, etc., which fits nicely with the C view of strings as character arrays. As a C coder myself, I like it this way too. But the bad thing about strings being an array type is that it's a holdover from C, and it allows slicing for extracting substrings -- malformed substrings, by permitting slicing through a multibyte (multiword) character.
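
To make the slicing hazard concrete, here is a minimal sketch in D. It assumes a UTF-8 `string` and Phobos' `std.utf`; the helper `prefix` is hypothetical, written only for this illustration and not part of Phobos.

```d
import std.exception : assertThrown;
import std.range.primitives : walkLength;
import std.utf : stride, validate, UTFException;

// Hypothetical helper (not in Phobos): returns the first n code points
// of s by stepping over whole UTF-8 sequences instead of raw code units.
string prefix(string s, size_t n)
{
    size_t i = 0;
    for (; n > 0 && i < s.length; --n)
        i += stride(s, i); // length in code units of the sequence at i
    return s[0 .. i];
}

void main()
{
    string s = "h\u00E9llo";   // U+00E9 takes two UTF-8 code units

    assert(s.length == 6);     // .length counts code units
    assert(s.walkLength == 5); // the range view decodes code points

    // A raw array slice can cut the two-unit sequence in half.
    string bad = s[0 .. 2];
    assertThrown!UTFException(validate(bad));

    // Stepping by whole sequences keeps the substring well-formed.
    assert(prefix(s, 2) == "h\u00E9");
}
```

Note that this only respects code point boundaries. It does nothing for combining sequences or grapheme clusters, which is part of why array operations alone are not enough for non-ASCII text.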

Basically, the nice aspects of strings being arrays only apply when you're dealing with ASCII (or mostly-ASCII) strings. These very same "nice" aspects turn into problems when dealing with anything non-ASCII. The only way the user can get it right using only array operations is if they understand the whole of Unicode in their head and are willing to reinvent Unicode algorithms every time they slice a string or do some operation on it. Since D purportedly supports Unicode by default, it shouldn't be this way. D should *actually* support Unicode all the way -- use proper Unicode algorithms for substring extraction, collation, line-breaking, normalization, etc. Being a systems language, of course, means that D should allow you to get under the hood and do things directly with the raw string representation -- but this shouldn't be the *default* modus operandi. The default should be a properly-encapsulated string type with Unicode algorithms to operate on it (with the option of reaching into the raw representation where necessary).



You started off on the fence, but you seem pretty convinced by the end!
