Re: Proposal for fixing dchar ranges

John Colvin Mon, 10 Mar 2014 14:02:23 -0700

On Monday, 10 March 2014 at 19:48:34 UTC, H. S. Teoh wrote:

On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote:
On Mon, 10 Mar 2014 11:30:07 -0700, Walter Bright wrote:
> On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
> > An idea to fix the whole problems I see with char[] being treated
> > specially by phobos: introduce an actual string type, with char[]
> > as backing, that is a dchar range, that actually dictates the
> > rules we want. Then, make the compiler use this type for literals.
>
> Proposals to make a string class for D have come up many times. I
> have a kneejerk dislike for it. It's a really strong feature for D
> to have strings be an array type, and I'll go to great lengths to
> keep it that way.
I'm on the fence about this one. The nice thing about strings being an
array type, is that it is a familiar concept to C coders, and it allows
array slicing for extracting substrings, etc., which fits nicely with
the C view of strings as character arrays. As a C coder myself, I like
it this way too. But the bad thing about strings being an array type, is
that it's a holdover from C, and it allows slicing for extracting
substrings -- malformed substrings by permitting slicing a multibyte
(multiword) character.

Basically, the nice aspects of strings being arrays only apply when
you're dealing with ASCII (or mostly-ASCII) strings. These very same
"nice" aspects turn into problems when dealing with anything non-ASCII.
The only way the user can get it right using only array operations, is
if they understand the whole of Unicode in their head and are willing to
reinvent Unicode algorithms every time they slice a string or do some
operation on it. Since D purportedly supports Unicode by default, it
shouldn't be this way. D should *actually* support Unicode all the way
-- use proper Unicode algorithms for substring extraction, collation,
line-breaking, normalization, etc. Being a systems language, of course,
means that D should allow you to get under the hood and do things
directly with the raw string representation -- but this shouldn't be the
*default* modus operandi. The default should be a properly-encapsulated
string type with Unicode algorithms to operate on it (with the option of
reaching into the raw representation where necessary).

You started off on the fence, but you seem pretty convinced by the end!
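
For readers following along, here is a minimal sketch of the slicing problem described in the quoted text. It is an illustration only, not code from the thread: `safePrefix` is a hypothetical helper name, and the example assumes a UTF-8 `string` and the `std.utf` and `std.range` functions as they exist in current Phobos.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : toUTFindex, validate, UTFException;

// Hypothetical helper (not part of Phobos): returns the first n code
// points of s, cutting on a code point boundary instead of a raw index.
string safePrefix(string s, size_t n)
{
    return s[0 .. s.toUTFindex(n)];
}

void main()
{
    string s = "h\u00E9llo";     // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);       // .length counts code units
    assert(s.walkLength == 5);   // decoding yields five code points

    // A raw array slice ends in the middle of U+00E9, so the result
    // is not valid UTF-8.
    string broken = s[0 .. 2];
    assertThrown!UTFException(validate(broken));

    // Slicing on a code point boundary keeps the character whole.
    assert(safePrefix(s, 2) == "h\u00E9");
}
```

This only respects code point boundaries. Combining sequences can still be split, which is part of the argument above for doing such operations with proper Unicode algorithms (for example, grapheme iteration via `std.uni.byGrapheme`).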
