On Sat, 15 Jan 2011 15:55:48 -0500, Michel Fortin <[email protected]> wrote:

On 2011-01-15 15:20:08 -0500, "Steven Schveighoffer" <[email protected]> said:

I'm not suggesting we impose it, just that we make it the default. If you want to iterate by dchar, wchar, or char, just write:
        foreach (dchar c; "exposé") {}
        foreach (wchar c; "exposé") {}
        foreach (char c; "exposé") {}
        // or
        foreach (dchar c; "exposé".by!dchar()) {}
        foreach (wchar c; "exposé".by!wchar()) {}
        foreach (char c; "exposé".by!char()) {}
and it'll work. But the default would be a slice containing the grapheme, because this is the right way to represent a Unicode character.
I think this is a good idea. I previously was nervous about it, but I'm not sure it makes a huge difference. Returning a char[] is certainly less work than normalizing a grapheme into one or more code points, and then returning them. All that it takes is to detect all the code points within the grapheme. Normalization can be done if needed, but would probably have to output another char[], since a normalized grapheme can occupy more than one dchar.
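To make "detect all the code points within the grapheme" concrete, here is a deliberately simplified sketch in Python (standing in for the D discussion): it groups a base character with any trailing combining marks using `unicodedata.combining`. This is an approximation only; real grapheme cluster segmentation follows Unicode UAX #29 and handles many more cases (Hangul jamo, ZWJ sequences, regional indicators). The function name is illustrative, not from any library.

```python
import unicodedata

def graphemes(s):
    """Yield substrings approximating grapheme clusters: a base
    character plus any following combining marks. Simplified sketch;
    full segmentation is specified by Unicode UAX #29."""
    cluster = ""
    for ch in s:
        # Start a new cluster on any non-combining character
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster

# "e" + U+0301 COMBINING ACUTE ACCENT: one grapheme, two code points
print(list(graphemes("expose\u0301")))
# ['e', 'x', 'p', 'o', 's', 'e\u0301']  -- 6 graphemes from 7 code points
```

Note that the last element is a two-code-point slice of the input, matching the idea above that the natural return type for one "character" is a slice, not a single `dchar`.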

I'm glad we agree on that now.


It's a matter of me slowly wrapping my brain around Unicode and how it's used. It seems like a typical committee-defined standard where there are 10 ways to do everything, so I was trying to weed out the lesser-used (or so I perceived) pieces to allow a more implementable library. It's doubly hard for me since I have limited experience with other languages, and I've never tried to write them with a computer (my language classes in high school were back in the days of actually writing stuff down on paper).

I once told a colleague who was on a standards committee that their proposed KLV (key-length-value) standard was ridiculous. The wise committee had decided that, to avoid future issues, the length would be encoded as a single byte if < 128, or otherwise as a first byte of 128 plus the number of bytes in the length field, followed by the length itself. This means you could potentially have to parse and process a 127-byte integer!
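That length scheme (essentially ASN.1 BER-style definite lengths) can be sketched in a few lines of Python; the function names are mine, not from any KLV library:

```python
def encode_length(n):
    """Encode a length: one byte if n < 128, otherwise a first byte
    of 128 + the number of length bytes, then the length big-endian."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([128 + len(body)]) + body

def decode_length(data):
    """Return (length, bytes consumed)."""
    first = data[0]
    if first < 128:
        return first, 1
    count = first - 128  # in principle up to 127 length bytes!
    return int.from_bytes(data[1:1 + count], "big"), 1 + count

print(encode_length(5))     # b'\x05'
print(encode_length(1000))  # b'\x82\x03\xe8'
```

The `first - 128` line is where the absurdity lives: a conforming decoder must be prepared for `count` to be as large as 127.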

What if I modified my proposed string_t type to return T[] as its element type, as you say, and string literals are typed as string_t!(whatever)? In addition, the restrictions I imposed on slicing a code point actually get imposed on slicing a grapheme. That is, it is illegal to substring a string_t in a way that slices through a grapheme (and by deduction, a code point)?

I'm not opposed to that on principle. I'm a little uneasy about having so many types representing a string however. Some other raw comments:

I agree that things would be more coherent if char[], wchar[], and dchar[] behaved like other arrays, but I can't really see a justification for those types to be in the language if there's nothing special about them (why not a library type?).

I would not be opposed to getting rid of those types. But I am very opposed to char[] not being an array. If you want a string to be something other than an array, make it have a different syntax. We also have to consider C compatibility.

However, we are in radical-change mode then, and this is probably pushed to D3 ;) If we can find some way to fix the situation without invalidating TDPL, we should strive for that first IMO.

If strings and arrays of code units are distinct, slicing in the middle of a grapheme or in the middle of a code point could throw an error, but for performance reasons it should probably check for that only when array bounds checking is turned on (that would require compiler support however).

Not really -- it could use assert, but that throws an AssertError instead of a RangeError. Of course, both are errors and will abort the program. I do wish there were a version(noboundscheck) to do this kind of stuff with...
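What "slicing through a code point" breaks is easy to show with raw UTF-8 bytes (here in Python, standing in for D's char[]): cutting a multi-byte sequence in half leaves data that no longer decodes.

```python
raw = "exposé".encode("utf-8")  # 7 bytes: b'expos\xc3\xa9'
print(raw[:6])                  # b'expos\xc3' -- cuts U+00E9 in half

try:
    raw[:6].decode("utf-8")
except UnicodeDecodeError as e:
    # the truncated lead byte \xc3 has no continuation byte
    print("invalid slice:", e.reason)
```

A checked string type would reject the `[:6]` slice up front (ideally only when bounds checking is enabled, as suggested above) rather than hand out invalid data.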

Actually, we would need a grapheme to be its own type, because comparing two char[]'s that don't contain equivalent bits and having them be equal, violates the expectation that char[] is an array. So the string_t!char would return a grapheme_t!char (names to be discussed) as its element type.

Or you could make a grapheme a string_t. ;-)

I'm a little uneasy having a range return itself as its element type. For all intents and purposes, a grapheme is a string of one 'element', so it could potentially be a string_t.

It does seem daunting to have so many types, but at the same time, types convey relationships at compile time that can make certain bugs impossible to write, or make things possible that a single type can't express.

I'll give you an example from a previous life:

Tango had a type called DateTime. This type represented *either* a point in time, or a span of time (depending on how you used it). But I proposed we switch to two distinct types, one for a point in time, one for a span of time. It was argued that both were so similar, why couldn't we just keep one type? The answer is simple -- having them be separate types allows me to express relationships that the compiler enforces. For example, you can add two time spans together, but you can't add two points in time together. Or maybe you want a function to accept a time span (like a sleep operation). If there was only one type, then sleep(DateTime.now()) compiles and sleeps for what, 2011 years? ;)
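Python's standard library happens to make exactly this point/span distinction with `datetime` and `timedelta`, so the argument can be demonstrated directly (the `sleep`-style misuse is rejected by the type system rather than sleeping for 2011 years):

```python
from datetime import datetime, timedelta

nap = timedelta(seconds=2)
print(nap + nap)  # spans add: 0:00:04

try:
    datetime.now() + datetime.now()  # points in time don't add
except TypeError as e:
    print("rejected:", e)
```

With a single merged type, the second expression would compile and silently produce nonsense; with two types, the mistake is caught before the program ever runs.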

I feel that making extra types when the relationship between them is important is worth the possible repetition of functionality. Catching bugs during compilation is soooo much better than experiencing them during runtime.

-Steve
