Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Sat, 15 Jan 2011 13:00:25 -0800

On 2011-01-15 15:20:08 -0500, "Steven Schveighoffer"<[email protected]> said:

I'm not suggesting we impose it, just that we make it the default. If you want to iterate by dchar, wchar, or char, just write:
        foreach (dchar c; "exposé") {}
        foreach (wchar c; "exposé") {}
        foreach (char c; "exposé") {}
        // or
        foreach (dchar c; "exposé".by!dchar()) {}
        foreach (wchar c; "exposé".by!wchar()) {}
        foreach (char c; "exposé".by!char()) {}
and it'll work. But the default would be a slice containing the grapheme, because this is the right way to represent a Unicode character.
I think this is a good idea. I previously was nervous about it, but I'm not sure it makes a huge difference. Returning a char[] is certainly less work than normalizing a grapheme into one or more code points, and then returning them. All that it takes is to detect all the code points within the grapheme. Normalization can be done if needed, but would probably have to output another char[], since a normalized grapheme can occupy more than one dchar.
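
As a rough illustration of "returning a char[] per grapheme", here is a minimal sketch in D. It assumes a current Phobos, where `std.uni.graphemeStride` reports how many code units the grapheme at a given index occupies. `graphemeSlices` is a hypothetical helper name used only for this example, not the proposed default iteration and not an existing API.

```d
import std.uni : graphemeStride;

/// Splits `s` into slices of the original string, one slice per grapheme.
/// Hypothetical helper for illustration; no normalization is performed.
string[] graphemeSlices(string s)
{
    string[] parts;
    size_t i = 0;
    while (i < s.length)
    {
        // Number of code units in the grapheme that starts at index i.
        immutable n = graphemeStride(s, i);
        parts ~= s[i .. i + n];
        i += n;
    }
    return parts;
}

void main()
{
    // "exposé" spelled with 'e' followed by a combining acute accent (U+0301).
    auto parts = graphemeSlices("expose\u0301");
    assert(parts.length == 6);        // six graphemes
    assert(parts[5] == "e\u0301");    // the last one spans two code points
    assert(parts[5].length == 3);     // and three UTF-8 code units
}
```

Each element is just a view into the original string, so nothing is copied or normalized until a caller asks for it.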


I'm glad we agree on that now.

What if I modified my proposed string_t type to return T[] as its element type, as you say, and string literals are typed as string_t!(whatever)? In addition, the restrictions I imposed on slicing a code point actually get imposed on slicing a grapheme. That is, it is illegal to substring a string_t in a way that slices through a grapheme (and by deduction, a code point)?

I'm not opposed to that on principle. I'm a little uneasy about having so many types representing a string however. Some other raw comments:

I agree that things would be more coherent if char[], wchar[], and dchar[] behaved like other arrays, but I can't really see a justification for those types to be in the language if there's nothing special about them (why not a library type?). If strings and arrays of code units are distinct, slicing in the middle of a grapheme or in the middle of a code point could throw an error, but for performance reasons it should probably check for that only when array bounds checking is turned on (that would require compiler support however).
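
A library-level approximation of that checked slicing might look like the sketch below. The names `isGraphemeBoundary` and `checkedSlice` are hypothetical, and using the predefined `D_NoBoundsChecks` version identifier to skip the check is only an assumption about how a library could mimic the compiler support mentioned above. It throws an ordinary `Exception` rather than a range error.

```d
import std.exception : assertThrown, enforce;
import std.uni : graphemeStride;

/// True if `index` falls on a grapheme boundary of `s`.
/// Linear scan from the start of the string; for illustration only.
bool isGraphemeBoundary(string s, size_t index)
{
    size_t i = 0;
    while (i < index && i < s.length)
        i += graphemeStride(s, i);
    return i == index;
}

/// Slices `s`, rejecting ranges that would cut through a grapheme
/// (and therefore also ranges that cut through a code point).
string checkedSlice(string s, size_t lo, size_t hi)
{
    version (D_NoBoundsChecks) {} // check skipped with -boundscheck=off
    else
    {
        enforce(isGraphemeBoundary(s, lo) && isGraphemeBoundary(s, hi),
                "slice cuts through a grapheme");
    }
    return s[lo .. hi];
}

void main()
{
    // Code units: e x p o s e 0xCC 0x81 (eight in total).
    string s = "expose\u0301";
    assert(checkedSlice(s, 0, 5) == "expos");
    assert(checkedSlice(s, 5, 8) == "e\u0301");
    // The two checks below assume bounds checking is enabled.
    assertThrown(checkedSlice(s, 0, 6)); // between 'e' and its accent
    assertThrown(checkedSlice(s, 0, 7)); // inside the U+0301 code point
}
```

The scan from the start of the string makes each check linear in the slice position, which is part of why tying it to bounds checking is attractive.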

Actually, we would need a grapheme to be its own type, because comparing two char[]'s that don't contain equivalent bits and having them be equal, violates the expectation that char[] is an array.
So the string_t!char would return a grapheme_t!char (names to be discussed) as its element type.
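
To make the equality point concrete, here is a small sketch of such a wrapper. `GraphemeSlice` is a hypothetical stand-in for the grapheme_t!char idea, and comparing through `std.uni.normalize` with NFC is one assumed way to define equality, not something settled in this thread.

```d
import std.uni : NFC, normalize;

/// Hypothetical grapheme wrapper around a slice of code units.
struct GraphemeSlice
{
    string units; // slice of the original string, left unnormalized

    /// Two graphemes are equal when their NFC forms match,
    /// even if the underlying code units differ.
    bool opEquals(const GraphemeSlice other) const
    {
        return normalize!NFC(units) == normalize!NFC(other.units);
    }
}

void main()
{
    auto precomposed = GraphemeSlice("\u00E9");  // é as a single code point
    auto decomposed  = GraphemeSlice("e\u0301"); // e + combining acute accent

    assert(precomposed.units != decomposed.units); // different bytes as arrays
    assert(precomposed == decomposed);             // same grapheme
}
```

Plain array comparison keeps its usual bitwise meaning, while the wrapper type carries the Unicode-aware notion of equality.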


Or you could make a grapheme a string_t. ;-)


--
Michel Fortin
[email protected]
http://michelf.com/
