On 2011-01-15 15:20:08 -0500, "Steven Schveighoffer" said:
I'm not suggesting we impose it, just that we make it the default. If
you want to iterate by dchar, wchar, or char, just write:
    foreach (dchar c; "exposé") {}
    foreach (wchar c; "exposé") {}
    foreach (char c; "exposé") {}
    // or
    foreach (dchar c; "exposé".by!dchar()) {}
    foreach (wchar c; "exposé".by!wchar()) {}
    foreach (char c; "exposé".by!char()) {}
and it'll work. But the default would be a slice containing the
grapheme, because this is the right way to represent a Unicode
character.
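To make the difference between the element types concrete, here is a minimal sketch. It does not use the by!T syntax proposed above, which is hypothetical; it assumes a Phobos that provides std.utf.byChar, byWchar and byDchar, and std.uni.byGrapheme. The accent is written as a combining mark so that the four counts differ, and graphemeCount is a name made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byChar, byWchar, byDchar;

/// Number of graphemes (user-perceived characters) in s.
/// Helper name is hypothetical, for illustration only.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "exposé" with the accent as a combining mark (U+0301), so the
    // last user-perceived character spans two code points.
    string s = "expose\u0301";

    assert(s.byChar.walkLength == 8);  // UTF-8 code units
    assert(s.byWchar.walkLength == 7); // UTF-16 code units
    assert(s.byDchar.walkLength == 7); // code points
    assert(graphemeCount(s) == 6);     // graphemes
}
```

Only the grapheme count matches what a reader would call the number of characters.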
I think this is a good idea. I previously was nervous about it, but
I'm not sure it makes a huge difference. Returning a char[] is
certainly less work than normalizing a grapheme into one or more code
points, and then returning them. All that it takes is to detect all
the code points within the grapheme. Normalization can be done if
needed, but would probably have to output another char[], since a
normalized grapheme can occupy more than one dchar.
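As a rough sketch of the "just detect the boundary and return a slice" approach, the following assumes std.uni.graphemeStride and std.uni.normalize from a recent Phobos. graphemeAt is a hypothetical helper, not part of any proposal in this thread. It also shows why normalization needs its own output buffer: the decomposed form of one grapheme can be two code points.

```d
import std.range : walkLength;
import std.uni : graphemeStride, normalize, NFD;

/// Slice of `s` covering the grapheme that starts at code unit index `i`.
/// Nothing is copied or normalized; only the boundary is detected.
/// (Hypothetical helper; assumes `s` is valid UTF-8.)
const(char)[] graphemeAt(const(char)[] s, size_t i)
{
    return s[i .. i + graphemeStride(s, i)];
}

void main()
{
    // Precomposed é: one code point, two UTF-8 code units.
    string s = "expos\u00E9!";

    auto g = graphemeAt(s, 5);
    assert(g == "\u00E9");
    assert(g.ptr == s.ptr + 5); // a view into s, not a copy

    // Canonical decomposition (NFD) of the same grapheme: 'e' followed
    // by U+0301, that is, two code points.
    auto d = normalize!NFD(g);
    assert(d == "e\u0301");
    assert(d.walkLength == 2);
}
```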
I'm glad we agree on that now.
What if I modified my proposed string_t type to return T[] as its
element type, as you say, and string literals are typed as
string_t!(whatever)? In addition, the restrictions I imposed on
slicing a code point actually get imposed on slicing a grapheme. That
is, it is illegal to substring a string_t in a way that slices through
a grapheme (and by deduction, a code point)?
I'm not opposed to that in principle. I'm a little uneasy about having
so many types representing a string, however. Some other raw comments:
I agree that things would be more coherent if char[], wchar[], and
dchar[] behaved like other arrays, but I can't really see a
justification for those types to be in the language if there's nothing
special about them (why not a library type?). If strings and arrays of
code units are distinct, slicing in the middle of a grapheme or in the
middle of a code point could throw an error, but for performance
reasons it should probably check for that only when array bounds
checking is turned on (that would require compiler support however).
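A library-level approximation of that check might look like the sketch below. The names onGraphemeBoundary and graphemeSlice are hypothetical, the input is assumed to be valid UTF-8, and the predefined version identifier D_NoBoundsChecks stands in for the compiler support mentioned above. Note that the check walks the string from the start, which is the cost that motivates disabling it along with bounds checking.

```d
import std.exception : assertThrown;
import std.uni : graphemeStride;

/// True if code unit index `i` falls on a grapheme boundary of `s`.
/// Linear in `i`: it steps grapheme by grapheme from the start.
bool onGraphemeBoundary(const(char)[] s, size_t i)
{
    size_t pos = 0;
    while (pos < i && pos < s.length)
        pos += graphemeStride(s, pos);
    return pos == i;
}

/// Slice that refuses to cut through a grapheme (and therefore through
/// a code point), unless bounds checking is compiled out.
const(char)[] graphemeSlice(const(char)[] s, size_t lo, size_t hi)
{
    version (D_NoBoundsChecks) {}
    else
    {
        if (lo > hi || !onGraphemeBoundary(s, lo) || !onGraphemeBoundary(s, hi))
            throw new Exception("slice cuts through a grapheme");
    }
    return s[lo .. hi];
}

void main()
{
    string s = "expose\u0301"; // 'e' at index 5, U+0301 at indices 6..8

    assert(graphemeSlice(s, 0, 5) == "expos");
    assert(graphemeSlice(s, 5, 8) == "e\u0301");

    version (D_NoBoundsChecks) {}
    else
    {
        assertThrown!Exception(graphemeSlice(s, 0, 6)); // splits the grapheme
        assertThrown!Exception(graphemeSlice(s, 0, 7)); // splits a code point
    }
}
```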
Actually, we would need a grapheme to be its own type, because having
two char[]s that don't contain the same bits compare equal violates
the expectation that char[] is an array.
So the string_t!char would return a grapheme_t!char (names to be
discussed) as its element type.
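One way such an element type could behave is sketched below. GraphemeRef is a hypothetical stand-in for grapheme_t!char, and comparing NFC-normalized forms via std.uni.normalize is just one possible definition of equality (canonical equivalence), not something settled in this thread.

```d
import std.uni : normalize, NFC;

/// Hypothetical stand-in for grapheme_t!char: a view over the code
/// units of one grapheme, with equality defined on canonical
/// equivalence rather than on the raw bits.
struct GraphemeRef
{
    const(char)[] units; // slice of the original string

    bool opEquals(const GraphemeRef rhs) const
    {
        // Canonically equivalent sequences have identical NFC forms.
        return normalize!NFC(units) == normalize!NFC(rhs.units);
    }
}

void main()
{
    auto precomposed = GraphemeRef("\u00E9");  // é as one code point
    auto combining   = GraphemeRef("e\u0301"); // e + combining acute

    // As arrays of code units they differ...
    assert(precomposed.units != combining.units);
    // ...but as graphemes they compare equal.
    assert(precomposed == combining);
    assert(precomposed != GraphemeRef("e"));
}
```

Keeping this comparison on a separate type leaves char[] equality as a plain element-by-element comparison.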
Or you could make a grapheme a string_t. ;-)
--
Michel Fortin
http://michelf.com/