On Tue, 11 Jan 2011 18:00:30 -0500, Andrei Alexandrescu
<[email protected]> wrote:
On 1/11/11 11:21 AM, Steven Schveighoffer wrote:
It is supposed to be simple, and provide the expected interface, without
causing any undue performance degradation. That is, I should be able to
do all the things with a replacement string type that I can with a char
array today, as efficiently as I can today, except I should have to work
to get at the code-units. The huge benefit is that I can say "I'm
dealing with this as an array" when I know it's safe
Unfinished sentence?
Sorry, I forgot '.' :)
Anyway, for my money you just described what we have now.
All except the 'expected interface' part. The string type should deal
with dchars exclusively, since that's what it is a range of. char[] gives
you chars back when you index it. Anyone who doesn't use ASCII will be
confused by this.
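To make the mismatch concrete, here is a minimal sketch in D. The helper names codeUnits and codePoints are made up for illustration; walkLength is the Phobos range primitive, and it decodes the string as a range of dchar.

```d
import std.range : walkLength;
import std.stdio : writeln;

/// Number of UTF-8 code units: this is what .length and indexing operate on.
/// (Hypothetical helper, for illustration only.)
size_t codeUnits(const(char)[] s)
{
    return s.length;
}

/// Number of code points: Phobos decodes the string as a range of dchar.
/// (Hypothetical helper, for illustration only.)
size_t codePoints(const(char)[] s)
{
    return s.walkLength;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 (e with acute) takes two UTF-8 code units

    writeln(codeUnits(s));     // 6
    writeln(codePoints(s));    // 5
    writeln(cast(ubyte) s[1]); // 195: the first code unit of U+00E9, not a character
}
```

Indexing hands back half of a character here, while the range interface reports five elements. Those are the two views a non-ASCII user has to keep apart.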
Also, I expect to be able to use a char[] as an array, which Phobos
doesn't let me in some cases (e.g. sorting an ASCII character array).
The disagreement will never be fully solved, as there is just as much
disagreement about the current state of affairs ;) e.g. should foreach
default to using dchar?
I disagree about the disagreement being unsolvable. I'm not rigid; if I
saw a terrific abstraction in your string, I'd be all for it. It just
shuffles some issues about, and although I agree it does one thing or
two better than char[], at the end of the day it doesn't carry its
weight.
I see it as having two vast improvements:
1. If we replace char[] with a specific type for string, then char[] can
be considered a true array by Phobos, and Phobos can then deal with a
char[] array without the need to cast.
2. It protects the casual user from incorrectly using a string by making
the default the correct API.
Those to me are very important.
I don't think I'll ever be 'happy' with the way strings sit in Phobos
currently. I typically deal in ASCII (i.e. code units), and Phobos works
very hard to prevent that.
I wonder if we could and should extend some of the functions in
std.string to work with ubyte[]. I did add a function called
representation() that I didn't document yet. Essentially representation
gives you the ubyte[], ushort[], or uint[] underneath a string, with the
same qualifiers. Whenever you want an algorithm to work on ASCII in
earnest, you can pass representation(s) to it instead of s.
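A rough sketch of that usage, assuming representation is reachable as std.string.representation (that is where current Phobos documents it). The wrapper name sortAscii is made up for illustration.

```d
import std.algorithm : sort;
import std.stdio : writeln;
import std.string : representation;

/// Sorts an ASCII character array in place by working on its code units.
/// (Hypothetical helper, for illustration only.)
char[] sortAscii(char[] s)
{
    // sort(s) does not compile: Phobos treats char[] as a range of dchar,
    // which is not a random-access range.
    // representation(s) is a ubyte[] view over the same memory, so sorting
    // it reorders the original array.
    sort(s.representation);
    return s;
}

void main()
{
    writeln(sortAscii("hello".dup)); // ehllo
}
```

This is only safe when the data really is ASCII. Sorting the code units of a multi-byte UTF-8 string would scramble its characters.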
This, again, fails on point 2 above. A char[] is an array, and allows
access to code-units, which is not the correct interface for a string.
Supporting ubyte[] doesn't fix that problem. Correct as the default is
usually a theme in D...
If you work a lot with ASCII, an AsciiString abstraction may be a better
string type, and one more likely to succeed. Better yet, you could
simply focus on AsciiChar and then define ASCII strings as arrays of
AsciiChar.
This seems like the wrong approach. Adding a new type does not fix the
problems with the original type. We need to replace the original type or
at least how it is treated by the compiler.
-Steve