On 01/02/2012 12:16 AM, Chad J wrote:
On 01/01/2012 02:25 PM, Timon Gehr wrote:
On 01/01/2012 08:01 PM, Chad J wrote:
On 01/01/2012 10:39 AM, Timon Gehr wrote:
On 01/01/2012 04:13 PM, Chad J wrote:
On 01/01/2012 07:59 AM, Timon Gehr wrote:
On 01/01/2012 05:53 AM, Chad J wrote:

If you haven't been educated about unicode or how D handles it, you
might write this:

char[] str;
... load str ...
for ( int i = 0; i < str.length; i++ )
{
        font.render(str[i]); // Ewww.
        ...
}
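
For contrast, a correct loop iterates by code point rather than by code
unit; something like the following sketch (with writefln standing in
for the hypothetical font.render) lets foreach do the UTF-8 decoding:

import std.stdio;

void main()
{
    char[] str = "héllo".dup;      // multi-byte UTF-8 content
    foreach (dchar c; str)         // decodes code units into code points
    {
        writefln("U+%04X", cast(uint) c); // stand-in for font.render(c)
    }
}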


That actually looks like a bug that might happen in real world code.
What is the signature of font.render?

In my mind it's defined something like this:

class Font
{
    ...

    /** Render the given code point at
        the current (x,y) cursor position. */
    void render( dchar c )
    {
        ...
    }
}

(Of course I don't know minute details like where the "cursor position"
comes from, but I figure it doesn't matter.)

I probably wrote some code like that loop a very long time ago, but I
probably don't have that code around anymore, or at least not easily
findable.

I think the main issue here is that char implicitly converts to dchar:
This is an implicit reinterpret-cast that is nonsensical if the
character is outside the ASCII range.
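
For example, something along these lines compiles without complaint
today, yet it silently reinterprets a single code unit as a code point
and produces the wrong character (a small sketch):

import std.stdio;

void main()
{
    string s = "é";   // encoded as the two UTF-8 code units 0xC3 0xA9
    dchar d = s[0];   // implicit char -> dchar: accepted, but d is now
                      // U+00C3 ('Ã'), not U+00E9 ('é')
    writefln("U+%04X", cast(uint) d);
}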

I agree.

Perhaps the compiler should insert a check on the 8th bit in cases like
these?

I suppose it's possible someone could declare a bunch of individual
chars and then start manipulating code units that way, and such an
8th-bit check could thwart those manipulations, but I would also counter
that such low-level manipulations should be done on ubytes instead.

I don't know how much this would help though.  Seems like too little,
too late.

I think the conversion char -> dchar should just require an explicit
cast. The runtime check is better left to std.conv.to.
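
Something in this spirit (a sketch of the intended semantics only;
toDchar is a made-up stand-in for what the checked conversion could do,
not the current behaviour of std.conv):

import std.exception : enforce;

/// Hypothetical checked conversion: a lone char only carries a
/// character on its own if it is in the ASCII range.
dchar toDchar(char c)
{
    enforce(c <= 0x7F, "not ASCII; decode the enclosing string instead");
    return c; // widening is fine for ASCII
}

void main()
{
    dchar a = toDchar('a');       // fine
    // dchar b = toDchar("é"[0]); // would throw at runtime
}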


What of valid transfers of ASCII characters into dchar?

Normally this is a widening operation, so I can see how it is permissible.


The bigger problem is that a char is being taken from a char[] and
thereby loses its context as (potentially) being part of a larger
codepoint.

If it is part of a larger code point, then it has its highest bit set.
Any individual char that has its highest bit set does not carry a
character on its own. If it is not set, then it is a single ASCII
character.
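
That check is just the high bit; as an illustration only (the names are
made up):

bool isAsciiUnit(char c) { return (c & 0x80) == 0; } // a complete character
bool isFragment(char c)  { return (c & 0x80) != 0; } // lead/continuation byte only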

See above.


I think that assigning from a char[i] to another char[j] is probably
safe.  Similarly for slicing.  These calculations tend to occur, I
suspect, when the text is well-anchored.  I believe your balanced
parentheses example falls into this category:
(repasted for reader convenience)

import std.stdio;

void main(){
    string s = readln();
    int nest = 0;
    foreach(x; s){ // iterates by code unit
        if(x == '(') nest++;
        else if(x == ')' && --nest < 0) goto unbalanced;
    }
    if(!nest){
        writeln("balanced parentheses");
        return;
    }
unbalanced:
    writeln("unbalanced parentheses");
}

With these observations in hand, I would consider the safety of
operations to go like this:

char[i] = char[j];           // (Reasonably) Safe
char[i1..i2] = char[j1..j2]; // (Reasonably) Safe
char = char;                 // Safe
dchar = char;                // Safe.  Widening.
char = char[i];              // Not safe.  Should error.
dchar = char[i];             // Not safe.  Should error. (Corollary)
dchar = dchar[i];            // Safe.
char = char[i1..i2];         // Nonsensical; already an error.
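
To make the "Not safe" rows concrete, here is a sketch of the failure
alongside std.utf.decode, which pulls a whole code point out of the
string instead:

import std.stdio;
import std.utf : decode;

void main()
{
    string s = "π = 3.14";   // 'π' is the two code units 0xCF 0x80
    dchar broken = s[0];     // the char[i] -> dchar case: only half of 'π'

    size_t i = 0;
    dchar correct = decode(s, i); // decodes s[0 .. 2] into U+03C0

    writefln("broken: U+%04X  correct: U+%04X",
             cast(uint) broken, cast(uint) correct);
}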

That is an interesting point of view. Your proposal would therefore be
to constrain char to the ASCII range except when it is embedded in an
array? It would break the balanced parentheses example.
