On 01/01/2012 08:01 PM, Chad J wrote:
On 01/01/2012 10:39 AM, Timon Gehr wrote:
On 01/01/2012 04:13 PM, Chad J wrote:
On 01/01/2012 07:59 AM, Timon Gehr wrote:
On 01/01/2012 05:53 AM, Chad J wrote:

If you haven't been educated about Unicode or how D handles it, you
might write this:

char[] str;
... load str ...
for ( int i = 0; i < str.length; i++ )
{
    font.render(str[i]); // Ewww.
    ...
}
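
For contrast, here is a minimal sketch of a loop that decodes first. The Font class is a hypothetical stand-in that only records what it was asked to render, and renderAll is a name invented for this example. It relies on D's foreach decoding a char[] into whole code points when the loop variable is declared as dchar.

```d
// Hypothetical stand-in for the Font class discussed in this thread.
// It only records the code points it receives so the result can be checked.
class Font
{
    dchar[] rendered;

    void render(dchar c)
    {
        rendered ~= c;
    }
}

// Hypothetical helper: renders str one code point at a time.
void renderAll(Font font, const(char)[] str)
{
    // Declaring the loop variable as dchar makes foreach decode the
    // UTF-8 code units into whole code points.
    foreach (dchar c; str)
    {
        font.render(c);
    }
}

void main()
{
    auto font = new Font;
    char[] str = "h\u00E4h".dup; // U+00E4 occupies two UTF-8 code units

    renderAll(font, str);

    assert(str.length == 4);            // four code units
    assert(font.rendered.length == 3);  // three code points
    assert(font.rendered[1] == 0xE4);   // decoded as one code point
}
```

The indexed loop would have called render four times here, twice with values that are not characters in their own right.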


That actually looks like a bug that might happen in real world code.
What is the signature of font.render?

In my mind it's defined something like this:

class Font
{
    ...

    /** Render the given code point at
        the current (x,y) cursor position. */
    void render( dchar c )
    {
        ...
    }
}

(Of course I don't know minute details like where the "cursor position"
comes from, but I figure it doesn't matter.)

I probably wrote some code like that loop a very long time ago, but I
probably don't have that code around anymore, or at least not easily
findable.

I think the main issue here is that char implicitly converts to dchar.
This is an implicit reinterpret-cast that is nonsensical if the
character is outside the ASCII range.
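
A small sketch of what that implicit conversion does to a non-ASCII code unit. The function name widen is made up for the illustration; the point is only that the assignment compiles without a cast.

```d
// Hypothetical helper: relies on the implicit char -> dchar conversion.
dchar widen(char c)
{
    return c; // no cast required
}

void main()
{
    string s = "\u00E4";  // U+00E4, stored in UTF-8 as 0xC3 0xA4
    char first = s[0];    // only the lead code unit
    dchar d = widen(first);

    assert(first == 0xC3);
    assert(d == 0xC3);    // U+00C3, not the U+00E4 that was in the string
}
```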

I agree.

Perhaps the compiler should insert a check on the 8th bit in cases like
these?

I suppose it's possible someone could declare a bunch of individual
chars and then start manipulating code units that way, and such an 8th
bit check could thwart those manipulations, but I would also counter
that such low-level manipulations should be done on ubytes instead.

I don't know how much this would help though.  Seems like too little,
too late.

I think the conversion char -> dchar should just require an explicit cast. The runtime check is better left to std.conv.to.
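
To make the idea concrete, here is a sketch of what such a checked conversion could look like as a library function. toDcharChecked is a hypothetical name, not an existing Phobos function, and this says nothing about what std.conv.to currently does for char. It only assumes std.exception.enforce and assertThrown.

```d
import std.exception : assertThrown, enforce;

/// Hypothetical helper: converts a code unit to dchar only if it is ASCII,
/// and throws otherwise.
dchar toDcharChecked(char c)
{
    enforce(c < 0x80, "code unit is part of a multi-byte sequence");
    return cast(dchar) c; // explicit cast, as proposed above
}

void main()
{
    assert(toDcharChecked('a') == 'a');

    string s = "\u00E4"; // two code units, neither is ASCII
    assertThrown(toDcharChecked(s[0]));
    assertThrown(toDcharChecked(s[1]));
}
```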


The bigger problem is that a char is being taken from a char[] and
thereby loses its context as (potentially) being part of a larger
code point.

If it is part of a larger code point, then it has its highest bit set. Any individual char that has its highest bit set does not carry a character on its own. If it is not set, then it is a single ASCII character.
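
The highest-bit rule can be sketched as a one-line test. isAsciiUnit is a name invented for this example.

```d
/// Hypothetical helper: true if the UTF-8 code unit is a complete
/// ASCII character on its own (highest bit clear).
bool isAsciiUnit(char c)
{
    return (c & 0x80) == 0;
}

void main()
{
    string s = "a\u00E4"; // 'a', then U+00E4 as the code units 0xC3 0xA4

    assert(s.length == 3);
    assert(isAsciiUnit(s[0]));   // 'a' stands alone
    assert(!isAsciiUnit(s[1]));  // lead unit of a two-unit sequence
    assert(!isAsciiUnit(s[2]));  // continuation unit
}
```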
