On Friday, 3 June 2016 at 20:18:31 UTC, Steven Schveighoffer wrote:
On 6/3/16 3:52 PM, ag0aep6g wrote:
On 06/03/2016 09:09 PM, Steven Schveighoffer wrote:
Except many chars *do* properly convert. This should work:

char c = 'a';
dchar d = c;
assert(d == 'a');

Yeah, that's what I meant by "standalone code unit". Code units that on
their own represent a code point would not be touched.

But you can quite easily end up with a lone code unit that is actually part of a multibyte sequence:

void foo(string s)
{
   auto x = s[0]; // x is a char: one UTF-8 code unit, not necessarily a whole code point
   dchar d = x;   // compiles and silently "converts", even mid-sequence
}


As I mentioned in my earlier reply, some kind of "bounds checking" for
the conversion could be a possibility.
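
A minimal sketch of what such a checked conversion could look like (the function name and the choice to throw are my own assumptions, not an existing druntime API):

dchar checkedToDchar(char c)
{
    // Only standalone code units (ASCII, 0x00-0x7F) represent a code
    // point on their own; anything else is part of a multibyte sequence.
    if (c >= 0x80)
        throw new Exception("char is not a standalone code unit");
    return c;
}

unittest
{
    assert(checkedToDchar('a') == 'a');
    // checkedToDchar('\xC3') would throw: 0xC3 starts a two-byte UTF-8 sequence.
}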

Hm... an interesting possibility:

dchar _dchar_convert(char c)
{
    return cast(int)cast(byte)c; // get sign extension for non-ASCII
}

So when the char's most significant bit is set, this fills the upper bits of the dchar with 1s, right? A set most significant bit in a char means it's part of a multibyte sequence, while in a dchar it means the value is invalid, because code points only go up to U+10FFFF. Huh. Neat.

An interesting thing is that I think the CPU can do this for us.
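
To make the effect concrete, a quick check (my own test, with the cast made explicit) showing that ASCII survives unchanged while a non-ASCII code unit lands above U+10FFFF, where a validity check could catch it:

dchar _dchar_convert(char c)
{
    return cast(dchar)cast(byte)c; // sign extension for non-ASCII
}

void main()
{
    assert(_dchar_convert('a') == 'a'); // ASCII: unchanged
    dchar d = _dchar_convert('\xC3');   // lead byte of a multibyte UTF-8 sequence
    assert(d > 0x10FFFF);               // sign-extended out of the Unicode range
}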

Does it work for char -> wchar, too?

It does not. 0xffff is a valid code point, and I think so are all the other values that would result. In fact, I think there are no invalid code units for wchar.

https://codepoints.net/specials

U+FFFF would be fine, or at least better than a surrogate.
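
For comparison, the same trick applied to char -> wchar (again with illustrative names of my own): the sign-extended value lands in 0xFF80-0xFFFF, which is a valid wchar code unit, so nothing ever flags it:

wchar _wchar_convert(char c)
{
    return cast(wchar)cast(byte)c; // sign extension, truncated to 16 bits
}

void main()
{
    wchar w = _wchar_convert('\xC3'); // non-ASCII code unit
    assert(w == 0xFFC3);              // a valid wchar value, so the error goes unnoticed
}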
