While working on updating and improving Lionello Lunesu's proposed fix for DMD issue #259, I have come across an issue with the dchar type related to value range propagation.

The patch adds VRP-based compile-time evaluation of integer type comparisons, where possible. This caused the following issue:

The compiler will now optimize out attempts to handle invalid, out-of-range dchar values. For example:

import std.stdio;

dchar c = cast(dchar) uint.max;
if (c > 0x10FFFF)
    writeln("invalid");
else
    writeln("OK");

With constant folding for integer comparisons, the above will print "OK" rather than the "invalid" it should print. The predicate (c > 0x10FFFF) is simply *assumed* to be false, because the current starting range.imax for a dchar expression is dchar.max.
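
To make the mechanism concrete, here is a minimal sketch of how such a range-based fold behaves. This is not DMD's actual code; the IntRange struct and tryFoldGreater function are made up for illustration (only the imin/imax terminology comes from the patch), but the folding decision mirrors the behaviour described above:

struct IntRange
{
    ulong imin;  // smallest value the expression can take
    ulong imax;  // largest value the expression can take
}

// Try to fold (lhs > rhs) at compile time from the operands' value ranges.
// Returns 1 if always true, 0 if always false, -1 if it cannot be folded.
int tryFoldGreater(IntRange lhs, IntRange rhs)
{
    if (lhs.imin > rhs.imax) return 1;   // every lhs value exceeds every rhs value
    if (lhs.imax <= rhs.imin) return 0;  // no lhs value can exceed any rhs value
    return -1;                           // ranges overlap; leave the comparison alone
}

void main()
{
    // With dchar's starting range assumed to be [0, dchar.max] (0x10FFFF),
    // the predicate (c > 0x10FFFF) folds to "always false", even though the
    // cast in the snippet above produces a dchar holding uint.max.
    auto dcharRange   = IntRange(0, 0x10FFFF);
    auto literalRange = IntRange(0x10FFFF, 0x10FFFF);
    assert(tryFoldGreater(dcharRange, literalRange) == 0);
}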

So, this leads to the question: is making use of dchar values greater than dchar.max considered undefined behaviour, or not?

1. If it is UB, then there is quite a lot of D code (including std.uni) which must be corrected to use uint instead of dchar when dealing with values that could fall outside the officially supported range (a sketch of the resulting idiom follows this list).

2. If it is not UB, then the compiler needs to be updated to stop assuming that dchar values greater than dchar.max are impossible. This basically just means removing some of dchar's special treatment, and running it through more of the same code paths as uint.
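
For illustration, here is roughly the idiom #1 would force on user code: validation has to happen on a uint before anything is ever cast to dchar, since a dchar-typed comparison could be folded away. isValidCodePoint is just a made-up helper name, and it ignores surrogates to stay aligned with the snippet above; this is not a proposed std.uni change:

import std.stdio;

// Validate on uint, where no dchar range assumption applies,
// so this comparison cannot be folded away.
bool isValidCodePoint(uint c)
{
    return c <= 0x10FFFF;
}

void main()
{
    uint raw = uint.max;
    if (isValidCodePoint(raw))
        writeln("OK: ", cast(dchar) raw);
    else
        writeln("invalid");
}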

At the moment, I strongly prefer #2, but I suppose #1 could make sense if people think code which might have to deal with invalid code points can be isolated sufficiently from other Unicode processing.
