While working on updating and improving Lionello Lunesu's proposed fix for DMD issue #259, I have come across an issue with the dchar type related to value range propagation.

The patch adds VRP-based compile-time evaluation of integer type comparisons, where possible. This caused the following issue:

The compiler will now optimize out attempts to handle invalid, out-of-range dchar values. For example:

import std.stdio;

dchar c = cast(dchar) uint.max;
if (c > 0x10FFFF)
    writeln("invalid");
else
    writeln("OK");

With constant folding for integer comparisons, the above will print "OK" rather than the "invalid" it should print. The predicate (c > 0x10FFFF) is simply *assumed* to be false, because the current starting range.imax for a dchar expression is dchar.max.
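
To make the mechanism concrete, here is a minimal sketch of how such a range-based fold behaves. This is not DMD's actual code; the IntRange struct and tryFoldGreater function are made up for illustration (only the imin/imax terminology comes from the patch), but the folding decision mirrors the behaviour described above:

struct IntRange
{
    ulong imin;  // smallest value the expression can take
    ulong imax;  // largest value the expression can take
}

// Try to fold (lhs > rhs) at compile time from the operands' value ranges.
// Returns 1 if always true, 0 if always false, -1 if it cannot be folded.
int tryFoldGreater(IntRange lhs, IntRange rhs)
{
    if (lhs.imin > rhs.imax) return 1;   // every lhs value exceeds every rhs value
    if (lhs.imax <= rhs.imin) return 0;  // no lhs value can exceed any rhs value
    return -1;                           // ranges overlap; leave the comparison alone
}

void main()
{
    // With dchar's starting range assumed to be [0, dchar.max] (0x10FFFF),
    // the predicate (c > 0x10FFFF) folds to "always false", even though the
    // cast in the snippet above produces a dchar holding uint.max.
    auto dcharRange   = IntRange(0, 0x10FFFF);
    auto literalRange = IntRange(0x10FFFF, 0x10FFFF);
    assert(tryFoldGreater(dcharRange, literalRange) == 0);
}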

So, this leads to the question: is making use of dchar values greater than dchar.max considered undefined behaviour, or not?

1. If it is UB, then there is quite a lot of D code (including std.uni) which must be corrected to use uint instead of dchar when dealing with values that could fall outside the officially supported range (a sketch of the resulting idiom follows this list).

2. If it is not UB, then the compiler needs to be updated to stop assuming that dchar values greater than dchar.max are impossible. This basically just means removing some of dchar's special treatment, and running it through more of the same code paths as uint.
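
For illustration, here is roughly the idiom #1 would force on user code: validation has to happen on a uint before anything is ever cast to dchar, since a dchar-typed comparison could be folded away. isValidCodePoint is just a made-up helper name, and it ignores surrogates to stay aligned with the snippet above; this is not a proposed std.uni change:

import std.stdio;

// Validate on uint, where no dchar range assumption applies,
// so this comparison cannot be folded away.
bool isValidCodePoint(uint c)
{
    return c <= 0x10FFFF;
}

void main()
{
    uint raw = uint.max;
    if (isValidCodePoint(raw))
        writeln("OK: ", cast(dchar) raw);
    else
        writeln("invalid");
}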

At the moment, I strongly prefer #2, but I suppose #1 could make sense if people think code which might have to deal with invalid code points can be isolated sufficiently from other Unicode processing.
