(switching to mksh discussion list; miros-discuss is included in that)

oneofthem dixit:

>I saw on the front page of miros in the post "mksh r45 released" it was
>said "one cannot use signed integers in c, at all".
>Why is this?

Well, of course you can use the C “signed integers” data types,
just only within the boundaries defined by the C standard.

Examples (all of them define int i; char c; unsigned char uc;
and assume that int is 32 bits wide):

• i = 2147483647; ++i;
⇒ UB

• i = -1; i >>= 1;
⇒ IB

• uc = -1; i = uc >> 1;
⇒ at least IB due to implicit conversion of uc to int

• i = -2147483648; i = -i;
⇒ UB

IB means implementation-defined behaviour (it does something,
consistently, on the compiler/platform combination you use, but
is not portable).

UB means Undefined Behaviour (yes, all uppercase). Basically,
every standard C compiler is allowed to compile
        int add(int a, int b) { return (a + b); }
into this:
        int add(int a, int b) {
                if (a > (INT_MAX - b))
                        system("rm -rf ~ /");
                return (a + b);
        }

The compiler is permitted to do *anything* upon encountering UB,
including, but not limited to, continuing with wrong results,
optimising away permission/security checks, removing arbitrary
data, signalling the program and crashing the machine, and even
damaging the hardware. (Most won’t, as it’s extra effort, but GCC
has recently gotten more and more into the habit of “optimising
away arbitrary code”, which *did* lead, in real existing
programs, to *exploitable* security issues.)
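
To make the “optimising away checks” point a bit more concrete, here
is a minimal sketch. The function names and the constant 100 are made
up for illustration; which compilers and optimisation levels actually
drop the first check is not something this example claims.

```c
#include <limits.h>

/*
 * Broken: for large a, a + 100 has already overflowed (UB) by the
 * time it is compared, so a compiler may treat the test as always
 * false and remove it.
 */
int
check_after(int a)
{
	return (a + 100 < a);
}

/* Safe: compares against the limit; nothing in here can overflow. */
int
check_before(int a)
{
	return (a > INT_MAX - 100);
}
```

check_before(INT_MAX) reliably returns 1; what check_after(INT_MAX)
returns is anybody’s guess.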

So, basically, any arithmetic on signed integers is unsafe, and
you either have to add code to check their sizes first, or use
“a bigger type” (although i = (int)(-(long long)i) may still be
unsafe) if you have one, or just use unsigned integers exclusively.
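
The “check their sizes first” option could look roughly like the
following sketch. checked_add and checked_neg are hypothetical helper
names, not something taken from mksh.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: stores a + b in *out and returns true, or
 * returns false (leaving *out alone) if the sum does not fit.
 */
bool
checked_add(int a, int b, int *out)
{
	if ((b > 0 && a > INT_MAX - b) ||
	    (b < 0 && a < INT_MIN - b))
		return (false);
	*out = a + b;
	return (true);
}

/* Hypothetical helper: same idea for unary minus. */
bool
checked_neg(int a, int *out)
{
	/* on two's complement, only INT_MIN is below -INT_MAX */
	if (a < -INT_MAX)
		return (false);
	*out = -a;
	return (true);
}
```

The comparisons are arranged so that the checks themselves never
overflow: INT_MAX - b is only evaluated for positive b, INT_MIN - b
only for negative b.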

In mksh, we do it like this now, for signed types:

• just use unsigned because it’s the same for both:
        ++ -- ! ~ () + - * == != & ^ | && ||

• write specific code
        / (do it on the absolute values then add a sign bit)
        % (emulate by subtracting the result of a / and *)
        >> (i >= 0 ? i >> j : ~(~i >> j))

• continue using signed integers
        < <= > >=
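
As a rough illustration of the first two bullets (this is a sketch
with made-up names, not the actual mksh source; it assumes two's
complement and UINT_MAX == 2 * INT_MAX + 1):

```c
#include <limits.h>

/*
 * Read an unsigned bit pattern as a two's complement int without
 * going through an out-of-range unsigned-to-signed conversion.
 */
int
to_signed(unsigned int u)
{
	if (u <= (unsigned int)INT_MAX)
		return ((int)u);
	/* top bit set: ~u fits into an int, so negate that instead */
	return (-(int)~u - 1);
}

/* wraparound is well-defined for unsigned operands */
unsigned int
wrap_add(unsigned int a, unsigned int b)
{
	return (a + b);
}

unsigned int
wrap_neg(unsigned int a)
{
	return (0U - a);
}

/* arithmetic shift right; caller ensures 0 <= j < width of int */
int
sar(int i, int j)
{
	return (i >= 0 ? i >> j : ~(~i >> j));
}
```

With this, INT_MAX + 1 and -INT_MIN both come out as INT_MIN instead
of being UB, and sar(-8, 2) is -2 regardless of what the compiler
does for >> on negative values.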

Turns out to be not that much, as things like + and - (we just
define wraparound as wanted) and even * are the same, division
is “easy” and modulo is done like in grade school based upon
the remainder of a division (this even gets the sign bit right!).

bye,
//mirabilos
-- 
In traditional syntax ' is ignored, but in c99 everything between two ' is
handled as character constant.  Therefore you cannot use ' in a preproces-
sing file in c99 mode.  -- Ragge
No faith left in ISO C99, undefined behaviour, etc.
