https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677

            Bug ID: 82677
           Summary: Many projects (linux, coreutils, GMP, gcrypt, openSSL,
                    etc) are misusing asm(divq/divl) etc, potentially
                    resulting in faulty/unintended optimisations
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
          Assignee: unassigned at gcc dot gnu.org
          Reporter: infinity0 at pwned dot gg
  Target Milestone: ---

Created attachment 42439
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42439&action=edit
Basic optimisation example, C-reduced from flint test code

If I understood the issue correctly, this is strictly speaking not a GCC bug.
But I'm filing this here so there is a central place to have the discussion.
There is also the possibility of GCC developers discussing a central fix for
this issue in GCC itself, rather than patching 20+ FOSS projects.

Background
==========

When using asm() it is vital that the dependencies are expressed correctly,
otherwise GCC can optimise stuff in an unintended way. Those new to the topic
(e.g. me a few days ago) can read:

- https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html for a nice
intro
- https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html for details

The Intel instruction manual [0] for the "div" and "idiv" instructions states
that:

- "The CF, OF, SF, ZF, AF, and PF flags are undefined."
- raises a #DE (Divide error) (#a) "If the source operand (divisor) is 0" and
(#b) "If the quotient is too large for the designated register."

Therefore, if you want to write a general-purpose "div" utility in GCC asm,
that assumes nothing about its inputs or outputs, the correct syntax would be
something along the lines of:

  __asm__ __volatile__("divq %4" : "=a"(q), "=d"(r) : "0"((ulong)n0),
"1"((ulong)n1), "rm"((ulong)d) : "cc");

- __volatile__ because raising an exception is a "side-effect", i.e. affects
something outside of the declared inputs and outputs, and
- "cc" because it might clobber the FLAGS register.

Note that __volatile__ is necessary even if the divisor "cannot be 0" in most
cases. (Therefore, it is necessary for a general-purpose "div" utility.) For
example, (##) if the code is unreachable for reasons unrelated to the division
operands, then the instruction will never raise a DE, but if __volatile__ is
not declared then GCC might optimise the code in a way such that the div
operation *is* reachable sometimes when the divisor is zero.

To demonstrate that example concretely, see udiv.c as a test case, where "if
(g)" should prevent the div from running, but GCC 7.2 -O2 lifts the asm() out
of the inner loop and the if(), causing a DE (SIGFPE) at run time. (udiv.c was
minimised from the real-world example at [1] using C-Reduce [2] - which I
couldn't have done this without.)

One way to make __volatile__ *actually* unnecessary, is to do an explicit
unconditional check (i.e. not based on a removeable macro) for divisor != 0
before the operation, and branch away if = 0. In this case, since divisor is an
input to the asm() instruction, GCC will then know not to reorder it with
respect to the branch.

Also, I am not sure if this sort of unexpected SIGFPE could potentially result
in a security problem. If so then perhaps the priority should be raised.

[0] https://software.intel.com/en-us/articles/intel-sdm
[1] https://github.com/fredrik-johansson/arb/issues/194
[2] https://embed.cs.utah.edu/creduce/

Problem
=======

Lots of people copied the same code (longlong.h), with the same definition of
udiv_qrnnd, into their own projects. [3] At the time of writing this includes:
linux, grub2, coreutils, GMP, gcrypt, mpfr, and even older versions of GCC
itself. The code seems to originate from GMP, but I didn't confirm this or
investigate in great detail.

Now technically it is possible to use this correctly, by doing an unconditional
explicit check for divisor != 0 as mentioned above. However, this complexity is
*not* documented in the description, e.g. [4], which even describes the *other*
condition (#b) above, for the instruction not to raise an exception -
"HIGH_NUMERATOR must be less than DENOMINATOR for correct operation.". Even if
it was documented naively ("divisor must be zero"), one can imagine someone
interpreting this as (##) above, which would still be incorrect and still cause
faulty optimisations.

Also, there is a fallback macro __udiv_qrnnd_c [5] defined which reimplements
the instruction in C for platforms that don't have it, and in this case GCC
knows that the C "/" operation can cause div-by-zero errors, and optimises
everything as intended, as one can see by compiling the udiv-alt.c attached
with -DNOOPT_UDIV. So this suggests that the original author of udiv_qrnnd was
not aware of these issues either - since the correctness requirements between
udiv_qrnnd and __udiv_qrnnd_c are different.

[3]
https://codesearch.debian.net/search?q=asm.*%5C%28.*%22i%3Fdiv%5Bql%5D%5C+*%25%5Cd+path%3Alonglong.h&perpkg=1
[4]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/mpi/longlong.h?id=5ce3e312ec5c11abce13215be70700778bf601f0#n56
[5] https://gmplib.org/repo/gmp/file/tip/longlong.h#l2066

[6]
https://codesearch.debian.net/search?q=asm.*%5C%28.*%22i%3Fdiv%5Bql%5D%5C+*%25%5Cd&perpkg=1

Here [6] is a quick search through Debian packages, of all potential instances
of this issue. At the time of writing, a lot of these are affected by the issue
described above, their code sharing history with the GMP version, [3] being a
subset of [6].

Some projects correctly implement a generic div operation, using both
__volatile__ and "cc":

-
https://boringssl.googlesource.com/boringssl/+/d1425f69df16310bdca46a3d66144dcb4e3ad4fc%5E!/
-
https://github.com/ceph/ceph/commit/38e60deb5af10d5e2d8023a73a955080285b2760#diff-3501c244d08381cbfa48b804dfac9f09R412

Some projects were almost there:

-
https://github.com/GStreamer/gstreamer/commit/b0c1ebbd0883a9191d553a40b78facff8cb7c26e#diff-b162b68977c03c9a2029a6dd58860a50R267
// whoops, misses out "cc"
-
https://github.com/python/cpython/commit/a0346e56acc8c5371c7b6356e1862e6e01406dbb#diff-61d4a6a33e8afe74d982d4c887e55d41R226
// whoops, misses out "volatile"

Moving forward
==============

I pieced together the various parts of the puzzle over the last few days from
talking to various people. There was no one single person who could tell me
with certainty all of the details mentioned here, so I am not sure if I got
everything correct - therefore it would be good if a GCC developer could
confirm my analysis. Then I suppose we shall have to start filing bug reports
and patches to these projects to get things fixed.

Given that this is such a wide issue perhaps it might be good for GCC to emit a
warning, or else automatically treat asm(div) as volatile unless explicitly
marked as non-volatile and likewise for "cc". A topic for GCC developers - I'm
not in a position to know if this is feasible or not.

Reply via email to