https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98801

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #5 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Richard Biener from comment #4)
> Slight complication arises because people will want to have cmoves with a
> memory destination.

Do we even want to provide this?  Most ISAs can't branchlessly conditionally
store, except via an RMW (which wouldn't be thread-safe for the no-store case
if not atomic) or something really clunky.  (Like x86  rep stos  with count=0
or 1.)

32-bit ARM predicated instructions allow a branchless load or store that doesn't
disturb the memory operand (and won't even fault on a bad address) when the
condition is false.

I guess another option to emulate it could be to make a dummy local and cmov to
select a store address = dummy : real.  But that's something users can build in
the source using a non-memory conditional-select builtin that exposes the much
more widely available ALU conditional-select functionality like x86 CMOV,
AArch64 CSEL, MIPS MOVN/MOVZ, etc.
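
A minimal sketch of that dummy-local idea in plain C, assuming hypothetical
helper names (select_ptr and cond_store are not GCC builtins).  Nothing here
guarantees the compiler emits CMOV/CSEL rather than a branch; it only shows the
shape a user could build on top of a register-only conditional-select:

```c
#include <stdint.h>

/* Hypothetical helper: pick one of two pointers with AND/OR masking.
   The uintptr_t round trip is implementation-defined, but behaves as
   expected on GCC's usual flat-address targets. */
static inline int *select_ptr(int cond, int *if_true, int *if_false)
{
    uintptr_t mask = -(uintptr_t)(cond != 0);   /* all-ones or zero */
    uintptr_t t = (uintptr_t)if_true;
    uintptr_t f = (uintptr_t)if_false;
    return (int *)((t & mask) | (f & ~mask));
}

/* Hypothetical helper: store value to *real only when cond is non-zero.
   When cond is zero the store lands in a dummy local instead, so *real
   is neither read nor written. */
static inline void cond_store(int cond, int *real, int value)
{
    int dummy;
    int *dst = select_ptr(cond, real, &dummy);
    *dst = value;   /* the store itself is unconditional */
}
```

The optimizer is still free to turn this back into a compare-and-branch, which
is exactly the "at the mercy of the optimizer" problem a builtin would need to
address.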


> That won't solve the eventual request to have cmov _from_ memory ... (if we
> leave all of the memory combining to RTL people will again complain that
> it's subject to the compiler's discretion).

It might be sufficient for most use-cases like defending against timing
side-channels to not really try to allow conditional loads (from maybe-invalid
pointers).

----

I'm not sure if the motivation for this includes trying to make code without
data-dependent branching, to defend against timing side-channels.

But if we do provide something like this, people are going to want to use it
that way.  That's one case where best-effort behaviour at the mercy of the
optimizer for a ternary (or having to manually check the asm) is not great. 
Stack Overflow has gotten a few Q&As from people looking for guaranteed CMOV
for reasons like that.

So I think we should be wary of exposing functionality that most ISAs don't
have.  OTOH, failing to provide a way to take advantage of functionality that
some ISAs *do* have is not great, e.g. ISO C failing to provide popcnt and
bit-scan (clz / ctz) has been a problem for C for a long time.

But for something like __builtin_clz, emulating on machines that don't have
hardware support still works.  If we're trying to support a guarantee of no
data-dependent branching, that limits the emulation possibilities or makes them
clunkier.  Especially if we want to support ARM's ability to not fault / not
access memory if the condition is false.

The ALU-select part can be emulated with AND/OR, so that's something we can
provide on any target.
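
As a rough illustration of the AND/OR emulation, assuming a hypothetical
function name (ct_select_u32 is not an existing builtin), the select can be
written with a mask that is all-ones or zero:

```c
#include <stdint.h>

/* Hypothetical helper: returns a when cond is non-zero, else b.
   Uses only AND/OR/NOT on a mask, with no branch in the source.
   The compiler may still pick a different instruction sequence. */
static inline uint32_t ct_select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = -(uint32_t)(cond != 0);   /* 0xFFFFFFFF or 0 */
    return (a & mask) | (b & ~mask);
}
```

This is only a source-level sketch; without a documented guarantee, the
generated asm would still need checking for a data-dependent branch.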

Folding memory operands into a predicated load on ARM could actually introduce
data-dependent cache access, vs. an unconditional load and a predicated reg-reg
MOV.  So this becomes somewhat thorny, and some design work to figure out what
documented guarantees to provide will be necessary.  Performance use-cases
would certainly rather just have a conditional load in one instruction.
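
To make the distinction concrete, here is a sketch of the "unconditional load,
then register select" shape, assuming a hypothetical helper name
(load_then_select).  The pointer must be valid regardless of the condition,
which is what keeps the cache access independent of it:

```c
#include <stdint.h>

/* Hypothetical helper: always load *p, then select between the loaded
   value and fallback in registers.  p must be dereferenceable even when
   cond is zero.  Since p is not volatile, the compiler is not strictly
   required to keep the load unconditional. */
static inline uint32_t load_then_select(uint32_t cond, const uint32_t *p,
                                        uint32_t fallback)
{
    uint32_t loaded = *p;                      /* load in the source is unconditional */
    uint32_t mask = -(uint32_t)(cond != 0);    /* all-ones or zero */
    return (loaded & mask) | (fallback & ~mask);
}
```

A predicated load would instead skip the memory access when the condition is
false, which is cheaper but makes the access pattern depend on the condition.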
