[PATCH] D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.

Hal Finkel via Phabricator via cfe-commits Wed, 03 Jul 2019 18:39:10 -0700

hfinkel added a comment.

In D64128#1569590 <https://reviews.llvm.org/D64128#1569590>, @efriedma wrote:

> > If they're all syntactically together like this, maybe that's safe?
>
> Having them together syntactically doesn't really help, I think; it might be 
> guarded by some code that does the same conversion (and if you repeat the 
> conversion, it has to produce the same result).

Indeed. That's correct (and also why the hasOneUse check at the IR level would 
have been ineffective). However...

In D64128#1569578 <https://reviews.llvm.org/D64128#1569578>, @rjmccall wrote:

> I agree with Eli that this isn't obviously a legal transformation.  
> `llvm.ptrmask` appears to make semantic guarantees about e.g. the pointer 
> after the mask referring to the same underlying object, which means we can 
> only safely emit it when something about the source program makes that 
> guarantee.  It's not at all clear that C does so for an expression like `(T*) 
> ((intptr_t) x & N)`.

I think that this is the key point. First, at the IR level we have a problem 
because we have no way to robustly track pointer provenance information. If we 
have `if (a == b) { f(a); }` the optimizer can transform this code into `if (a 
== b) { f(b); }` and we've lost track of whether the parameter to f is based on 
a or b. At the source level we don't have this problem (because we have the 
unaltered expressions provided by the user, and can therefore use whatever 
provenance information that source implies).

Thus, as John says, the question is whether, at the source level, `(T*) 
((intptr_t) x & N)` always has, and only has, the same underlying objects as x,
whenever executing the expression is well defined. In C++, I think that this is
clearly true for implementations with "strict pointer safety" (6.6.5.4.3), as 
the rules for safely-derived pointer values state that, while you can get 
safely-derived pointer values using integer casts and bitwise operators, the 
result must be one that could have been safely derived from the original object 
using well-defined pointer arithmetic, and that's only true for pointers into 
some array pointed into by x (or one past the end). For implementations with 
"relaxed pointer safety", it's all implementation defined, so I don't see why we
couldn't choose our implementation-defined semantics to define this problem
away (although we certainly need to be careful that we don't unintentionally 
make any significant body of preexisting code incompatible with Clang by doing 
so).
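The safely-derived argument can be sketched for the common alignment idiom. Assuming the usual flat mapping between pointers and `uintptr_t` (which the language standards leave implementation-defined), masking off the low bits yields the same pointer as subtracting the misalignment with ordinary pointer arithmetic. The arithmetic form stays within the array that `p` points into as long as the rounded-down address does not precede the start of that array. The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask form: round p down to a multiple of `align` (a power of two) using
   the `(T*)((intptr_t)x & N)` pattern, here with N = ~(align - 1). */
static char *align_down_mask(char *p, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (char *)((uintptr_t)p & ~(align - 1));
}

/* Arithmetic form: the same pointer obtained by subtracting the
   misalignment from p, i.e. ordinary pointer arithmetic within p's array. */
static char *align_down_arith(char *p, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return p - ((uintptr_t)p & (align - 1));
}

/* Rounds buf + offset down to 64 bytes inside a 64-byte-aligned buffer and
   returns the resulting offset from buf, using either form. */
static ptrdiff_t rounded_offset(size_t offset, bool use_mask) {
    _Alignas(64) char buf[256];
    assert(offset < sizeof buf);
    char *p = buf + offset;
    char *r = use_mask ? align_down_mask(p, 64) : align_down_arith(p, 64);
    return r - buf;
}
```

With this buffer, `rounded_offset(100, true)` and `rounded_offset(100, false)` should both give 64: the masked pointer is one that could equally have been reached by well-defined pointer arithmetic on the original.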

For C, we also need to be concerned with the definition of "based on" 
(6.7.3.1). In some philosophical sense, this seems trickier (i.e., what if 
modifying the value of x at some sequence point prior to the expression makes 
the expression dead? Are we required, as part of the standardized thought
experiment, to also modify the other variables to keep the expression alive
when performing the "based on" analysis, and do those modifications count for 
the purposes of determining the "based on" property?). Regardless, given that 
the intent is to enable optimizations, it seems reasonable to say that `(T*) 
((intptr_t) x & N)` is only based on x. For C, 6.3.2.3 makes the conversion 
validity itself implementation defined.
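A typical source-level use of the pattern is low-bit pointer tagging, where clearing the tag bit is expected to give back a pointer to the very object that was tagged, i.e., a pointer based only on `x`. The scheme below is a hypothetical sketch: it assumes `int` is at least 2-byte aligned (checked with `_Static_assert`), keeps the tagged value in a `void *`, and uses `uintptr_t` rather than `intptr_t` so the bitwise operations act on an unsigned type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of an int's address must be free to carry the flag. */
_Static_assert(_Alignof(int) >= 2, "bit 0 of an int address must be free");

/* Hypothetical tagged pointer: an int * with a one-bit flag in bit 0. */
typedef void *tagged_int_ptr;

static tagged_int_ptr tag_int_ptr(int *p, bool flag) {
    assert(((uintptr_t)p & 1u) == 0);
    return (void *)((uintptr_t)p | (flag ? 1u : 0u));
}

/* Clearing the tag is the `(T*)((intptr_t)x & N)` pattern with N = ~1; the
   result is meant to refer to the same int object that was tagged. */
static int *untag_int_ptr(tagged_int_ptr x) {
    return (int *)((uintptr_t)x & ~(uintptr_t)1);
}

static bool tag_flag(tagged_int_ptr x) { return ((uintptr_t)x & 1u) != 0; }
```

Treating `untag_int_ptr(x)` as based only on `x` is exactly what this kind of code relies on in practice.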

@rsmith, thoughts on this?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64128/new/

https://reviews.llvm.org/D64128

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
