[PATCH] D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.

Hal Finkel via Phabricator via cfe-commits Fri, 05 Jul 2019 19:23:36 -0700

hfinkel added a comment.

In D64128#1570609 <https://reviews.llvm.org/D64128#1570609>, @rjmccall wrote:


> In D64128#1569836 <https://reviews.llvm.org/D64128#1569836>, @hfinkel wrote:
>
> > In D64128#1569817 <https://reviews.llvm.org/D64128#1569817>, @rjmccall 
> > wrote:
> >
> > > The pointer/integer conversion is "implementation-defined", but it's not 
> > > totally unconstrained.  C notes that "The mapping functions for 
> > > converting a pointer to an integer or an integer to a pointer are 
> > > intended to be consistent with the addressing structure of the execution 
> > > environment.", and we do have to honor that.  The standard allows that 
> > > "the result ... might not point to an entity of the referenced type", but 
> > > when in fact it's guaranteed to do so (i.e. it's not just a coincidental 
> > > result of an implementation decision like the exact address of a global 
> > > variable — no "guessing"), I do think we have an obligation to make it 
> > > work.  And on a practical level, there has to be *some* way of playing 
> > > clever address tricks in the language in order to implement things like 
> > > allocators and so forth.  So this makes me very antsy.
> >
> >
> > I don't disagree. But I believe the question is if we have:
> >
> >   int *x = malloc(4);
> >   int *y = malloc(4);
> >   if (x & ~15 == y) {
> >     *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume 
> > that it might set the value of *y?
> >   }
> >   
> >
> > I certainly agree that we must allow the implementation of allocators, etc. 
> > But allocators, I think, have the opposite problem. They actually have some 
> > large underlying objects (from mmap or whatever), and we want the rest of 
> > the system to treat some subobjects of these larger objects as though they 
> > were independent objects of some given types. From the point of view of the 
> > allocator, we have x, and we have `void *memory_pool`, and we need to allow 
> > `x & N` to point into `memory_pool`, but because, from the allocator's 
> > perspective, we never knew that x didn't point into memory_pool (as, in 
> > fact, it likely does), that should be fine (*).
> >
> > There might be more of an issue, for example, if for a given object, I 
> > happen to know that there's some interesting structure at the beginning of 
> > its page (or some other boundary).
>
>
> This is what I was thinking about for allocators; this is a common 
> implementation technique for `free` / `realloc` / `malloc_size`.
>
> > If I also have a pointer to this structure via some other means, then maybe 
> > this will cause a problem. This kind of thing certainly falls outside of 
> > the C/C++ abstract machine, and I'd lean toward a flag for supporting it 
> > (not on by default).
>
> If you mean a theoretical minimal C abstract machine that does not correspond 
> to an actual target and is therefore not bound by any of the statements in 
> the C standard that say things like "this is expected to have its obvious 
> translation on the target", then yes, I completely agree.  If you're talking 
> about the actual C programming language that does correspond to actual 
> targets, then it's not clear at all that it's outside the C abstract machine, 
> because AFAICT integer-pointer conversions are (1) well-specified on specific 
> targets by this de facto requirement of corresponding directly to pointer 
> representations and (2) well-behaved as long as the integer does correspond 
> to the address of an actual object of that type.
>
> Also, please understand that compiler writers have been telling our users for 
> decades that (1) pointer arithmetic is subject to some restrictions on 
> penalty of UB and (2) they can avoid those restrictions by using 
> pointer-integer conversions and doing integer arithmetic instead.


Indeed, we have.

>   So any proposal to weaken the latter as a workaround makes me very worried, 
> especially if it's also enforcing alignment restrictions that we've generally 
> chosen not to enforce when separated from actual memory accesses.

I *think* that what @fhahn points out about the masking restrictions addresses 
the concerns that we were discussing: My understanding of the 
potentially-problematic cases here are when the masking could get you from one 
variable from one underlying object (that you might access) to another variable 
in a different underlying object (that you might also access), and given the 
masking restrictions, this is impossible (*): because the high bits can't be 
used for addressing, and for the lower bits, these fall within the alignment of 
the type, and so in order for that to move you between underlying objects, the 
objects would need to be smaller than the type alignment.

(*) I suppose one can arrange for this to break something given some 
system-specific setup:

  // note: the linker script ensures that these are packed and adjacent.
  short a;
  struct {
    short b, c;
  } s;
  
  ...
  int *ib = (int *) s.b;
  int *ia = (int *) (((size_t) ib) & ~1);
   a = 6;
  (*(short *) ia) = 5;
  return a; // oops, aliasing will now enable us to return 6 here.


The probability that someone is doing that seems low, because they need to be 
doing the masking and casting with a type with larger alignment requirements 
than the type they'll actually access.

On the other hand, using the otherwise-unused bits in the pointer to store 
information, and then mask to get the pointer back, is pretty common. In any 
case, I recommend that we add a "-f" flag for this.

> 
> 
>> Also, and I could be wrong, but my impression is that all of this is extra - 
>> this motivating use case requires generating the intrinsic from the code in 
>> lib/CodeGen/TargetInfo.cpp - generating it from C/C++ expressions is just a 
>> potential additional benefit.
> 
> I agree that we could use this intrinsic there safely, with the "object" 
> being the variadic arguments area of the original va_list.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64128/new/

https://reviews.llvm.org/D64128



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible.

Reply via email to