[llvm-branch-commits] Add pointer field protection feature. (PR #133538)

Oliver Hunt via llvm-branch-commits Thu, 15 May 2025 02:06:36 -0700

ojhunt wrote:

> Hi Oliver, thanks for your comments! I'll address them below.
> 
> > Thoughts:
> > This should be opt-in on a field or struct granularity, not just a global 
> > behavior.
> 
> This would certainly be easier if it were an opt-in behavior, as it would 
> allow avoiding a substantial amount of complexity that you point out below. 
> The problem is that would not make the solution very effective as a UAF 
> mitigation, as it would only help with code that was opted in, which is 
> likely a tiny fraction of an entire codebase. That fraction is unlikely to be 
> the fraction with UAF bugs, not only because of the size of the fraction but 
> also because if developers aren't thinking about memory safety issues, they 
> certainly won't be manually opting into this. With PFP, the problem that I 
> set out to solve was to find a way to automatically protect as many pointers 
> stored in memory as possible in standards-conforming code.
>


There are a significant number of standard layout types that contain 
meaningfully powerful data if they are compromised, which leads to saying the 
standard layout types should also be covered.

That said, there's a significant cost to applying pointer protection 
universally though. The cost of existing mitigations rely on them not being 
global properties as they can saturate the required compute resources.

I would be interested in a performance comparison of enumerating struct fields 
as you increase the number of fields being accessed just to see if there's a 
point that perf starts to collapse - I suspect there would be, but if the 
number of concurrent (from the cpu pov, not threading) fields accesses to get 
to that point is high enough it may not matter. Similar tests for increasing 
levels of dependent loads are also probably worth while. The intent is to make 
this behavior at least partially universal (all pointers in objects that are 
not standard layout), which means that extrapolating from individual loads, or 
loads from the same addresses, are not necessarily representative.

In the RFC you talk about the time and/or cycles required to perform 
operations, but that's the best case, where the required units are not 
saturated. This gate on performance isn't unique to pointer protections: every 
operation consumes some amount of cpu resources, once those resources are 
saturated the pipeline stalls, so it's necessary to compare performance when 
many of these operations are happening in succession.

Now there's some caching (saying the tag remains constant, etc) you might be 
able to do, but that's dependent on being willing to say "this is a software 
mitigation, so we're ok if we don't catch everything" - these are things that a 
cpu can mitigate in hardware via direct knowledge of what is happening on a 
given core, and peeking on the memory bus. (There are also performance costs 
when you're accessing the same memory on different cores, but that goes so bad 
so quickly even in normal cases I'm not concerned about that yet).

> > In the RFC I think you mentioned not applying PFP to C types, but I'm 
> > unsure how you're deciding what is a C type?
> 
> This is based on whether the type has standard layout. C does not permit 
> declaring a type that is non-standard layout.

The issue here is that there are many cases where standard layout structs 
contain important data, which is why I'm not 100% sold on excluding C types 
from this.

> > There are a lot of special cases being added in places that should not need 
> > to be aware of PFP - the core type queries should be returning correct 
> > values for types containing PFP fields.
> 
> Agreed on type queries. The current adjustment to how trivially-relocatable 
> is defined is a workaround that will be removed once P2786 is fully 
> implemented in libc++, and at that point we shouldn't need 
> `__has_non_relocatable_fields` either. The intent is that turning on PFP 
> should be invisible to the developer (in other words, the modification to the 
> storage format of pointer fields in certain structs is permitted by the as-if 
> rule). As a result, most of the implementation is in CodeGen and below.

My point was that this should not have a separate attempt to make trivial 
relocation work, it should make sure the type queries report the correct 
information, and the implementation of trivially_relocate agrees with that 
definition. Also trivial relocation isn't meaningfully supported or used 
anywhere yet - it's a new feature - so I don't think you should be prioritizing 
supporting that yet: just make sure that the (internal) trait queries return 
correct/consistent results for what can be handled, and worry about optimizing 
that later.

> 
> > A lot of the "this read/write/copy/init special work" exactly aligns with 
> > the constraints of pointer auth, and I think the overall implementation 
> > could be made better by introducing the concept of a 
> > `HardenedPointerQualifier` or similar, that could be either 
> > PointerAuthQualifier or PFPQualifier. In principle this could even just be 
> > treated as a different PointerAuth key set.
> > That approach would also help with some of the problems you have with 
> > offsetof and pointers to PFP fields - the thing that causes the problems 
> > with the pointer to a PFP field is that the PFP schema for a given field is 
> > in reality part of the type, so given
> > ```c++
> > struct A {
> >    void *field1;
> >    void *field2;
> > };
> > ```
> > 
> > 
> > It looks you are trying to maintain the idea that `A::field1` and 
> > `A::field2` have the same type, which makes this code valid according to 
> > the type system:
> > ```c++
> > void f(void** obj);
> > void g(A *a) {
> >   f(&a->field1);
> >   f(&a->field2);
> > }
> > ```
> > 
> > but the real types are not the same. The semantic equivalent under pointer 
> > auth is something akin to
> > ```c++
> > struct A {
> >    void * __ptrauth(1, 1, hash("A::field1")) field1;
> >    void * __ptrauth(1, 1, hash("A::field2")) field2;
> > };
> > ```
> > 
> > Which makes it very clear that the types are different, and I think trying 
> > to pretend they are not is part of what is causing problems.
> 
> In the early stages of the project, I wanted to make it so that the PFP 
> fields would implicitly receive qualifiers, similar to what you proposed, and 
> I was trying to come up with an approach that would make it work, but I 
> couldn't see a way because of the inter-convertibility of pointers from 
> different structs (and pointers outside of structs entirely), and disallowing 
> conversions between differently qualified pointer types would render the 
> compiler unable to compile standard C++. The only way that I could see to 
> make it work was to utilize the as-if rule and invisibly modify the in-memory 
> representation.
> 

Can you provide a code example of the conversion case you're describing? I'm 
not sure I understand what case you're concerned about. If the qualifiers you 
placed implicitly on a field disagree, then the alternative of computing a 
schema later would also disagree. The semantics of implicitly adding a 
qualifier or attribute to carry the schema vs computing the schema upon use 
should not impact what schema you end up with, because both cases are 
describing the same field.

> For pointers in standard-layout types (i.e. pointers that explicitly opt in) 
> I agree with you that the right approach would be to use a qualifier, but I 
> think that should be done as a separate change, possibly as a followup. In 
> this change, I would like to implement the solution for implicitly protected 
> fields.

The problem you get there is similar to what pointer auth gets with function 
pointers: the authentication is present in some cases but not in others, so we 
end up with a function on ASTContext that looks first for an explicit policy, 
and then enumerates the various ways an implicit schema might apply. It's not 
great, and can result in errors where the correct schema is not used.

> 
> > The more I think about it, the more I feel that this could be implemented 
> > almost entirely on top of the existing pointer auth work.
> > Currently for pointer auth there's a set of ARM keys specified, and I think 
> > you just need to create a different set, PFP_Keys or similar, and set up 
> > the appropriate schemas (basically null schemas for all the existing 
> > cases), and add a new schema: DefaultFieldSchema or similar, and apply that 
> > schema to every pointer field in an object unless there's an explicit 
> > qualifier applied already.
> > Then it would in principle just be a matter of making adding the 
> > appropriate logic to the ptrauth intrinsics in llvm.
> > Now there are some moderately large caveats to that idea
> > 
> > * the existing ptrauth semantics simply say that any struct containing 
> > address discriminated values is no-longer a pod type and so gets copied 
> > through the stack, which is something you're trying to avoid, so you'd 
> > still need to keep that logic (though as I said, that should be done by 
> > generating and then calling the functions to do rather than regenerating 
> > the entire copy repeatedly).
> > * the way that pointer auth is currently implemented assumes that there is 
> > only a single pointer auth abi active so you wouldn't be able to trivially 
> > have both pointer auth and PFP live at the same time. That's what makes me 
> > think having a SecuredPointer abstraction might be a batter approach, but 
> > it might also not be too difficult to update the PointerAuthQualifier to 
> > include the ABI being used -- I'm really not sure.
> 
> Yes, something like that is how I would imagine implementing this for opt-in 
> fields.

What I mean is that in an environment where you enable the behavior globally, 
you would simply apply the qualifier (or attribute) implicitly to all relevant 
pointer fields, computing the exact same schema you currently compute, only you 
do it at the point of declaration. In both cases all fields for which PFP 
applies will have the correct schema attached, and all the changes is the 
ration of "has PFP" to "does not have PFP"

Rather than going back and forth in review, I'm also happy to chat in discord :D



https://github.com/llvm/llvm-project/pull/133538
_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] Add pointer field protection feature. (PR #133538)

Reply via email to