Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing

Richard Biener Thu, 23 Nov 2023 04:09:35 -0800

On Thu, Nov 23, 2023 at 11:56 AM Alexandre Oliva <ol...@adacore.com> wrote:
>
> Hello, Richi,
>
> Thanks for the extensive review!
>
> On Nov 22, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > On Mon, Nov 20, 2023 at 1:40 PM Alexandre Oliva <ol...@adacore.com> wrote:
> >>
> >> On Oct 26, 2023, Alexandre Oliva <ol...@adacore.com> wrote:
> >>
> >> >> This is a refreshed and improved version of the version posted back in
> >> >> June.  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621936.html
> >>
> >> > Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.html
> >> > I'm combining the gcc/ipa-strub.cc bits from
> >> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html
> >>
> >> Ping?
> >> Retested on x86_64-linux-gnu, with and without -fstrub=all.
>
> > @@ -898,7 +899,24 @@ decl_attributes (tree *node, tree attributes, int 
> > flags,
> >        TYPE_NAME (tt) = *node;
> >      }
>
> > -  *anode = cur_and_last_decl[0];
> > +  if (*anode != cur_and_last_decl[0])
> > +    {
> > +      /* Even if !spec->function_type_required, allow the attribute
> > + handler to request the attribute to be applied to the function
> > + type, rather than to the function pointer type, by setting
> > + cur_and_last_decl[0] to the function type.  */
> > +      if (!fn_ptr_tmp
> > +  && POINTER_TYPE_P (*anode)
> > +  && TREE_TYPE (*anode) == cur_and_last_decl[0]
> > +  && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode)))
> > + {
> > +  fn_ptr_tmp = TREE_TYPE (*anode);
> > +  fn_ptr_quals = TYPE_QUALS (*anode);
> > +  anode = &fn_ptr_tmp;
> > + }
> > +      *anode = cur_and_last_decl[0];
> > +    }
> > +
>
> > what is this a workaround for?
>
> For the fact that the strub attribute attaches to types, whether data or
> function types, so we can't have fn_type_req, but when it's a function
> or pointer-to-function type, we want to affect the function type, rather
> than the pointer type, when the attribute has an argument.  The argument
> names the strub mode for a function; that only applies to function
> types, never to data types.
>
> The hunk above introduces the means for the attribute handler to choose
> what to attach the attribute t.
>
> > Isn't there a suitable parsing position for placing the attribute?
>
> It's been a while, but IIRC the need for this first came up in Ada,
> where attributes can't just go anywhere, and it was further complicated
> by the fact that Ada doesn't have first-class function or procedure
> types, only access-to-them, but we needed some means for the attributes
> to apply to the function type.
>
> > +#ifndef STACK_GROWS_DOWNWARD
> > +# define STACK_TOPS GT
> > +#else
> > +# define STACK_TOPS LT
> > +#endif
>
> > according to docs this is defined to 0 or 1 so the above looks wrong
> > (it's always defined).
>
> Ugh.  Thanks, will fix.  (I'm pretty sure I had notes somewhere stating
> that stack-grows-upwards hadn't been tested, and that was for the sheer
> lack of platforms making that choice, but I hoped it wasn't that broken
> :-(
>
> > +  if (optimize < 2 || optimize_size || flag_no_inline)
> > +    return NULL_RTX;
>
> > I'm wondering about these checks in the expansions of the builtins,
> > I think this is about inline expanding or emitting a libcall, right?
>
> Yeah.
>
> > I wonder if you should use optimize_function_for_speed (cfun) instead?
> > Usually -fno-inline shouldn't affect such calls, but -fno-builtin-FOO would.
> > I have no strong opinion here though.
>
> I've occasionally wondered whether builtins were the best framework for
> these semi-internal calls.
>
> > The new builtins seem undocumented - usually those are documented
> > within extend.texi
>
> Erhm...  Weird.  I had documentation for them.
>
> (checks)
>
> No, it's there, in extend.texi, right after __builtin_stack_address.
> It's admittedly a big patch :-/
>
> > I guess placing __builtin___strub_enter calls in the code manually
> > will break in interesting ways - if that's not supposed to happen the
> > trick is to embed a space in the name of the built-in.
>
> Yeah, I was a little torn between the choices here.  On the one hand, I
> needed visible symbols for the out-of-line implementations, so I figured
> that trying to hide the builtins wouldn't bring any advantage.
>
> However, I've also designed the builtins with interfaces that would
> avoid disruption even with explicit calls.  __strub_enter and
> __strub_update only initialize or adjust a pointer handed to them.
> __strub_leave will erase things from the top of the stack to the
> pointer, so if the watermark is "active stack", nothing happens, and
> things only get cleared if it points to "unused stack space".  There's
> potential for disruption if one passes a statically-allocated pointer to
> it, but nothing much different from memsetting that memory range, core
> wars-style.
>
> > -symtab_node::reset (void)
> > +symtab_node::reset (bool preserve_comdat_group)
>
> > not sure what for, I'll leave Honza to comment.
>
> This restores the possibility of getting the pre-PR107897 behavior, that
> the strub wrapper/wrapped splitting relied on.  Conceptually, the
> original function becomes the wrapped one, and the wrapper that calls it
> is kind of an implementation detail to preserve the exposed API/ABI
> while introducing strubbing around the body, so preserving the comdat
> group makes sense.  ISTR getting strub regressions when the patch for
> PR107897 was put in, but my notes don't detail what broke, alas.
>
> > +/* Create a distinct copy of the type of NODE's function, and change
> > +   the fntype of all calls to it with the same main type to the new
> > +   type.  */
> > +
> > +static void
> > +distinctify_node_type (cgraph_node *node)
> > +{
> > +  tree old_type = TREE_TYPE (node->decl);
> > +  tree new_type = build_distinct_type_copy (old_type);
> > +  tree new_ptr_type = NULL_TREE;
> > +
> > +  /* Remap any calls to node->decl that use old_type, or a variant
> > +     thereof, to new_type as well.  We don't look for aliases, their
> > +     declarations will have their types changed independently, and
> > +     we'll adjust their fntypes then.  */
> > +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
> > +    {
> > +      if (!e->call_stmt)
> > + continue;
> > +      tree fnaddr = gimple_call_fn (e->call_stmt);
> > +      gcc_checking_assert (TREE_CODE (fnaddr) == ADDR_EXPR
> > +   && TREE_OPERAND (fnaddr, 0) == node->decl);
> > +      if (strub_call_fntype_override_p (e->call_stmt))
> > + continue;
> > +      if (!new_ptr_type)
> > + new_ptr_type = build_pointer_type (new_type);
> > +      TREE_TYPE (fnaddr) = new_ptr_type;
> > +      gimple_call_set_fntype (e->call_stmt, new_type);
> > +    }
> > +
> > +  TREE_TYPE (node->decl) = new_type;
>
> > it does feel like there's IPA mechanisms to deal with what you are trying 
> > to do
> > here (or in the caller(s)).
>
> If there is, I couldn't find it.  There are some vaguely similar
> operations, but nothing that will modify the ABI of exported functions
> like what strub-at-calls purports to do.  A synthetic argument gets
> added to the function's interface, but since that type could conceivably
> have been associated/unified with non-strub types, or with similar types
> with different strub modes that don't undergo such transformations, we
> have to build a new type for the function, and adjust the type in the
> call graph.  IPA can do this as part of replacing calls with specialized
> clones of a function, but here the attribute calls for an adjustment to
> the function type, vaguely resembling the introduction of the implicit
> 'this' in non-static member functions in C++, but without much help from
> the front-end, and without exposing the formal to the front-end.  It
> doesn't seem to me that IPA is willing to help with that, and even if it
> were, it would require strub to become an actual IPA pass, which would
> be a huge undertaking without clear benefit.


Conceptually it shouldn't be much different from what IPA-SRA does
which is cloning a function but with different arguments, the function
signature transform described in terms of ipa-param-manipulation bits.
I've talked with Martin and at least there's currently no
by-value-to-by-reference
"transform", but IPA-SRA can pass two registers instead of one aggregate
for example.  There's IPA_PARAM_OP_NEW already to add a new param.
In principle the whole function rewriting (apart of recovering
from inlining) should be doable within this framework.

> > +unsigned int
> > +pass_ipa_strub_mode::execute (function *)
> > +{
> > +  last_cgraph_order = 0;
> > +  ipa_strub_set_mode_for_new_functions ();
> > +
> > +  /* Verify before any inlining or other transformations.  */
> > +  verify_strub ();
>
> > if  (flag_checking) verify_strub ();
>
> > please.
>
> No, no, verify_strub is not an internal consistency check, it's part of
> the implementation that verifies that strictness requirements have been
> met WRT calls requested by the user.  In strict mode, a strub function
> can only call other strub functions or functions explicitly marked as
> callable from strub contexts.  This security requirement is to be
> enforced, and diagnostics are to be issued in case of violation,
> regardless of the self-checking mode of the compiler.
>
> > I guess we talked about this last year - what's the reason to have both
> > an IPA pass and a simple IPA pass?
>
> We did.  I didn't recall all the reasons on the spot when you asked, but
> I followed up by email:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603255.html
>
> The earlier pass, that must be placed before any inlining, is the simple
> IPA one, and it only assigns modes.
>
> It must be that early because strub-enabled functions must never be
> inlined into strub-disabled ones.  So we have to determine modes before
> any sort of inlining.
>
> But we can and want to inline strub-enabled functions into other
> strub-enabled functions.  But we don't want to inline a wrapped function
> back into its wrapper, so we want to (early) inline functions before
> they're split, and we also want to inline wrappers after they're split
> if possible.
>
> So the assignment pass has to be before any inlining, and the
> wrapping/splitting (for internal strub) must be after some inlining but
> before the last attempt at inlining.
>
> > IIRC the simple IPA pass is a simple one because it wants to see
> > inlined bodies and "fixes" those up?  Some toplevel comments
> > explaining both passes in the ipa-strub.cc pass would be nice to have.
>
> ACK, will do.
>
> > I guess I also asked before - did you try it with -flto?
>
> Yeah, I've tested it extensively with -flto during development.  I've
> run the GCC testsuite with a patch equivalent to -fstrub=all enabled by
> default during development, and I still do so occasionally.  I didn't
> before the ping, but prompted by you, I did again.  Torture tests and
> specific lto tests don't seem to face any trouble.  And there's no
> reason why they should.  The spots in which the strub passes were
> introduced involved exploring where they fit best, and adding calls such
> as analyzing nodes.  There are even commented-out vestigial calls of
> inline_analyze_function, from a time when the pass was inserted at a
> different spot and that call was required.
>
> > +    /* Decide which of the wrapped function's parms we want to turn into
> > +       references to the argument passed to the wrapper.  In general,
> > we want to
> > +       copy small arguments, and avoid copying large ones.
> > Variable-sized array
> > +       lengths given by other arguments, as in 20020210-1.c, would lead to
> > +       problems if passed by value, after resetting the original function 
> > and
> > +       dropping the length computation; passing them by reference works.
> > +       DECL_BY_REFERENCE is *not* a substitute for this: it involves 
> > copying
> > +       anyway, but performed at the caller.  */
> > +    indirect_parms_t indirect_nparms (3, false);
> > ...
>
> > IMHO it's a bad idea, at least initially, to mess with the ABI in a 
> > heuristical
> > way this way.
>
> Do you realize that this is only about the wrapped ABI, used exclusively
> by the wrapper, to avoid pointlessly copying large arguments twice?  The
> wrapper already needs a custom ABI, because of the added watermark
> pointer, and va_list and whatnot when needed, so a little further
> customization to avoid a quite significant overhead seemed desirable.

I understood the purpose of this, but I also saw it's only ever needed for
non-strict operation and I think that if you care about security you'd never
want to operate in a way that you don't absolutely know the function
you call isn't going to scrub your secrets ...

But yes, I also wondered why you even run into re-gimplification issues.
If you make sure to never pass things as reference that are registers
in the callee context then replacing the PARM_DECL there with
a MEM_REF (new-PARM_DECL) should work without re-gimplification.
OK, existing MEM_REFs need folding to be canonical again, but that
should be it.

Note I didn't look into the re-gimplification code much, I just saw
it and wondered, thinking this would be the reason for it?

> > I see you need all sorts of "fixup", including re-gimplification
> > of stmts, etc., double-ugh.
>
> *nod*, I've considered going the nested-function way, enabling the
> wrapped function to access wrapper's variables directly, but (i) support
> for nested functions isn't available everywhere, and (ii) strub
> functions could already be nested functions, and the wrapped body would
> need access to both enclosing contexts, so adjustments would have to be
> made anyway.
>
> > Ideally IPA-SRA can already do parts of those
> > transforms which would mean ipa-param-manipulation has generic code
> > to encode them?
>
> You'd think so.  I tried that, run into various issues, still have
> pending patches I submitted separately to address them, but...  at some
> point I had to stop waiting for feedback and had make it work without
> that.  I figured what I was doing wasn't a good fit and moved on.
>
> Now, it would be great if IPA could enable me to split the entire body
> of a function *while* adding custom arguments, such as the watermark,
> va_list, etc.  It didn't look like it could or even wanted to, but maybe
> the presence of a useful feature like stack scrubbing could motivate the
> introduction of such infrastructure.
>
> > I realize you can't tail-call, but I wonder if some clever
> > machine-specific tricks would work, I guess it would need the wrapper
> > to be target generated of course.
>
> I'm not sure what you have in mind here, but one of the goals was to
> introduce stack scrubbing without machine-dependent code.
>
> We are considering an improvement for strub(internal) to do away with
> wrappers given machine-specific prologue/epilogue support, but that's
> not even in the roadmap, and it's not clear how that would interact with
> exceptions, which was a primary driver for the currently-proposed
> approaches.

Understood.  But I didn't remember seeing that you do sth like wrapping
each "strubbed call" inside a try { call(); } finally { do-strub } to
ensure this.
Guess it was well hidden in the large patch ;)

> > As it's only for optimization, how much of the code would go away when
> > you avoid changing the parameters of the wrapped function?
>
> We can't avoid changing them entirely, and IIRC some of the
> infrastructure for regimplification was also used for va_list and
> apply_args handling, so it wouldn't save all that much.

Ah, var-args ... indeed for simple forwarding it shouldn't be too bad.
I wonder if this were a way to remove the restriction on function
splitting of var-args functions - there's a recent bugreport about that
(before any va_arg () has been called, of course).

>  It is an
> important optimization, to avoid doubling the stack requirements for
> parameters with -fstrub=internal (because wrapper passes arguments on to
> wrapped body), and the optimization is already implemented, so it would
> be a pity to take it out, and you wouldn't even get rid of any
> significant amount of code.
>
> > Could one easily disable the optimization via a --param maybe?
>
> The param is not implemented, but guarding the loop you commented on by
> it would accomplish it.  You probably won't want to bootstrap GCC with
> -fstrub=all along with it, though ;-)
>
> > There's still much commented code and FIXMEs in, a lot of cgraph
> > related code is used which I can't review very well.
>
> Now that you mentioned it, I realize I hadn't yet dropped some of the
> preprocessed-out bits when I posted v4.  I did so afterwards.  I was
> trying to avoid posting such a monster patch again, but I guess I will
> have to post at least another version :-(
>
> But yeah, there are some wishlist items that remain, and some bits that
> could probably go away.
>
> > I fear that -fstrub is going to be actually used by people - are you up
> > to the task fixing things?
>
> I, for one, would welcome that, and I am (professionally) on the hook to
> fix it, so I'd better be up to it ;-) Building it on top of
> infrastructure I'm closely familiar with, rather than e.g. on IPA, that
> I'm not so much, is a plus in this regard.  That said, I've fixed bugs
> in GCC most everywhere, from front-ends to back-ends, so I'm pretty sure
> I've got it covered in terms of knowledge.  As for time, if it gets too
> much interest (is there such a thing? :-) I might have trouble keeping
> up (there are only so many hours in the day), and it would likely be
> wise then for others to become familiar with it.  I'd be happy to share
> knowledge, and I know I have support from my "employer" (in quotes
> because I'm not formally employed, I'm a consultant) to maintain the
> code I contribute to the community on its behalf, even having other work
> and non-work commitments.  So, bring them on, I say ;-)
>
> > +void ATTRIBUTE_STRUB_CALLABLE
> > +__strub_enter (void **watermark)
> > +{
> > +  *watermark = __builtin_frame_address (0);
>
> > in your experience, do these (or the inline version) work reliably
> > when optimizations like omitting the frame pointer or shrink-wrapping
> > happen from a security point of view?
>
> Yeah.  We're talking about the frame address for the current function
> here, and the compiler knows what it is, even if it optimizes the heck
> out of the function.  In the builtin expansion, it's the top of the
> stack instead, and the compiler also has to have a good grasp of where
> that is.  Red zones are a small complication, but not really much
> trouble here: it's a watermark we're talking about, how high the stack
> may have gotten, so we know we have to scrub no further than that.  If
> it goes too far (as it often does with red zones), we just scrub a
> little too much.  What shouldn't happen is not going far enough, but
> since we rely on the stack frame address for __strub_update to update
> the watermark, and the caller can't really misrepresent its own stack
> use without risking the callee to corrupt it, this works really well.
> If strub_update gets builtin-expanded inline, it uses the adjusted (for
> red zone) stack pointer instead, after any internal stack allocation
> (prologue and alloca), but it won't necessarily capture non-accumulated
> outgoing args.  The arguments passed to a callee are not regarded as
> part of the caller stack frame, so that should be fine.  In order to
> cover them, one must have a strub-enabled caller and a strub(at-calls)
> callee.
>
> > You are adding symbols to libgcc but I don't see you amending
> > libgcc/libgcc-std.ver.in?
>
> That's a good catch, thanks.
>
> hardcfr symbols are missing too.
>
> Will fix.
>
> > Neither the documentation in extend.texi nor the one in invoke.texi
> > mentions the target audience of -fstrub
>
> ACK, good point, despite all the detail, that is too implicit.  Will
> fix.
>
> > The patches mention the red zone, documenting the guarantees and
> > caveats would be nice to have.  I think the documentation belongs to
> > the attribute documentation.
>
> It is kind of there, but since the details vary depending on the strub
> mode, they're given where the modes are detailed, rather than as if
> applying to all strub modes.  (some of the modes are disabled and
> callable, that don't do any strubbing whatsoever)
>
> > I hope Honza can have a quick look over the IPA passes.
>
> Likewise.  I shall wait for further feedback before posting a huge v5,
> but I'll be happy to refresh the users/aoliva/heads/strub branch in the
> git repo as I adjust the patch if that helps.  Currently the patch is in
> users/aoliva/heads/testme.
>
> > a better way than taking what is percieved as a huge ugly part
> > of the patch.
>
> ISTM you're misperceiving the amount of code involved in that
> optimization.  There is a significant amount of wrapper/wrapped
> parameter manipulation that is *not* involved with the optimization, and
> that was already there before the optimization was implemented.  The
> optimization was a tiny addition over it.

I see.

Thanks,
Richard.

> Thanks again,
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH v4] Introduce strub: machine-independent stack scrubbing

Reply via email to