On Thu, Nov 23, 2023 at 11:56 AM Alexandre Oliva <ol...@adacore.com> wrote: > > Hello, Richi, > > Thanks for the extensive review! > > On Nov 22, 2023, Richard Biener <richard.guent...@gmail.com> wrote: > > > On Mon, Nov 20, 2023 at 1:40 PM Alexandre Oliva <ol...@adacore.com> wrote: > >> > >> On Oct 26, 2023, Alexandre Oliva <ol...@adacore.com> wrote: > >> > >> >> This is a refreshed and improved version of the version posted back in > >> >> June. https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621936.html > >> > >> > Ping? https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633675.html > >> > I'm combining the gcc/ipa-strub.cc bits from > >> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633526.html > >> > >> Ping? > >> Retested on x86_64-linux-gnu, with and without -fstrub=all. > > > @@ -898,7 +899,24 @@ decl_attributes (tree *node, tree attributes, int > > flags, > > TYPE_NAME (tt) = *node; > > } > > > - *anode = cur_and_last_decl[0]; > > + if (*anode != cur_and_last_decl[0]) > > + { > > + /* Even if !spec->function_type_required, allow the attribute > > + handler to request the attribute to be applied to the function > > + type, rather than to the function pointer type, by setting > > + cur_and_last_decl[0] to the function type. */ > > + if (!fn_ptr_tmp > > + && POINTER_TYPE_P (*anode) > > + && TREE_TYPE (*anode) == cur_and_last_decl[0] > > + && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode))) > > + { > > + fn_ptr_tmp = TREE_TYPE (*anode); > > + fn_ptr_quals = TYPE_QUALS (*anode); > > + anode = &fn_ptr_tmp; > > + } > > + *anode = cur_and_last_decl[0]; > > + } > > + > > > what is this a workaround for? > > For the fact that the strub attribute attaches to types, whether data or > function types, so we can't have fn_type_req, but when it's a function > or pointer-to-function type, we want to affect the function type, rather > than the pointer type, when the attribute has an argument. The argument > names the strub mode for a function; that only applies to function > types, never to data types. > > The hunk above introduces the means for the attribute handler to choose > what to attach the attribute t. > > > Isn't there a suitable parsing position for placing the attribute? > > It's been a while, but IIRC the need for this first came up in Ada, > where attributes can't just go anywhere, and it was further complicated > by the fact that Ada doesn't have first-class function or procedure > types, only access-to-them, but we needed some means for the attributes > to apply to the function type. > > > +#ifndef STACK_GROWS_DOWNWARD > > +# define STACK_TOPS GT > > +#else > > +# define STACK_TOPS LT > > +#endif > > > according to docs this is defined to 0 or 1 so the above looks wrong > > (it's always defined). > > Ugh. Thanks, will fix. (I'm pretty sure I had notes somewhere stating > that stack-grows-upwards hadn't been tested, and that was for the sheer > lack of platforms making that choice, but I hoped it wasn't that broken > :-( > > > + if (optimize < 2 || optimize_size || flag_no_inline) > > + return NULL_RTX; > > > I'm wondering about these checks in the expansions of the builtins, > > I think this is about inline expanding or emitting a libcall, right? > > Yeah. > > > I wonder if you should use optimize_function_for_speed (cfun) instead? > > Usually -fno-inline shouldn't affect such calls, but -fno-builtin-FOO would. > > I have no strong opinion here though. > > I've occasionally wondered whether builtins were the best framework for > these semi-internal calls. > > > The new builtins seem undocumented - usually those are documented > > within extend.texi > > Erhm... Weird. I had documentation for them. > > (checks) > > No, it's there, in extend.texi, right after __builtin_stack_address. > It's admittedly a big patch :-/ > > > I guess placing __builtin___strub_enter calls in the code manually > > will break in interesting ways - if that's not supposed to happen the > > trick is to embed a space in the name of the built-in. > > Yeah, I was a little torn between the choices here. On the one hand, I > needed visible symbols for the out-of-line implementations, so I figured > that trying to hide the builtins wouldn't bring any advantage. > > However, I've also designed the builtins with interfaces that would > avoid disruption even with explicit calls. __strub_enter and > __strub_update only initialize or adjust a pointer handed to them. > __strub_leave will erase things from the top of the stack to the > pointer, so if the watermark is "active stack", nothing happens, and > things only get cleared if it points to "unused stack space". There's > potential for disruption if one passes a statically-allocated pointer to > it, but nothing much different from memsetting that memory range, core > wars-style. > > > -symtab_node::reset (void) > > +symtab_node::reset (bool preserve_comdat_group) > > > not sure what for, I'll leave Honza to comment. > > This restores the possibility of getting the pre-PR107897 behavior, that > the strub wrapper/wrapped splitting relied on. Conceptually, the > original function becomes the wrapped one, and the wrapper that calls it > is kind of an implementation detail to preserve the exposed API/ABI > while introducing strubbing around the body, so preserving the comdat > group makes sense. ISTR getting strub regressions when the patch for > PR107897 was put in, but my notes don't detail what broke, alas. > > > +/* Create a distinct copy of the type of NODE's function, and change > > + the fntype of all calls to it with the same main type to the new > > + type. */ > > + > > +static void > > +distinctify_node_type (cgraph_node *node) > > +{ > > + tree old_type = TREE_TYPE (node->decl); > > + tree new_type = build_distinct_type_copy (old_type); > > + tree new_ptr_type = NULL_TREE; > > + > > + /* Remap any calls to node->decl that use old_type, or a variant > > + thereof, to new_type as well. We don't look for aliases, their > > + declarations will have their types changed independently, and > > + we'll adjust their fntypes then. */ > > + for (cgraph_edge *e = node->callers; e; e = e->next_caller) > > + { > > + if (!e->call_stmt) > > + continue; > > + tree fnaddr = gimple_call_fn (e->call_stmt); > > + gcc_checking_assert (TREE_CODE (fnaddr) == ADDR_EXPR > > + && TREE_OPERAND (fnaddr, 0) == node->decl); > > + if (strub_call_fntype_override_p (e->call_stmt)) > > + continue; > > + if (!new_ptr_type) > > + new_ptr_type = build_pointer_type (new_type); > > + TREE_TYPE (fnaddr) = new_ptr_type; > > + gimple_call_set_fntype (e->call_stmt, new_type); > > + } > > + > > + TREE_TYPE (node->decl) = new_type; > > > it does feel like there's IPA mechanisms to deal with what you are trying > > to do > > here (or in the caller(s)). > > If there is, I couldn't find it. There are some vaguely similar > operations, but nothing that will modify the ABI of exported functions > like what strub-at-calls purports to do. A synthetic argument gets > added to the function's interface, but since that type could conceivably > have been associated/unified with non-strub types, or with similar types > with different strub modes that don't undergo such transformations, we > have to build a new type for the function, and adjust the type in the > call graph. IPA can do this as part of replacing calls with specialized > clones of a function, but here the attribute calls for an adjustment to > the function type, vaguely resembling the introduction of the implicit > 'this' in non-static member functions in C++, but without much help from > the front-end, and without exposing the formal to the front-end. It > doesn't seem to me that IPA is willing to help with that, and even if it > were, it would require strub to become an actual IPA pass, which would > be a huge undertaking without clear benefit.
Conceptually it shouldn't be much different from what IPA-SRA does which is cloning a function but with different arguments, the function signature transform described in terms of ipa-param-manipulation bits. I've talked with Martin and at least there's currently no by-value-to-by-reference "transform", but IPA-SRA can pass two registers instead of one aggregate for example. There's IPA_PARAM_OP_NEW already to add a new param. In principle the whole function rewriting (apart of recovering from inlining) should be doable within this framework. > > +unsigned int > > +pass_ipa_strub_mode::execute (function *) > > +{ > > + last_cgraph_order = 0; > > + ipa_strub_set_mode_for_new_functions (); > > + > > + /* Verify before any inlining or other transformations. */ > > + verify_strub (); > > > if (flag_checking) verify_strub (); > > > please. > > No, no, verify_strub is not an internal consistency check, it's part of > the implementation that verifies that strictness requirements have been > met WRT calls requested by the user. In strict mode, a strub function > can only call other strub functions or functions explicitly marked as > callable from strub contexts. This security requirement is to be > enforced, and diagnostics are to be issued in case of violation, > regardless of the self-checking mode of the compiler. > > > I guess we talked about this last year - what's the reason to have both > > an IPA pass and a simple IPA pass? > > We did. I didn't recall all the reasons on the spot when you asked, but > I followed up by email: > https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603255.html > > The earlier pass, that must be placed before any inlining, is the simple > IPA one, and it only assigns modes. > > It must be that early because strub-enabled functions must never be > inlined into strub-disabled ones. So we have to determine modes before > any sort of inlining. > > But we can and want to inline strub-enabled functions into other > strub-enabled functions. But we don't want to inline a wrapped function > back into its wrapper, so we want to (early) inline functions before > they're split, and we also want to inline wrappers after they're split > if possible. > > So the assignment pass has to be before any inlining, and the > wrapping/splitting (for internal strub) must be after some inlining but > before the last attempt at inlining. > > > IIRC the simple IPA pass is a simple one because it wants to see > > inlined bodies and "fixes" those up? Some toplevel comments > > explaining both passes in the ipa-strub.cc pass would be nice to have. > > ACK, will do. > > > I guess I also asked before - did you try it with -flto? > > Yeah, I've tested it extensively with -flto during development. I've > run the GCC testsuite with a patch equivalent to -fstrub=all enabled by > default during development, and I still do so occasionally. I didn't > before the ping, but prompted by you, I did again. Torture tests and > specific lto tests don't seem to face any trouble. And there's no > reason why they should. The spots in which the strub passes were > introduced involved exploring where they fit best, and adding calls such > as analyzing nodes. There are even commented-out vestigial calls of > inline_analyze_function, from a time when the pass was inserted at a > different spot and that call was required. > > > + /* Decide which of the wrapped function's parms we want to turn into > > + references to the argument passed to the wrapper. In general, > > we want to > > + copy small arguments, and avoid copying large ones. > > Variable-sized array > > + lengths given by other arguments, as in 20020210-1.c, would lead to > > + problems if passed by value, after resetting the original function > > and > > + dropping the length computation; passing them by reference works. > > + DECL_BY_REFERENCE is *not* a substitute for this: it involves > > copying > > + anyway, but performed at the caller. */ > > + indirect_parms_t indirect_nparms (3, false); > > ... > > > IMHO it's a bad idea, at least initially, to mess with the ABI in a > > heuristical > > way this way. > > Do you realize that this is only about the wrapped ABI, used exclusively > by the wrapper, to avoid pointlessly copying large arguments twice? The > wrapper already needs a custom ABI, because of the added watermark > pointer, and va_list and whatnot when needed, so a little further > customization to avoid a quite significant overhead seemed desirable. I understood the purpose of this, but I also saw it's only ever needed for non-strict operation and I think that if you care about security you'd never want to operate in a way that you don't absolutely know the function you call isn't going to scrub your secrets ... But yes, I also wondered why you even run into re-gimplification issues. If you make sure to never pass things as reference that are registers in the callee context then replacing the PARM_DECL there with a MEM_REF (new-PARM_DECL) should work without re-gimplification. OK, existing MEM_REFs need folding to be canonical again, but that should be it. Note I didn't look into the re-gimplification code much, I just saw it and wondered, thinking this would be the reason for it? > > I see you need all sorts of "fixup", including re-gimplification > > of stmts, etc., double-ugh. > > *nod*, I've considered going the nested-function way, enabling the > wrapped function to access wrapper's variables directly, but (i) support > for nested functions isn't available everywhere, and (ii) strub > functions could already be nested functions, and the wrapped body would > need access to both enclosing contexts, so adjustments would have to be > made anyway. > > > Ideally IPA-SRA can already do parts of those > > transforms which would mean ipa-param-manipulation has generic code > > to encode them? > > You'd think so. I tried that, run into various issues, still have > pending patches I submitted separately to address them, but... at some > point I had to stop waiting for feedback and had make it work without > that. I figured what I was doing wasn't a good fit and moved on. > > Now, it would be great if IPA could enable me to split the entire body > of a function *while* adding custom arguments, such as the watermark, > va_list, etc. It didn't look like it could or even wanted to, but maybe > the presence of a useful feature like stack scrubbing could motivate the > introduction of such infrastructure. > > > I realize you can't tail-call, but I wonder if some clever > > machine-specific tricks would work, I guess it would need the wrapper > > to be target generated of course. > > I'm not sure what you have in mind here, but one of the goals was to > introduce stack scrubbing without machine-dependent code. > > We are considering an improvement for strub(internal) to do away with > wrappers given machine-specific prologue/epilogue support, but that's > not even in the roadmap, and it's not clear how that would interact with > exceptions, which was a primary driver for the currently-proposed > approaches. Understood. But I didn't remember seeing that you do sth like wrapping each "strubbed call" inside a try { call(); } finally { do-strub } to ensure this. Guess it was well hidden in the large patch ;) > > As it's only for optimization, how much of the code would go away when > > you avoid changing the parameters of the wrapped function? > > We can't avoid changing them entirely, and IIRC some of the > infrastructure for regimplification was also used for va_list and > apply_args handling, so it wouldn't save all that much. Ah, var-args ... indeed for simple forwarding it shouldn't be too bad. I wonder if this were a way to remove the restriction on function splitting of var-args functions - there's a recent bugreport about that (before any va_arg () has been called, of course). > It is an > important optimization, to avoid doubling the stack requirements for > parameters with -fstrub=internal (because wrapper passes arguments on to > wrapped body), and the optimization is already implemented, so it would > be a pity to take it out, and you wouldn't even get rid of any > significant amount of code. > > > Could one easily disable the optimization via a --param maybe? > > The param is not implemented, but guarding the loop you commented on by > it would accomplish it. You probably won't want to bootstrap GCC with > -fstrub=all along with it, though ;-) > > > There's still much commented code and FIXMEs in, a lot of cgraph > > related code is used which I can't review very well. > > Now that you mentioned it, I realize I hadn't yet dropped some of the > preprocessed-out bits when I posted v4. I did so afterwards. I was > trying to avoid posting such a monster patch again, but I guess I will > have to post at least another version :-( > > But yeah, there are some wishlist items that remain, and some bits that > could probably go away. > > > I fear that -fstrub is going to be actually used by people - are you up > > to the task fixing things? > > I, for one, would welcome that, and I am (professionally) on the hook to > fix it, so I'd better be up to it ;-) Building it on top of > infrastructure I'm closely familiar with, rather than e.g. on IPA, that > I'm not so much, is a plus in this regard. That said, I've fixed bugs > in GCC most everywhere, from front-ends to back-ends, so I'm pretty sure > I've got it covered in terms of knowledge. As for time, if it gets too > much interest (is there such a thing? :-) I might have trouble keeping > up (there are only so many hours in the day), and it would likely be > wise then for others to become familiar with it. I'd be happy to share > knowledge, and I know I have support from my "employer" (in quotes > because I'm not formally employed, I'm a consultant) to maintain the > code I contribute to the community on its behalf, even having other work > and non-work commitments. So, bring them on, I say ;-) > > > +void ATTRIBUTE_STRUB_CALLABLE > > +__strub_enter (void **watermark) > > +{ > > + *watermark = __builtin_frame_address (0); > > > in your experience, do these (or the inline version) work reliably > > when optimizations like omitting the frame pointer or shrink-wrapping > > happen from a security point of view? > > Yeah. We're talking about the frame address for the current function > here, and the compiler knows what it is, even if it optimizes the heck > out of the function. In the builtin expansion, it's the top of the > stack instead, and the compiler also has to have a good grasp of where > that is. Red zones are a small complication, but not really much > trouble here: it's a watermark we're talking about, how high the stack > may have gotten, so we know we have to scrub no further than that. If > it goes too far (as it often does with red zones), we just scrub a > little too much. What shouldn't happen is not going far enough, but > since we rely on the stack frame address for __strub_update to update > the watermark, and the caller can't really misrepresent its own stack > use without risking the callee to corrupt it, this works really well. > If strub_update gets builtin-expanded inline, it uses the adjusted (for > red zone) stack pointer instead, after any internal stack allocation > (prologue and alloca), but it won't necessarily capture non-accumulated > outgoing args. The arguments passed to a callee are not regarded as > part of the caller stack frame, so that should be fine. In order to > cover them, one must have a strub-enabled caller and a strub(at-calls) > callee. > > > You are adding symbols to libgcc but I don't see you amending > > libgcc/libgcc-std.ver.in? > > That's a good catch, thanks. > > hardcfr symbols are missing too. > > Will fix. > > > Neither the documentation in extend.texi nor the one in invoke.texi > > mentions the target audience of -fstrub > > ACK, good point, despite all the detail, that is too implicit. Will > fix. > > > The patches mention the red zone, documenting the guarantees and > > caveats would be nice to have. I think the documentation belongs to > > the attribute documentation. > > It is kind of there, but since the details vary depending on the strub > mode, they're given where the modes are detailed, rather than as if > applying to all strub modes. (some of the modes are disabled and > callable, that don't do any strubbing whatsoever) > > > I hope Honza can have a quick look over the IPA passes. > > Likewise. I shall wait for further feedback before posting a huge v5, > but I'll be happy to refresh the users/aoliva/heads/strub branch in the > git repo as I adjust the patch if that helps. Currently the patch is in > users/aoliva/heads/testme. > > > a better way than taking what is percieved as a huge ugly part > > of the patch. > > ISTM you're misperceiving the amount of code involved in that > optimization. There is a significant amount of wrapper/wrapped > parameter manipulation that is *not* involved with the optimization, and > that was already there before the optimization was implemented. The > optimization was a tiny addition over it. I see. Thanks, Richard. > Thanks again, > > -- > Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ > Free Software Activist GNU Toolchain Engineer > More tolerance and less prejudice are key for inclusion and diversity > Excluding neuro-others for not behaving ""normal"" is *not* inclusive