On Mon, 7 Jun 2021, Qing Zhao wrote: > (Kees, can you answer one of Richard’s question below? On the reason to > initialize padding of structures) > > Richard, > > > On Jun 7, 2021, at 2:48 AM, Richard Biener > <rguent...@suse.de<mailto:rguent...@suse.de>> wrote: > > Meh - can you try using a mailer that does proper quoting? It's difficult > to spot your added comments. Will try anyway (and sorry for the delay) > > Only the email replied to gcc-patch alias had this issue, all the other > emails I sent are fine. Not sure why?
All your mails have this problem for me, it makes it quite difficult to follow the conversation. > Both clang and my patch add initialization to the above auto variable “line”. > > So, I have the following questions need help: > > 1. Do we need to exclude C++ class with ctor from auto initialization? > > 2. I see Clang use call to internal memset to initialize such class, but for > my patch, I only initialize the data fields inside this class. > Which one is better? > > I can't answer either question, but generally using block-initialization > (for example via memset, but we'd generally prefer X = {}) is better for > later optimization. > > Okay. So, Is this he same reason as lowering the call to .DEFFERED_INIT > through expand_builtin_memset other than expand_assign? Yes, more efficient code generated and more efficient code generation. > seeing this, can you explain why using .DEFERRED_INIT does not > work for VLAs? > > The major reason for going different routes for VLAs vs. no-VLAs is: > > In the original gimplification phase, VLAs and no-VLAs go different routes. > I just followed the different routes for them: > > In “gimplify_decl_expr”, VLA goes to “gimplify_vla_decl”, and is expanded to > call to alloca. Naturally, I add calls to “memset/memcpy” in > “gimplify_vla_decl” to > Initialize it. > > On the other hand, no-VLAs are handled differently in “gimplify_decl_expr”, so > I added calls to “.DEFFERED_INIT” to initialize them. > > What’s the major issue if I add calls to “memset/memcpy” in > “gimplify_vla_decl” to > Initialize VLAs? > > Just inconsistency and unexpected different behavior with respect to > uninitialized warnings? > > Okay. > Will try to initialize VLA through the call to .DEFFERED_INIT too. And see > whether there is any issue with it. Thanks. > > @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq > *pre_p, gimple_seq *post_p, > /* If a single access to the target must be ensured and all > elements > are zero, then it's optimal to clear whatever their number. > */ > cleared = true; > + else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED > + && !TREE_STATIC (object) > + && type_has_padding (type)) > + /* If the user requests to initialize automatic variables with > + paddings inside the type, we should initialize the paddings > too. > + C guarantees that brace-init with fewer initializers than > members > + aggregate will initialize the rest of the aggregate as-if it > were > + static initialization. In turn static initialization > guarantees > + that pad is initialized to zero bits. > + So, it's better to clear the whole record under such > situation. */ > + cleared = true; > > so here we have padding as well - I think this warrants to be controlled > by an extra option? And we can maybe split this out to a separate > patch? (the whole padding stuff) > > Clang does the padding initialization with this option, shall we be > consistent with Clang? > > Just for the sake of consistency? No. Is there a technical reason > for this complication? Say we have > > struct { short s; int i; } a; > > what's the technical reason to initialize the padding? I might > be tempted to use -ftrivial-auto-init but I'd definitely don't > want to spend cycles/instructions initializing the padding in the > above struct. > > Kees, could you please answer this question? What’s the major reason to > initialize padding > of structures from the security point of view? > > > At this point I also wonder whether doing the actual initialization > by block-initializing the current function frame at allocation > time. > > Which phase is for “allocation time”, please point me to the specific phase > and source file. I actually don't know exactly but it would be the function prologue stack adjustment (that would also cover spill slots if they are accumulated), maybe config/i386/i386.c:pro_epilogue_adjust_stack. Maybe it can be hooked into the -fstack-clash-protection code as well by changing the probe sequence to a zeroing/patterning sequence. As said, it was just a thought - zeroing/patterning of auto vars with not needing to respect object boundaries can be more efficient for example on x86 it could be just a single rep movq; Richard.