On 08/28/2018 02:43 PM, Martin Sebor wrote:
> On 08/27/2018 10:27 PM, Jeff Law wrote:
>> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote:
>>>>>
>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>> the next statement after a truncating strncpy to see if it
>>>>>> adds a terminating nul. This only works when the next
>>>>>> statement can be reached using the Gimple statement iterator
>>>>>> which isn't until after gimplification. As a result, strncpy
>>>>>> calls that truncate their constant argument that are being
>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>> followed by the nul assignment:
>>>>>>
>>>>>>   const char s[] = "12345";
>>>>>>   char d[3];
>>>>>>
>>>>>>   void f (void)
>>>>>>   {
>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>     d[sizeof d - 1] = 0;
>>>>>>   }
>>>>>>
>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>> memcpy until the pointer to the basic block the strncpy call
>>>>>> is in can be used to try to reach the next statement (this
>>>>>> happens as early as ccp1). I'm aware of the preference to
>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>> rarely used function that is often misused), getting
>>>>>> the warning right while folding a bit later but still fairly
>>>>>> early on seems like a reasonable compromise. I fear that
>>>>>> otherwise, the false positives will drive users to adopt
>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>> bugs cannot be as readily detected.
>>>>>>
>>>>>> Tested on x86_64-linux.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> PS There still are outstanding cases where the warning can
>>>>>> be avoided. I xfailed them in the test for now but will
>>>>>> still try to get them to work for GCC 9.
>>>>>>
>>>>>> gcc-87028.diff
>>>>>>
>>>>>>
>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>> strncpy with global variable source string
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>     PR tree-optimization/87028
>>>>>>     * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>>     statement doesn't belong to a basic block.
>>>>>>     * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>>     the left hand side of assignment.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>     PR tree-optimization/87028
>>>>>>     * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>     * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>
>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>> index 07341eb..284c2fb 100644
>>>>>> --- a/gcc/gimple-fold.c
>>>>>> +++ b/gcc/gimple-fold.c
>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>      return false;
>>>>>>
>>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>>> +     block is reachable. */
>>>>>> +  if (!gimple_bb (stmt))
>>>>>> +    return false;
>>>>> I think you want cfun->cfg as the test here. They should be
>>>>> equivalent in practice.
>>>>
>>>> Please do not add 'cfun' references. Note that the next stmt is also
>>>> accessible when there is no CFG. I guess the issue is that we fold
>>>> this during gimplification where the next stmt is not yet "there"
>>>> (but still in GENERIC)?
>>>>
>>>> We generally do not want to have unfolded stmts in the IL when we can
>>>> avoid that which is why we fold most stmts during gimplification. We
>>>> also do that because we now do less folding on GENERIC.
>>>>
>>>> There may be the possibility to refactor gimplification time folding
>>>> to what we do during inlining - queue stmts we want to fold and
>>>> perform all folding delayed.
>>>> This of course means bigger compile-time due to cache effects.
>>>>
>>>>>
>>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>>> index d0792aa..f1988f6 100644
>>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>>            && known_eq (dstoff, lhsoff)
>>>>>>            && operand_equal_p (dstbase, lhsbase, 0))
>>>>>>          return false;
>>>>>> +
>>>>>> +      if (code == MEM_REF
>>>>>> +          && TREE_CODE (lhsbase) == SSA_NAME
>>>>>> +          && known_eq (dstoff, lhsoff))
>>>>>> +        {
>>>>>> +          /* Extract the referenced variable from something like
>>>>>> +               MEM[(char *)d_3(D) + 3B] = 0; */
>>>>>> +          gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>>> +          if (gimple_nop_p (def))
>>>>>> +            {
>>>>>> +              lhsbase = SSA_NAME_VAR (lhsbase);
>>>>>> +              if (lhsbase
>>>>>> +                  && dstbase
>>>>>> +                  && operand_equal_p (dstbase, lhsbase, 0))
>>>>>> +                return false;
>>>>>> +            }
>>>>>> +        }
>>>>> If you find yourself looking at SSA_NAME_VAR, you're usually
>>>>> barking up the wrong tree. It'd be easier to suggest something
>>>>> here if I could see the gimple (with virtual operands). But at
>>>>> some level what you really want to do is make sure the base of the
>>>>> MEM_REF is the same as what got passed as the destination of the
>>>>> strncpy. You'd want to be testing SSA_NAMEs in that case.
>>>>
>>>> Yes. Why not simply compare the SSA names? Why would it be
>>>> not OK to do that when !lhsbase?
>>>
>>> The added code handles this case:
>>>
>>>   void f (char *d)
>>>   {
>>>     __builtin_strncpy (d, "12345", 4);
>>>     d[3] = 0;
>>>   }
>>>
>>> where during forwprop we see:
>>>
>>>   __builtin_strncpy (d_3(D), "12345", 4);
>>>   MEM[(char *)d_3(D) + 3B] = 0;
>>>
>>> The next statement after the strncpy is the assignment whose
>>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.
>>> There is no other information in the GIMPLE_NOP that I can see to
>>> tell that the operand is d_3(D) or that it's the same as
>>> the strncpy argument (i.e., the PARM_DECL d). Having to
>>> open-code this all the time seems so cumbersome -- is
>>> there some API that would do this for me? (I thought
>>> get_addr_base_and_unit_offset was that API but clearly in
>>> this case it doesn't do what I expect -- it just returns
>>> the argument.)
>>
>> I think you need to look harder at that MEM_REF. It references d_3.
>> That's what you need to be checking. The base (d_3) is the first
>> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>>
>> (gdb) p debug_gimple_stmt ($2)
>> # .MEM_5 = VDEF <.MEM_4>
>> MEM[(char *)d_3(D) + 3B] = 0;
>>
>>
>> (gdb) p gimple_assign_lhs ($2)
>> $5 = (tree_node *) 0x7ffff01a6208
>>
>> (gdb) p debug_tree ($5)
>>  <mem_ref 0x7ffff01a6208
>>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>>         size <integer_cst 0x7ffff0059d80 constant 8>
>>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>>         pointer_to_this <pointer_type 0x7ffff007de70>>
>>
>>     arg:0 <ssa_name 0x7ffff0063dc8
>>         type <pointer_type 0x7ffff007de70 type <integer_type 0x7ffff00723f0 char>
>>             public unsigned DI
>>             size <integer_cst 0x7ffff0059c90 constant 64>
>>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>>             align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this <reference_type 0x7ffff017d738>>
>>         visited var <parm_decl 0x7ffff01a5000 d>
>>         def_stmt GIMPLE_NOP
>>         version:3>
>>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70> constant 3>
>>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>>
>>
>> Note arg:0 is the SSA_NAME d_3.
>> And not surprising that's lhsbase:
>
> The d in the MEM_REF you see in the dump above is the SSA_NAME's
> SSA_NAME_VAR:
>
>   visited var <parm_decl 0x7ffff01a5000 d>
>
> Here's the print_node() code that prints it:
>
>   print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);
>
> There is nothing else in the MEM_REF operand that tells me that.
> Why is it wrong to look at the SSA_NAME_VAR?
>
>> (gdb) p debug_tree (lhsbase)
>>  <ssa_name 0x7ffff0063dc8
>>     type <pointer_type 0x7ffff007de70
>>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>>             size <integer_cst 0x7ffff0059d80 constant 8>
>>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>>             align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>>             pointer_to_this <pointer_type 0x7ffff007de70>>
>>         public unsigned DI
>>         size <integer_cst 0x7ffff0059c90 constant 64>
>>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>>         align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this <reference_type 0x7ffff017d738>>
>>     visited var <parm_decl 0x7ffff01a5000 d>
>>     def_stmt GIMPLE_NOP
>>     version:3>
>> Sadly, dstbase is the PARM_DECL for d. That's where things are going
>> "wrong".
>
> As Richard observed, that's because get_attr_nonstring_decl()
> returns the DECL that the expression refers to. It does that
> because that's where it looks for attribute nonstring, and so
> that the warning can mention the DECL with the attribute.
>
> I suppose since I'm not supposed to be using SSA_NAME_VAR
> (I still don't understand why it's taboo) I'll have to avoid
> using the get_attr_nonstring_decl() return value and instead
> look into comparing the SSA_NAMEs.

Because SSA_NAME_VAR has no dataflow information associated with it,
it's not generally useful.
SSA_NAMEs are what carry dataflow information and what you need to check if you want to know if two objects are the same.

SSA_NAME_VAR's primary use is for diagnostic messages and debugging.
We do hang attributes off the _DECL node it refers to, so you can take
an SSA_NAME and query its SSA_NAME_VAR if you need to check whether the
SSA_NAME has a particular attribute property. But if you're trying to
see if two objects in the IL are the same, you need to be looking at
the SSA_NAME.

jeff
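
To make the distinction concrete, here is a toy model. It is not GCC
code: Decl, SsaName, same_value, same_var and has_attribute are
hypothetical stand-ins for a _DECL node, an SSA_NAME, an SSA_NAME
identity check, an SSA_NAME_VAR comparison and an attribute lookup
through SSA_NAME_VAR. The sketch only assumes that one variable can
have several SSA versions, each standing for a different value.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a _DECL node: a name plus its attributes.
struct Decl {
  std::string name;
  std::set<std::string> attributes;
};

// Hypothetical stand-in for an SSA_NAME: one version of a variable.
struct SsaName {
  const Decl *var;   // plays the role of SSA_NAME_VAR
  unsigned version;  // distinguishes d_3 from d_7
};

// Same value: the two operands are the very same SSA name.
bool same_value(const SsaName *a, const SsaName *b) { return a == b; }

// Same underlying variable: says nothing about the value it holds.
bool same_var(const SsaName *a, const SsaName *b) { return a->var == b->var; }

// Attribute queries are the legitimate reason to go through the decl.
bool has_attribute(const SsaName *n, const std::string &attr) {
  return n->var != nullptr && n->var->attributes.count(attr) != 0;
}

// Walks through the cases discussed in the thread.
void demo() {
  Decl d{"d", {"nonstring"}};
  SsaName d_3{&d, 3};  // value of d on entry to the function
  SsaName d_7{&d, 7};  // value after a later assignment to d

  assert(same_var(&d_3, &d_7));     // same decl ...
  assert(!same_value(&d_3, &d_7));  // ... but different values
  assert(same_value(&d_3, &d_3));   // identity is the reliable test
  assert(has_attribute(&d_7, "nonstring"));
}
```

In this model, comparing the var fields would treat d_3 and d_7 as the
same object even though a store could sit between their definitions,
which is the kind of mistake comparing SSA names avoids.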