On 08/28/2018 02:43 PM, Martin Sebor wrote:
> On 08/27/2018 10:27 PM, Jeff Law wrote:
>> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote:
>>>>>
>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>> the next statement after a truncating strncpy to see if it
>>>>>> adds a terminating nul.  This only works when the next
>>>>>> statement can be reached using the Gimple statement iterator
>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>> calls that truncate their constant argument that are being
>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>> followed by the nul assignment:
>>>>>>
>>>>>>   const char s[] = "12345";
>>>>>>   char d[3];
>>>>>>
>>>>>>   void f (void)
>>>>>>   {
>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>     d[sizeof d - 1] = 0;
>>>>>>   }
>>>>>>
>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>>> is in can be used to try to reach the next statement (this
>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>> rarely used function that is often misused), getting
>>>>>> the warning right while folding a bit later but still fairly
>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>> otherwise, the false positives will drive users to adopt
>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>> bugs cannot be as readily detected.
>>>>>>
>>>>>> Tested on x86_64-linux.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> PS There still are outstanding cases where the warning can
>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>> still try to get them to work for GCC 9.
>>>>>>
>>>>>> gcc-87028.diff
>>>>>>
>>>>>>
>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>> strncpy with global variable source string
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>       PR tree-optimization/87028
>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>>>>>> when
>>>>>>       statement doesn't belong to a basic block.
>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>> MEM_REF on
>>>>>>       the left hand side of assignment.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>       PR tree-optimization/87028
>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>
>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>> index 07341eb..284c2fb 100644
>>>>>> --- a/gcc/gimple-fold.c
>>>>>> +++ b/gcc/gimple-fold.c
>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>      return false;
>>>>>>
>>>>>> +  /* Defer warning (and folding) until the next statement in the
>>>>>> basic
>>>>>> +     block is reachable.  */
>>>>>> +  if (!gimple_bb (stmt))
>>>>>> +    return false;
>>>>> I think you want cfun->cfg as the test here.  They should be
>>>>> equivalent
>>>>> in practice.
>>>>
>>>> Please do not add 'cfun' references.  Note that the next stmt is also
>>>> accessible
>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>> gimplification
>>>> where the next stmt is not yet "there" (but still in GENERIC)?
>>>>
>>>> We generally do not want to have unfolded stmts in the IL when we can
>>>> avoid that
>>>> which is why we fold most stmts during gimplification.  We also do
>>>> that because
>>>> we now do less folding on GENERIC.
>>>>
>>>> There may be the possibility to refactor gimplification time folding
>>>> to what we
>>>> do during inlining - queue stmts we want to fold and perform all
>>>> folding delayed.
>>>> This of course means bigger compile-time due to cache effects.
>>>>
>>>>>
>>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>>> index d0792aa..f1988f6 100644
>>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>>         && known_eq (dstoff, lhsoff)
>>>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>>>       return false;
>>>>>> +
>>>>>> +      if (code == MEM_REF
>>>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>>>> +       && known_eq (dstoff, lhsoff))
>>>>>> +     {
>>>>>> +       /* Extract the referenced variable from something like
>>>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>>> +       if (gimple_nop_p (def))
>>>>>> +         {
>>>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>>>> +           if (lhsbase
>>>>>> +               && dstbase
>>>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>>>> +             return false;
>>>>>> +         }
>>>>>> +     }
>>>>> If you find yourself looking at SSA_NAME_VAR, you're usually
>>>>> barking up
>>>>> the wrong tree.  It'd be easier to suggest something here if I
>>>>> could see
>>>>> the gimple (with virtual operands).  BUt at some level what you really
>>>>> want to do is make sure the base of the MEM_REF is the same as what
>>>>> got
>>>>> passed as the destination of the strncpy.  You'd want to be testing
>>>>> SSA_NAMEs in that case.
>>>>
>>>> Yes.  Why not simply compare the SSA names?  Why would it be
>>>> not OK to do that when !lhsbase?
>>>
>>> The added code handles this case:
>>>
>>>   void f (char *d)
>>>   {
>>>     __builtin_strncpy (d, "12345", 4);
>>>     d[3] = 0;
>>>   }
>>>
>>> where during forwprop we see:
>>>
>>>   __builtin_strncpy (d_3(D), "12345", 4);
>>>   MEM[(char *)d_3(D) + 3B] = 0;
>>>
>>> The next statement after the strncpy is the assignment whose
>>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
>>> is no other information in the GIMPLE_NOP that I can see to
>>> tell that the operand is d_3(D) or that it's the same as
>>> the strncpy argument (i.e., the PARAM_DECl d).  Having to
>>> do open-code this all the time seems so cumbersome -- is
>>> there some API that would do this for me?  (I thought
>>> get_addr_base_and_unit_offset was that API but clearly in
>>> this case it doesn't do what I expect -- it just returns
>>> the argument.)
>>
>> I think you need to look harder at that MEM_REF.  It references d_3.
>> That's what you need to be checking.  The base (d_3) is the first
>> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>>
>> (gdb) p debug_gimple_stmt ($2)
>> # .MEM_5 = VDEF <.MEM_4>
>> MEM[(char *)d_3(D) + 3B] = 0;
>>
>>
>> (gdb) p gimple_assign_lhs ($2)
>> $5 = (tree_node *) 0x7ffff01a6208
>>
>> (gdb) p debug_tree ($5)
>>  <mem_ref 0x7ffff01a6208
>>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>>         size <integer_cst 0x7ffff0059d80 constant 8>
>>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
>> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
>> <integer_cst 0x7ffff0059df8 127>
>>         pointer_to_this <pointer_type 0x7ffff007de70>>
>>
>>     arg:0 <ssa_name 0x7ffff0063dc8
>>         type <pointer_type 0x7ffff007de70 type <integer_type
>> 0x7ffff00723f0 char>
>>             public unsigned DI
>>             size <integer_cst 0x7ffff0059c90 constant 64>
>>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>>             align:64 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff007de70 reference_to_this <reference_type
>> 0x7ffff017d738>>
>>         visited var <parm_decl 0x7ffff01a5000 d>
>>         def_stmt GIMPLE_NOP
>>         version:3>
>>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
>> constant 3>
>>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>>
>>
>> Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:
> 
> The d in the MEM_REF you see in the dump above is the SSA_NAME's
> SSA_NAME_VAR:
> 
>           visited var <parm_decl 0x7ffff01a5000 d>
> 
> Here's the print_node() code that prints it:
> 
>       print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);
> 
> There is nothing else in the MEM_REF operand that tells me that.
> Why is it wrong to look at the SSA_NAME_VAR?
> 
>> (gdb) p debug_tree (lhsbase)
>> <ssa_name 0x7ffff0063dc8
>>     type <pointer_type 0x7ffff007de70
>>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>>             size <integer_cst 0x7ffff0059d80 constant 8>
>>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>>             align:8 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
>> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>>             pointer_to_this <pointer_type 0x7ffff007de70>>
>>         public unsigned DI
>>         size <integer_cst 0x7ffff0059c90 constant 64>
>>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>>         align:64 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff007de70 reference_to_this <reference_type
>> 0x7ffff017d738>>
>>     visited var <parm_decl 0x7ffff01a5000 d>
>>     def_stmt GIMPLE_NOP
>>     version:3>
>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>> "wrong".
> 
> As Richard observed, that's because get_attr_nonstring_decl()
> returns the DECL that the expression refers to.  It does that
> because that's where it looks for attribute nonstring, and so
> that the warning can mention the DECL with the attribute.
> 
> I suppose since I'm not supposed to be using SSA_NAME_VAR
> (I still don't understand why it's taboo) I'll have to avoid
> using the get_attr_nonstring_decl() return value and instead
> look into comparing the SSA_NAMEs.
Because it's not generally useful because it has no dataflow information
associated with it.  SSA_NAMEs are what carry dataflow information and
what you need to check if you want to know if two objects are the same.

SSA_NAME_VAR's primary use is for diagnostic messages and debugging.  We
do hang attributes off the _DECL node it refers to, so you can take an
SSA_NAME, query its SSA_NAME_VAR if you need to check if the SSA_NAME
has a particular attribute property.  But if you're trying to see if two
objects in the IL are the same, you need to be looking at the SSA_NAME.

jeff

Reply via email to