Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

Jeff Law Tue, 14 Aug 2018 22:32:52 -0700

On 08/03/2018 07:00 AM, Bernd Edlinger wrote:
> On 08/02/18 22:34, Martin Sebor wrote:
>> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>>> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>>>
>>>>>    /* If the length can be computed at compile-time, return it.  */
>>>>> -  len = c_strlen (src, 0);
>>>>> +  tree array;
>>>>> +  tree len = c_strlen (src, 0, &array);
>>>>
>>>> You know the c_strlen tries to compute wide character sizes,
>>>> but strlen does not do that, strlen (L"abc") should give 1
>>>> (or 0 on a BE machine)
>>>> I wonder if that is correct.
>>>>
>>> [snip]
>>>>>
>>>>>  static tree
>>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>>  {
>>>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>>>      return NULL_TREE;
>>>>>    else
>>>>>      {
>>>>> -      tree len = c_strlen (arg, 0);
>>>>> -
>>>>> +      tree arr = NULL_TREE;
>>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>>
>>>> Is it possible to write a test case where strlen(L"test") reaches this 
>>>> point?
>>>> what will c_strlen return then?
>>>>
>>>
>>> Yes, of course it is:
>>>
>>> $ cat y.c
>>> int f(char *x)
>>> {
>>>    return __builtin_strlen(x);
>>> }
>>>
>>> int main ()
>>> {
>>>    return f((char*)&L"abcdef"[0]);
>>> }
>>> $ gcc -O3 -S y.c
>>> $ cat y.s
>>> main:
>>> .LFB1:
>>>     .cfi_startproc
>>>     movl    $6, %eax
>>>     ret
>>>     .cfi_endproc
>>>
>>> The reason is that c_strlen tries to fold wide chars at all.
>>> I do not know when that was introduced, was that already before your last 
>>> patches?
>>
>> The function itself was introduced in 1992 if not earlier,
>> before wide strings even existed.  AFAICS, it has always
>> accepted strings of all widths.  Until r241489 (in GCC 7)
>> it computed their length in bytes, not characters.  I don't
>> know if that was on purpose or if it was just never changed
>> to compute the length in characters when wide strings were
>> first introduced.  From the name I assume it's the latter.
>> The difference wasn't detected until sprintf tests were added
>> for wide string directives.  The ChangeLog description for
>> the change reads: Correctly handle wide strings.  I didn't
>> consider pathological cases like strlen (L"abc").  It
>> shouldn't be difficult to change to fix this case.
>>
> 
> Oh, oh, oh....
> 
> $ cat y3.c
> int main ()
> {
>    char c[100];
>    int x = __builtin_sprintf (c, "%S", L"\uFFFF");
> 
>    __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
> }
> 
> $ gcc-4.8 -O3 -std=c99 y3.c
> $ ./a.out
> -1 0
> $ gcc -O3 y3.c
> $ ./a.out
> 1 0
> $ echo $LANG
> de_DE.UTF-8
> 
> I would have expected L"\uFFFF" to converted to UTF-8
> or another encoding, so the return value if sprintf is
> far from obvious, and probably language dependent.
FWIW, Martin has a patch (under review) that I think will fix this and
includes a testcase that is likely inspired by the code above.


> 
> Why do you think it is a good idea to use really every
> opportunity to optimize totally unnecessary things like
> using the return value from the sprintf function as it is?
> 
> Did you never think this adds a significant maintenance
> burden on the rest of us as well?
It largely came along for free during the implementation of the sprintf
warnings.  That's changed a bit over time, but it's still the case that
the sprintf warnings do all the analysis necessary to optimize the
sprintf return value.

As both Martin and I have stated before the real goal is getting good
warnings from sprintf.  Optimization is a distant second.

jeff

Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

Reply via email to