Re: Using source-level annotations to help GCC detect buffer overflows

David Brown Tue, 29 Jun 2021 11:31:27 -0700

On 29/06/2021 17:50, Martin Sebor wrote:
> On 6/29/21 6:27 AM, David Brown wrote:
>> On 28/06/2021 21:06, Martin Sebor via Gcc wrote:
>>> I wrote an article for the Red Hat Developer blog about how
>>> to annotate code to get the most out of GCC's access checking
>>> warnings like -Warray-bounds, -Wformat-overflow, and
>>> -Wstringop-overflow.  The article published last week:
>>>
>>> https://developers.redhat.com/articles/2021/06/25/use-source-level-annotations-help-gcc-detect-buffer-overflows
>>>
>>>
>>
>> Thanks for that write-up - and of course thank you to whoever
>> implemented these attributes!
>>
>> The caveat that the access attributes are lost when a function is
>> inlined is an important one.  As a user who appreciates all the checks I
>> can get, it is disappointing - but I assume there are good reasons for
>> that limitation.  I can merely hope that will change in future gcc
>> versions.
> 
> There's nothing the attribute could obviously attach to after a call
> has been inlined.  An extreme example is a function whose argument
> isn't used:
> 
>   __attribute__ ((access (write_only, 1, 2))) void
>   f (char *p, int n) { }
> 
> (The function might have a body in the original source that could
> be eliminated from the IL based on the values of other arguments.)


Could these attributes not be attached to the arguments when the
function is called, or the parameters when the function is expanded?
After all, in cases such as the "access" attribute it is not the
function as such that has the access hints, it is the parameters of the
function.

(I'm talking here based on absolutely no knowledge of how this is
implemented, but it's always possible that a different view, unbiased by
knowing the facts, can inspire new ideas.)
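
For anyone following along, here is a minimal sketch of what the
annotation looks like on a function that stays out of line.  The name
"fill" is purely hypothetical, and I am assuming GCC 10 or later (where
the "access" attribute exists) and that "noinline" is an acceptable
cost.  Since calls that are not inlined are still checked, keeping the
call out of line should keep the attribute useful:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, for illustration only.
// access(write_only, 1, 2): the function writes through parameter 1,
// and parameter 2 gives the number of bytes it may write.
// noinline: keep calls out of line so the attribute is not lost to inlining.
__attribute__((noinline, access(write_only, 1, 2)))
void fill(char *p, std::size_t n) {
    assert(p != nullptr);
    std::memset(p, 'x', n);
}
```

A call such as "char buf[4]; fill(buf, 8);" is the kind of thing the
access checks are meant to diagnose.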

> 
> Calls to it that are not inlined will be checked but those that are
> won't be.  This could be improved by doing the checking also before
> inlining but at a cost of some false positives for code that's later
> determined to be unreachable.  I don't have a sense of how bad it
> might be so it's something to try.  This class of false positives
> could also be dealt with by queuing up the warnings (e.g., by
> emitting them into the IL via __builtin_warning) and issuing them
> only if they survive dead code elimination.  This is something I'd
> like to try to tackle for GCC 12.
> 

I fully appreciate that some checks can be easier earlier in the
process, others later.  It might even be helpful to do similar checks at
more than one stage, and combine the results.

>>
>> I believe it would make sense to add this information to the gcc manual
>> page for common function attributes.  There are quite a number of
>> attributes that are useful for static checking, such as
>> "warn_unused_result" and "nonnull".  Are these also dropped if the
>> function is inlined?
> 
> I agree the documentation could stand to be made clearer on this
> point.  In general, I think it would be helpful to give users
> more guidance about what to expect from attributes as well as
> warnings: which ones are purely lexical and which ones flow-
> sensitive and so inherently susceptible to false positives and
> negatives, and to what extent.

It could be difficult to quantify that kind of thing, but sometimes
guidance could be useful.  (There is already such information for some
warning flags, especially those that support multiple levels.)

Certainly since first reading about the "access" attributes, I have been
considering adding them to my current project.  I have also been mulling
around in my head possibilities of making variadic templates in C++ that
add access attributes in the right places for some kinds of pointers -
but now that I know the attributes will get dropped for inline
functions, and such templates would involve inline functions, there is
little point.  (Maybe I will still figure a neat way to do this for
external functions - it just won't be useful in as many places.)

> 
> Whether an attribute has an effect depends on the compilation stage
> where it's handled.  warn_unused_result is handled very early (well
> before inlining) so it always has the expected effect.  Attribute
> nonnull is handled both early (to catch the simple cases) and also
> later, after inlining, to benefit from some flow analysis, so its
> effect is lost if the function it attaches to is inlined.  Attribute
> access is handled very late and so it suffers from this problem
> even more.
> 

I suppose some attributes are not needed for inline functions, since the
compiler has the full function definition and can figure some things out
itself.  That would apply to "pure" and "const" functions, I expect.
And if you want a parameter to be non-null, it's possible to do a check
inside the function:

extern void __attribute__((error("Nonnull check failed")))
                nonnull_check_failed(void);

#define nonnull(x) \
    do { \
        if (__builtin_constant_p(!(x))) { \
            if (!(x)) nonnull_check_failed(); \
        } \
        if (!(x)) __builtin_unreachable(); \
    } while (0)


inline int foo(int *p) {
        nonnull(p);
        (*p)++;
        return *p;
}


(The "__builtin_unreachable()" line could also be a call to an error
handler, or missing entirely, according to need.)


If you try to call "foo(0)" and the compiler can see at compile time
that the parameter is null, you'll get a compile-time error.  I've used
that kind of check in my code, but it's a little uglier than
__attribute__((nonnull)) !



> The new attribute malloc (to associate allocators with deallocators)
> is also handled very late but it deals with the same problem by
> disabling inlining.  This was done to avoid false positives, not
> to prevent false negatives, but it works for both.  Disabling
> inlining is obviously suboptimal and wouldn't be appropriate for
> simple accessor functions but for memory allocation it seems like
> an acceptable tradeoff.

I've sometimes had allocator functions that are so simple that inlining
is appropriate (on dedicated embedded systems it is not unusual to need
to allocate some memory early on in the code, but never need to free it,
leading to minimal allocation functions).  But the cost of a function
call would not be noticeable.
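
To give an idea of what I mean by a minimal allocation function, here
is a rough sketch of an allocate-only "bump" allocator.  All the names
("pool", "pool_size", "early_alloc") and the 1024-byte pool size are
made up for illustration, not taken from any real project:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed pool; the size is an arbitrary illustration value.
constexpr std::size_t pool_size = 1024;
alignas(std::max_align_t) static unsigned char pool[pool_size];
static std::size_t pool_used = 0;

// Hand out n bytes from the pool, aligned to `align`, or nullptr when the
// pool is exhausted.  There is no matching free function.
// `align` must be a power of two no larger than alignof(std::max_align_t).
static inline void *early_alloc(std::size_t n,
                                std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= alignof(std::max_align_t));
    // Round the current offset up to the requested alignment.
    std::size_t start = (pool_used + align - 1) & ~(align - 1);
    if (start > pool_size || n > pool_size - start)
        return nullptr;
    pool_used = start + n;
    return pool + start;
}
```

A function like this is small enough that inlining it is natural,
which is the case where disabling inlining costs a little.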

> 
> The inlining problem is not unique to attributes that affect
> warnings.  It impacts all function (and function type) attributes,
> including those that affect optimization.  Those that specifically
> change optimization options disable inlining to avoid meaningless
> mismatches between options controlling the codegen of the caller
> and those intended to control the codegen for the callee.
> 

It's obvious that some attributes don't play well with inlining, such as
"section" (unless inlined in a function with the same section
attribute), but it looks like there is a lot of detail that is missing
from the manual pages here.  And some of these effects are
counter-intuitive and unhelpful.

For example, it is very occasionally useful to do arithmetic operations on
signed types with wrapping semantics rather than the usual "overflow
doesn't happen" semantics that lets gcc produce more efficient code.  A
neat and convenient way to write that in C++ would be to make an enum
class for wrapping ints:

enum class WInt : int {};

__attribute__((optimize("-fwrapv")))
WInt operator + (WInt x, WInt y) {
    return (WInt) ((int) x + (int) y);
}

__attribute__((optimize("-fwrapv")))
WInt operator - (WInt x, WInt y) {
    return (WInt) ((int) x - (int) y);
}

// etc.


Simple, clear, safe (you can't mix "int" and "WInt" by mistake) and
efficient - one might think.  But it turns out this is not the case -
using these operators from a function that does not also have "-fwrapv"
in effect will lead to real (out-of-line) function calls rather than
inlined code.

I'm glad I found out now, and not in a situation where inlining was
important.  But I think it would be a good thing to mention this in the
documentation.  (It would be even better to remove the restriction on
inlining, but I expect that will take more time!)
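
One possible way to sidestep the attribute altogether is to do the
arithmetic in unsigned types, where wrapping is well defined, and
convert back.  This is only a sketch, and it relies on an assumption:
converting an out-of-range unsigned value back to "int" is
implementation-defined before C++20 (gcc documents it as modulo
reduction) and defined as modulo 2^N from C++20 onwards.

```cpp
#include <cassert>
#include <climits>

enum class WInt : int {};

// Wrapping add: the sum is computed in unsigned arithmetic (which wraps
// by definition), then converted back to int.  No optimize attribute is
// involved, so nothing here blocks inlining.
constexpr WInt operator+(WInt x, WInt y) {
    return static_cast<WInt>(static_cast<int>(
        static_cast<unsigned>(static_cast<int>(x)) +
        static_cast<unsigned>(static_cast<int>(y))));
}

// Wrapping subtract, same approach.
constexpr WInt operator-(WInt x, WInt y) {
    return static_cast<WInt>(static_cast<int>(
        static_cast<unsigned>(static_cast<int>(x)) -
        static_cast<unsigned>(static_cast<int>(y))));
}
```

It is less pretty than the "-fwrapv" version, but it keeps the type
safety of the enum class.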

mvh.,

David
