[clang] [clang][sema] Add nonnull attribute to builtin format functions (PR #160988)

Nikolas Klauser via cfe-commits Fri, 07 Nov 2025 07:27:44 -0800


philnik777 wrote:

> > > I consider whether a pointer is allowed to be null or not to be a 
> > > fundamental part of the contract of the function; you don't?
> > 
> > 
> > Not necessarily. The standard doesn't make any guarantees, but libraries 
> > are allowed to. That's the part the below example doesn't match. 
> > `__builtin_unreachable()` isn't part of any standard,
> 
> It's the implementation for `std::unreachable` so it does follow the standard.

I don't really agree here. It's _one possible_ implementation. libc++'s 
`std::unreachable` isn't necessarily equivalent to `__builtin_unreachable`. 
That is a non-issue only because the compiler doesn't hijack 
`std::unreachable`.

> I used it as an example because the point of library builtins is to recognize 
> calls to standard library functions and translate those into 
> `__builtin_whatever` calls so the rest of the toolchain can reason about the 
> call more easily. So for the compiler, there is potentially no difference 
> between calling `std::unreachable` and calling `__builtin_unreachable` 
> because they may be the same thing. (We don't have library builtins for most of 
> the C++ standard library because most of the C++ standard library does not 
> lend itself to translation to builtins, but it does happen.)

I think it's not a particularly useful example here, since it doesn't hijack 
anything. If you only modified `__builtin_printf` and not `printf` I think the 
idea of this patch would be much less contentious. Feel free to go wild on the 
API provided by the compiler.

> I understand that's your assertion but what I don't understand is the 
> justification for it. To me, this is the opposite of how things usually work: 
> if the standard defines it as UB, we optimize based on that unless there's a 
> good reason _not_ to. Your argument is that we need to provide a good reason 
> _to_ optimize based on UB otherwise we shouldn't be doing so. That's a valid 
> stance to take, but isn't how we've approached things in the past and I think 
> requires wider community buy-in.

I don't think that's my stance at all. My stance is that the compiler shouldn't 
make assumptions about which UB libraries define and which they leave 
undefined, unless there is good evidence that libraries do in fact not define 
the behaviour. AFAICT this is the first attribute where we assume libraries 
don't define certain UB. All the other attributes tell the compiler something 
a standard already guarantees.
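
To make the concern concrete, here is a minimal sketch of a library that chooses to define behaviour the standard leaves undefined. `lib_snprintf` is a hypothetical function made up for illustration, not part of any real libc, and its "null format means empty output" rule is an assumption of this sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical library function, not part of any real libc.
 * The C standard leaves a null format string undefined for the printf
 * family. This library chooses to define it: produce an empty string
 * and return 0. */
int lib_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    /* The destination buffer remains an ordinary precondition. */
    assert(buf != NULL || size == 0);

    if (fmt == NULL) {
        /* Library-defined handling of the case the standard calls UB. */
        if (size > 0)
            buf[0] = '\0';
        return 0;
    }

    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return written;
}
```

If a compiler attached `nonnull` to the format parameter of a declaration like this one without the library asking for it, callers relying on the documented null handling would be treated as invoking UB, and the `fmt == NULL` branch could be considered dead.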

> To me, I think this is the crux of the problem. There's a tension between the 
> notion of a builtin statically knowing the semantics of the API and the 
> notion of a specific library implementation wanting to have different (maybe 
> stronger) semantics. We don't have a way for a library to signal "we're doing 
> something beyond the standard semantics", so we have no way from the compiler 
> to know which markings are reasonable. For example, maybe we want to encode 
> whether an API sets `errno` internally or not (think: using the `pure` 
> marking), but some libraries do set `errno` while others do not. But we can't 
> necessarily rely on the library to mark things for us (or opt out of our 
> inferences) because some popular libraries provide no markings (for example, 
> musl or MSVC CRT). I'm not certain what a good solution for this is, so I 
> think we're left doing what we think is best on a case-by-case basis with the 
> static information.

I understand that there are libraries that don't annotate their APIs and that 
may mean the compiler needs to add attributes by hijacking the declarations. 
That doesn't mean I have to like it, and I'd be especially unhappy if that 
happened to C++ APIs. OTOH I'd be quite happy if we had some way to opt-in to 
function hijacking, i.e. have some attribute to say "this function has 
identical semantics to __builtin_whatever". I realize this requires library 
buy-in which seems to not be an option for some libcs, but I do think it's 
feasible for the C++ libraries.
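
A rough sketch of what such an opt-in could look like from the library side follows. The `LIB_SAME_AS_BUILTIN` marker is purely hypothetical: no such attribute exists in Clang or GCC today, so here it is a macro that expands to nothing and only documents intent. The function names are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL marker: no such attribute exists in Clang or GCC today.
 * It expands to nothing and only documents the library's intent. */
#define LIB_SAME_AS_BUILTIN(builtin)

/* Opted in: the library promises exactly the semantics of
 * __builtin_strlen, so a null argument stays undefined and the compiler
 * would be free to treat the parameter as nonnull. */
LIB_SAME_AS_BUILTIN(__builtin_strlen)
size_t lib_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

/* Not opted in: the library defines a null argument to mean "empty
 * string", so the compiler should not infer anything beyond what the
 * declaration itself says. */
size_t lib_strlen_or_zero(const char *s)
{
    return s == NULL ? 0 : lib_strlen(s);
}
```

The design point is that the library, not the compiler, decides which declarations carry builtin semantics. Unmarked functions keep whatever stronger guarantees the library documents.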

> I think what I'm hearing on this PR is that we're all comfortable optimizing 
> on it for the formatted IO functions (or am I reading the room wrong)?

I'm fine with this, yes. I'm also happy to move the discussion somewhere else, 
maybe your next office hours?

https://github.com/llvm/llvm-project/pull/160988
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
