Hi Everyone,

The C and C++ Compatibility Study Group, while working on the new
standard `#embed` preprocessor parameter that mirrors the
`clang::offset(...)` and `gnu::offset(...)` parameters, heard a
concern that the order in which the parameters are written may be
confusing. The concern was raised at the June 4th, 2025 meeting:
https://github.com/sg22-c-cpp-standard-compatibility/sg-compatibility/blob/main/README.md#june-4th-2025


Background
==========

Throughout the rest of this text, `clang::offset`, `gnu::offset`, and
the almost-standard `offset` parameter will be used interchangeably in
prose. They represent the same preprocessor embed parameter, with the
same semantics.

Likewise, the resource `<data.bin>` is assumed throughout to contain
exactly 10 bytes, and is treated as such when named in an `#embed`
directive.

While the following 2 invocations of `#embed` are identical and
produce exactly the same data:

-----
#embed <data.bin> clang::offset(1) limit(3) /* ONE */
#embed <data.bin> limit(3)  clang::offset(1) /* TWO */
-----

some people questioned whether the difference in order might lead
users to believe that they do not produce identical effects (e.g.,
that parameters are applied in the order written, so one directive
applies `offset` to the raw resource first and `limit` afterwards,
while the other does the reverse).
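As an illustration only, here is a minimal C sketch of the semantics
described above. It is not taken from any compiler: the names
`embed_param`, `embed_range`, and `embed_select` are hypothetical, and
the model assumes the resource size is known up front. It collects the
parameters in whatever order they were written and then always applies
`offset` before `limit`, which is why both directives select the same
bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the offset/limit semantics; not real compiler
   code. The vendor-prefixed clang::offset and gnu::offset are assumed
   to map to PARAM_OFFSET as well. */
enum embed_param_kind { PARAM_OFFSET, PARAM_LIMIT };

struct embed_param {
    enum embed_param_kind kind;
    size_t value;
};

struct embed_range {
    size_t begin; /* index of the first embedded byte */
    size_t count; /* number of embedded bytes */
};

/* Record the parameters in whatever order they were written, then
   apply them in a fixed order: offset first, limit second. */
static struct embed_range embed_select(size_t resource_size,
                                       const struct embed_param *params,
                                       size_t nparams)
{
    size_t offset = 0;
    size_t limit = resource_size;
    for (size_t i = 0; i < nparams; ++i) {
        if (params[i].kind == PARAM_OFFSET)
            offset = params[i].value;
        else
            limit = params[i].value;
    }

    struct embed_range r;
    /* An offset at or past the end of the resource leaves no data. */
    r.begin = offset < resource_size ? offset : resource_size;
    r.count = resource_size - r.begin;
    /* Only then is the remaining data capped by `limit`. */
    if (limit < r.count)
        r.count = limit;
    assert(r.begin + r.count <= resource_size);
    return r;
}
```

With the 10-byte `<data.bin>`, the `/* ONE */` order
(`{ PARAM_OFFSET, 1 }, { PARAM_LIMIT, 3 }`) and the `/* TWO */` order
(`{ PARAM_LIMIT, 3 }, { PARAM_OFFSET, 1 }`) both yield `begin == 1`
and `count == 3`, i.e. the bytes at indices 1 through 3.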



The Core Proposal
=================

Following from this, some people advocated for a warning or error
when the parameters are written in the "wrong order". Since `limit`
always applies after `offset`, they wanted the standard to mandate
that these parameters always be written in that order: `/* ONE */`
would be fine, but `/* TWO */` should trigger an error.

It was then pointed out that, under the standard wording, the same
reasoning applies to other parameters. For example, `limit(0)` or
`offset(SIZE_MAX)` can make a resource that has data be considered
"empty". In particular, using `<data.bin>` again:

-----
#embed <data.bin> limit(0) if_empty("meow") /* THREE */
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */
-----

Under the same reasoning, `/* FOUR */` should issue a diagnostic,
since `if_empty` appears to be evaluated before `limit` makes the
resource empty, while `/* THREE */` would issue no diagnostic. This
led to the formulation of the following guidance:

- `offset` must appear before `limit`.
- `limit` and/or `offset` must appear before any of `prefix`,
`suffix`, or `if_empty`.

We are asking implementers how they feel about the above 2 rules, and
about implementing them.
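To make the two proposed rules concrete, here is a minimal C sketch of
the kind of check an implementation might perform. It is hypothetical,
not existing GCC or Clang code: the enumerators and
`embed_order_violation` are invented names, and `clang::offset` and
`gnu::offset` are assumed to map to the same `PARAM_OFFSET` kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the proposed ordering check; not real
   compiler code. Kinds are listed in the order they were written. */
enum embed_param_kind {
    PARAM_OFFSET, PARAM_LIMIT, PARAM_PREFIX, PARAM_SUFFIX, PARAM_IF_EMPTY
};

/* Returns the index of the first parameter that breaks the proposed
   rules, or -1 if the order is acceptable:
     1. `offset` must appear before `limit`;
     2. `limit` and `offset` must appear before `prefix`, `suffix`,
        and `if_empty`. */
static long embed_order_violation(const enum embed_param_kind *kinds,
                                  size_t n)
{
    assert(kinds != NULL || n == 0);

    int seen_limit = 0;
    int seen_clause = 0; /* any of prefix, suffix, or if_empty */
    for (size_t i = 0; i < n; ++i) {
        switch (kinds[i]) {
        case PARAM_OFFSET:
            if (seen_limit || seen_clause)
                return (long)i;
            break;
        case PARAM_LIMIT:
            if (seen_clause)
                return (long)i;
            seen_limit = 1;
            break;
        case PARAM_PREFIX:
        case PARAM_SUFFIX:
        case PARAM_IF_EMPTY:
            seen_clause = 1;
            break;
        }
    }
    return -1;
}
```

Under these rules the `/* ONE */` and `/* THREE */` orders return -1,
while `/* TWO */` and `/* FOUR */` return 1, the index of the
misplaced `offset` or `limit`. Note that the check looks only at the
written order; it never changes which bytes are embedded.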

To be extremely clear: `offset`, `clang::offset`, and `gnu::offset`
always apply before the standard `limit(...)` parameter, both in
Wording and in All Real Implementations, but none of them imposes an
order on how the parameters are written.

To be more clear: an ordering requirement is not how C23 specified
`#embed`, nor how C++ has standardized it so far. As `#embed`'s
principal author, and the one who has carried it through the last 7
years, I can say that nobody has really come forward to say this was
confusing or harmful. That may simply be selection bias, or simply
that nobody has spoken up.

We note that some of this is weird. Consider again the `/* FOUR */`
case from before:

-----
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */
-----

If `<data.bin>` were instead an empty resource, would the preceding
`if_empty` be fine, since `limit(0)` would have no effect anyway? In
the obvious reading the diagnostic would apply anyway, but this is one
of those things where I personally did not believe anyone would
advocate for ordering requirements either way. So now I feel I have
to ask whether this is a quality-of-implementation matter anyone would
care about in the first place. This is, again, in the face of the fact
that the order of the parameters does not matter in any
implementation, and that nobody asked me, either in the run-up to
standardization or afterwards, whether this should be a thing.
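A minimal C sketch, again with hypothetical names and not taken from
any implementation, shows why the written order has no effect on
`if_empty`. It assumes, as the wording does, that `if_empty` is used
only when no data remains after `offset` and `limit` are applied, and
it performs that test only after every parameter has been read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not code from any compiler: decide whether the
   `if_empty` tokens are used. It assumes if_empty applies only when no
   data remains after `offset` and `limit` have both been applied. */
enum embed_param_kind { PARAM_OFFSET, PARAM_LIMIT, PARAM_IF_EMPTY };

struct embed_param {
    enum embed_param_kind kind;
    size_t value;        /* used by PARAM_OFFSET and PARAM_LIMIT */
    const char *tokens;  /* used by PARAM_IF_EMPTY */
};

/* Returns the if_empty tokens when the trimmed resource is empty and an
   if_empty parameter was given; otherwise NULL. */
static const char *embed_if_empty_result(size_t resource_size,
                                         const struct embed_param *params,
                                         size_t nparams)
{
    assert(params != NULL || nparams == 0);

    /* First pass: record every parameter, whatever its position. */
    size_t offset = 0;
    size_t limit = resource_size;
    const char *if_empty = NULL;
    for (size_t i = 0; i < nparams; ++i) {
        switch (params[i].kind) {
        case PARAM_OFFSET:   offset = params[i].value;    break;
        case PARAM_LIMIT:    limit = params[i].value;     break;
        case PARAM_IF_EMPTY: if_empty = params[i].tokens; break;
        }
    }

    /* Then apply offset, then limit, and only then test for emptiness. */
    size_t remaining = offset < resource_size ? resource_size - offset : 0;
    if (limit < remaining)
        remaining = limit;
    return remaining == 0 ? if_empty : NULL;
}
```

For the 10-byte `<data.bin>`, both the `/* THREE */` and `/* FOUR */`
orders return the `if_empty` tokens, and so does an empty resource
with `limit(0)` written in either position.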



The Questions
=============

Therefore, we'd like to poll the GCC developer community:

1. Does anyone think a diagnostic on the order will help prevent
confusion among users, even though the semantics never change
regardless of parameter order?
2. If the answer to (1) is yes, do we believe it should be a warning
(recommended practice in Standard Speak) or an error (a Constraint
Violation/Ill-Formed in Standard Speak)?

Sub-questions such as "an error, but only in pedantic mode" and
similar can be golfed and bikeshedded once the first two questions
are answered.

A formalization of these semantics is going to be presented to WG21
and WG14 at some point. I am gathering implementer feedback, and
implementers' willingness to change their existing implementations,
to write a new paper: https://isocpp.org/files/papers/P3731R0.html

Thank you for reading,
Björkus

CC:
Jakub Jelinek (who wrote about implementing this on Red Hat; apologies
if that's an inappropriate CC)
Joseph Myers (a WG14 regular who may be tangentially interested in
this question)
