On 6/26/26 3:16 PM, Arsen Arsenović wrote:
The set of all qualifiers present on a type consists of the const,
volatile, restrict and atomic qualification, which may be either present
or absent, and the address space qualifier, which may be one of many
values (one of which is the "generic" address space, present on all
platforms, used in absence of another address space; it is also the
address space qualifier on which all standard library routines operate).

For the former four, 'int' serves us okay; using ints as sets is a
well-understood pattern (but it leads to problems like those described
here when those ints cease to be sets).  Since they're either present or
absent, treating an int as a bit-set is convenient and simple.  But,
with the introduction of address space qualifiers into the mix, the
semantics of these operators becomes incorrect.

Take, for instance, the following line of code:

   quals_union = quals1 | quals2;

... (where quals1, quals2 are TYPE_QUALS of some types)

Only in cases where DECODE_QUAL_ADDR_SPACE (quals1) ==
DECODE_QUAL_ADDR_SPACE (quals2), or where one of those address spaces is
generic, and the other a super/subset of the generic address space, does
this yields what the author intended.  In all other cases, the operation
above yields subtly incorrect results, while the compiler is (naturally)
silent about it.

This issue is not theoretical either.  In the implementation of the C++
Named Address Spaces support currently being worked on and discussed on
the mailing list[2], the following broken testcase arose (for the GCN
target; __flat address space is AS1, __lds is AS2, and __gds is AS3;
__flat is a superset address space to __lds):

   template<typename T>
   __flat T *
   foo ();

   void
   bar ()
   { **foo<__lds int> (); }

The above testcase produces the diagnostic:

   <stdin>:1:52: error: invalid type argument of unary '*' (have '__gds int')

It is obvious how this came about: the frontend used '|' to merge two
sets of qualifiers, presuming that such a merge can never fail, because
the union of two sets is a valid and complete function, and that the
bit-OR of these two bitsets is the union of the corresponding two sets
of qualifiers.

However, this is not so: with the introduction of address spaces, the
union of two sets of qualifiers ceases to coincide with the bit-OR
operator, and the union becomes a partial function.

To demonstrate the former, we can use the testcase above.  In it, the
C++ frontend was trying to take the union of the qualifier sets {__lds}
and {__flat}.  These were previously represented as 0x0100 and 0x0200.
Ergo, the result of the bit-op was 0x0300.

0x0300 corresponds to the qualifier set {__gds}.  This is how we got the
bad diagnostic above.  This is an instance of the "bit-OR no longer
coincides with set union (except by accident)" problem; the union of
those two sets (in most cases, anyway) is {__flat}.

But, even if we were to fix this initial problem, we still have the
problem of the union of qualifier sets becoming a partial function.

This one can be demonstrated in C, as the C frontend already deals with
this problem in an ad-hoc fashion.

In the following case, the __lds and __global address spaces are
distinct (though they are both subsets of the __flat address space):

   typedef __lds int lds_int;
   void foo (__global lds_int *x);

... the C frontend issues the diagnostic:

   <stdin>:2:31: error: conflicting named address spaces (__global vs __lds)

It had to do so because the union of the qualifier sets {__lds} and
{__global} for the purposes of this declaration does not exist.[1]

The way the C FE handles this is ad-hoc: in 'grokdeclarator', it has a
specific check that covers this case:

   if (!ADDR_SPACE_GENERIC_P (as1) && !ADDR_SPACE_GENERIC_P (as2) && as1 != as2)
     error_at (loc, "conflicting named address spaces (%s vs %s)",
              c_addr_space_name (as1), c_addr_space_name (as2));

Thus, it is quite easy for developers to forget this check.

What's worse, there is a lot of existing code that presumes that the set
of qualifiers is actually the set of the four "simple" present/absent
qualifier (CVRA).  If that presumption changes, there's no way to
diagnose all sites where this presumption is now broken (as it is in the
C++ NAS support patch).

The C++ type system is completely capable of encoding the restrictions
above, and ergo diagnosing misuse.  Making use of that is the goal of
this patch.

First, we make a distinction between cv_qualifier and qualifier_set.

I think this choice could use more explanation of its rationale. IIUC this is e.g. to allow quals | TYPE_QUAL_CONST but not quals1 | quals2 where quals2 happens to contain only TYPE_QUAL_CONST?

The former models the CVRA qualifiers, which are either present or
absent.

In this revision, I left it as an unscoped enum, but fixed its
underlying type as 'unsigned char' (so that its size is known to be less
than that of qualifier_set).  It may be desirable to make it an enum
class, to forbid the usage of operators like + on it.  This is a lower
priority since there's no existing code that does so.

The latter (qualifier_set) models the set of all qualifiers, i.e. CVRA +
the address space qualifier (at the moment).  It is implemented as an
enum class with the 'unsigned' underlying type.

I'd have preferred qualifier_set to be a struct but, unfortunately, some
ABIs do not pass small, trivial stricts (even of size equal to
'unsigned') through registers; though, this seems not to be an issue on
most 64-bit platforms, and thus it may be worth doing anyway.

Doing so would've made working with qualifier sets less verbose, by
providing member functions and conversion operators/constructors, it'd
have eliminated many cases that appear in this patch where an
convenience overload is added so that a qualifier_set function can
accept cv_qualifiers also, and it'd have forbidden constructing
qualifier_sets with invalid contents.

I'd like opinions on this, I think it'd be beneficial enough to justify
the presumably tiny runtime hit on such platforms, personally.

That seems fine to me.

I elected to refactor the C and C++ frontends for the purpose of this
RFC.  I wouldn't mind also doing the rest (and I wouldn't mind someone
else doing them either ;-) ), but these two are sufficient for the
demonstration.

The C FE, of course, already handles address spaces, so changes in it
were mostly mechanical.

The C++ FE, however, does not.  In this patch, I've strung in address
space support where the compiler told me to, where it was also obvious
how to do so.  In the places where the compiler told me to, but it was
not clear what to do with address spaces, I've left an assert.

This demonstrates why utilizing the type system in this way is quite
useful; it forces the user to at least be aware (and, sometimes, even to
take into account) the restrictions that exist on the qualifier types.
It also presented a way to find where named address space support work
actually needs to be done (indeed, it found quite a few cases that the
patch series that implements C++ NAS support missed).

The interesting bits of the patch are in tree-core.h and tree.h.  These
two provide the new types and matching helper functions.  It may be
worth breaking them out into bits-style headers, though.  I've tried to
curb the growth of those headers too much, but the constexpr operators
and functions were actually needed a few times.

For cv_qualifiers, I've provided binary bitwise operations, in order to
inhibit integer promotion.  This makes it so that manual casting isn't
necessary when using cv_qualifier values.

For qualifier_set, tree.h lost operations that were made redundant/wrong
by the change.  In their place, I've provided functions for modifying
and reading qualifier_set values.

I also provided a split_quals.  The idea with this one was to induce
compile errors if, at some point, split_quals stops being two
components (cv_qualifier and address_space_t), and becomes more.  It's
somewhat cumbersome to use because we can't use structured binding,
though.  Hence, it may not be worth it.

Maybe std::tie would be a useful substitute for structured bindings?

Jason

> [snip]

Reply via email to