On 6/26/26 3:16 PM, Arsen Arsenović wrote:
The set of all qualifiers present on a type consists of the const, volatile, restrict and atomic qualification, which may be either present or absent, and the address space qualifier, which may be one of many values (one of which is the "generic" address space, present on all platforms, used in absence of another address space; it is also the address space qualifier on which all standard library routines operate).For the former four, 'int' serves us okay; using ints as sets is a well-understood pattern (but it leads to problems like those described here when those ints cease to be sets). Since they're either present or absent, treating an int as a bit-set is convenient and simple. But, with the introduction of address space qualifiers into the mix, the semantics of these operators becomes incorrect. Take, for instance, the following line of code: quals_union = quals1 | quals2; ... (where quals1, quals2 are TYPE_QUALS of some types) Only in cases where DECODE_QUAL_ADDR_SPACE (quals1) == DECODE_QUAL_ADDR_SPACE (quals2), or where one of those address spaces is generic, and the other a super/subset of the generic address space, does this yields what the author intended. In all other cases, the operation above yields subtly incorrect results, while the compiler is (naturally) silent about it. This issue is not theoretical either. In the implementation of the C++ Named Address Spaces support currently being worked on and discussed on the mailing list[2], the following broken testcase arose (for the GCN target; __flat address space is AS1, __lds is AS2, and __gds is AS3; __flat is a superset address space to __lds): template<typename T> __flat T * foo (); void bar () { **foo<__lds int> (); } The above testcase produces the diagnostic: <stdin>:1:52: error: invalid type argument of unary '*' (have '__gds int') It is obvious how this came about: the frontend used '|' to merge two sets of qualifiers, presuming that such a merge can never fail, because the union of two sets is a valid and complete function, and that the bit-OR of these two bitsets is the union of the corresponding two sets of qualifiers. However, this is not so: with the introduction of address spaces, the union of two sets of qualifiers ceases to coincide with the bit-OR operator, and the union becomes a partial function. To demonstrate the former, we can use the testcase above. In it, the C++ frontend was trying to take the union of the qualifier sets {__lds} and {__flat}. These were previously represented as 0x0100 and 0x0200. Ergo, the result of the bit-op was 0x0300. 0x0300 corresponds to the qualifier set {__gds}. This is how we got the bad diagnostic above. This is an instance of the "bit-OR no longer coincides with set union (except by accident)" problem; the union of those two sets (in most cases, anyway) is {__flat}. But, even if we were to fix this initial problem, we still have the problem of the union of qualifier sets becoming a partial function. This one can be demonstrated in C, as the C frontend already deals with this problem in an ad-hoc fashion. In the following case, the __lds and __global address spaces are distinct (though they are both subsets of the __flat address space): typedef __lds int lds_int; void foo (__global lds_int *x); ... the C frontend issues the diagnostic: <stdin>:2:31: error: conflicting named address spaces (__global vs __lds) It had to do so because the union of the qualifier sets {__lds} and {__global} for the purposes of this declaration does not exist.[1] The way the C FE handles this is ad-hoc: in 'grokdeclarator', it has a specific check that covers this case: if (!ADDR_SPACE_GENERIC_P (as1) && !ADDR_SPACE_GENERIC_P (as2) && as1 != as2) error_at (loc, "conflicting named address spaces (%s vs %s)", c_addr_space_name (as1), c_addr_space_name (as2)); Thus, it is quite easy for developers to forget this check. What's worse, there is a lot of existing code that presumes that the set of qualifiers is actually the set of the four "simple" present/absent qualifier (CVRA). If that presumption changes, there's no way to diagnose all sites where this presumption is now broken (as it is in the C++ NAS support patch). The C++ type system is completely capable of encoding the restrictions above, and ergo diagnosing misuse. Making use of that is the goal of this patch. First, we make a distinction between cv_qualifier and qualifier_set.
I think this choice could use more explanation of its rationale. IIUC this is e.g. to allow quals | TYPE_QUAL_CONST but not quals1 | quals2 where quals2 happens to contain only TYPE_QUAL_CONST?
The former models the CVRA qualifiers, which are either present or absent. In this revision, I left it as an unscoped enum, but fixed its underlying type as 'unsigned char' (so that its size is known to be less than that of qualifier_set). It may be desirable to make it an enum class, to forbid the usage of operators like + on it. This is a lower priority since there's no existing code that does so. The latter (qualifier_set) models the set of all qualifiers, i.e. CVRA + the address space qualifier (at the moment). It is implemented as an enum class with the 'unsigned' underlying type. I'd have preferred qualifier_set to be a struct but, unfortunately, some ABIs do not pass small, trivial stricts (even of size equal to 'unsigned') through registers; though, this seems not to be an issue on most 64-bit platforms, and thus it may be worth doing anyway. Doing so would've made working with qualifier sets less verbose, by providing member functions and conversion operators/constructors, it'd have eliminated many cases that appear in this patch where an convenience overload is added so that a qualifier_set function can accept cv_qualifiers also, and it'd have forbidden constructing qualifier_sets with invalid contents. I'd like opinions on this, I think it'd be beneficial enough to justify the presumably tiny runtime hit on such platforms, personally.
That seems fine to me.
I elected to refactor the C and C++ frontends for the purpose of this RFC. I wouldn't mind also doing the rest (and I wouldn't mind someone else doing them either ;-) ), but these two are sufficient for the demonstration. The C FE, of course, already handles address spaces, so changes in it were mostly mechanical. The C++ FE, however, does not. In this patch, I've strung in address space support where the compiler told me to, where it was also obvious how to do so. In the places where the compiler told me to, but it was not clear what to do with address spaces, I've left an assert. This demonstrates why utilizing the type system in this way is quite useful; it forces the user to at least be aware (and, sometimes, even to take into account) the restrictions that exist on the qualifier types. It also presented a way to find where named address space support work actually needs to be done (indeed, it found quite a few cases that the patch series that implements C++ NAS support missed). The interesting bits of the patch are in tree-core.h and tree.h. These two provide the new types and matching helper functions. It may be worth breaking them out into bits-style headers, though. I've tried to curb the growth of those headers too much, but the constexpr operators and functions were actually needed a few times. For cv_qualifiers, I've provided binary bitwise operations, in order to inhibit integer promotion. This makes it so that manual casting isn't necessary when using cv_qualifier values. For qualifier_set, tree.h lost operations that were made redundant/wrong by the change. In their place, I've provided functions for modifying and reading qualifier_set values. I also provided a split_quals. The idea with this one was to induce compile errors if, at some point, split_quals stops being two components (cv_qualifier and address_space_t), and becomes more. It's somewhat cumbersome to use because we can't use structured binding, though. Hence, it may not be worth it.
Maybe std::tie would be a useful substitute for structured bindings? Jason > [snip]
