dblaikie added a comment. In D81865#2178542 <https://reviews.llvm.org/D81865#2178542>, @froydnj wrote:
> In D81865#2176589 <https://reviews.llvm.org/D81865#2176589>, @dblaikie wrote: > >> I believe this falls under the (using cppreference ( >> https://en.cppreference.com/w/cpp/language/union ) , since it's a bit easier >> to read) UB clause: " It's undefined behavior to read from the member of the >> union that wasn't most recently written. Many compilers implement, as a >> non-standard language extension, the ability to read inactive members of a >> union." >> >> Last member written to was the "StringTable" member, but then it's read from >> the "String" member, so that'd be UB. Commonly supported, but UB - not sure >> if we have a general statement that we depend on this behavior in LLVM, even >> though it's non-standard (but it's possible that we do make such an >> assumption about the compiler that's building LLVM). It'd be nice to avoid >> that, though - and not too difficult, I think - I /believe/ it's valid to >> take a pointer to an object, cast it to char*, compute a pointer to some >> specific member and then cast it back to the right type and access. But I >> could be wrong there. @rsmith would be the person to give an authoritative >> answer. > > Thanks for the explanation. Is the language of "writing" applicable here, > given that this is a constant blob of storage? (I suppose the compiler is > permitted to designate a particular member as having been "written"?) I /believe/ it is applicable, though I could be wrong. Ah, cppreference's example supports that theory at least: `S s = {0x12345678}; // initializes the first member, s.n is now the active member` - that the initialization itself does set the active member of the union. (oh, and a later example on cppreference, which is lifted from the C++ spec (though not sure exactly which version) says similarly: "Y y = { { 1, 2 } }; // OK, y.x is active union member (9.2)") > Would it be more palatable to write: > > struct StaticDiagInfoDescriptionStringTable { > // members as char[] for each diagnostic > }; > > const StaticDiagInfoDescriptionStringTable StaticDiagInfoDescriptions = { > // define all the members > }; > > ... > > struct StaticDiagInfoRec { > ... > StringRef getDescription() const { > size_t MyIndex = this - &StaticDiagInfo[0]; > uint32_t StringOffset = StaticDiagInfoDescriptionOffsets[MyIndex]; > // Defined as a pointer to the first member, and (presumably) there is > no internal padding. > const char *StringTable = reinterpret_cast<const > char*>(&StaticDiagInfoDescriptions); > return StringRef(&StringTable[StringOffset], DescriptionLen); > }; > > and then we don't have to care about how the host compiler interprets access > to different members of unions? I think so? I guess that's essentially the point of offsetof. The comment about padding probably isn't needed, I think? Even if there was padding, the StringOffset comes from "offsetof" so it describes the offset including any padding involved? Be great if @rsmith got a chance to weigh in here. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D81865/new/ https://reviews.llvm.org/D81865 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits