https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783
--- Comment #1 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Clang now says that there are two kinds of CUDA compatibility.
If I understand that correctly, then my card, sm_120, would be incompatible with
earlier cards (as an exception to previous CUDA behavior), which would explain
the miscompilations reported above.
They say:
The family-specific variants have f feature suffix and they follow following
order:
  sm_X{Y2}f > sm_X{Y1}f iff Y2 > Y1
  sm_XY{f} > sm_{XY}{}
The architecture-specific variants have a feature suffix and they follow
following order: sm_XY{a} > sm_XY{f} > sm_{XY}{}
For example, take sm_100f (10 represents X, 0 represents Y, and f represents z)
and sm_103f (10 represents X, 3 represents Y, and f represents z) architecture
variants. Since Y1 < Y2, sm_100f is compatible with sm_103f. Similarly based on
the second rule, sm_90 is compatible with sm_103f.
That's what we are used to.
But then there is the Blackwell chip now:
Some counter examples, take sm_100f and sm_120f (12 represents X, 0 represents
Y, and f represents z) architecture variants. Since both belongs to different
family i.e. X1 != X2, sm_100f is not compatible with sm_120f.
I have sm_120, of course, so X1=12. gcc's sm_89 has X2=8. Probably this is that
counter example, an incompatibility because X1 != X2 with X1=12, if I understand
correctly...
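
To make the quoted ordering rules easier to check, here is a minimal Python
sketch of how I read them. The names parse_sm and runs_on are hypothetical
helpers, not part of Clang, GCC or CUDA. Two things are assumptions on my
side: unsuffixed variants are ordered purely by their number (which matches
the sm_90 / sm_103f example), and any combination the quoted text does not
spell out is treated as not compatible.

```python
import re


def parse_sm(name):
    """Split e.g. 'sm_103f' into (X, Y, suffix) = (10, 3, 'f')."""
    m = re.fullmatch(r"sm_(\d+)(\d)([af]?)", name)
    if m is None:
        raise ValueError(f"not an sm_XY[a|f] name: {name!r}")
    return int(m.group(1)), int(m.group(2)), m.group(3)


def runs_on(built, target):
    """True if code built for `built` is compatible with `target`,
    according to my reading of the quoted rules (a sketch only)."""
    bx, by, bs = parse_sm(built)
    tx, ty, ts = parse_sm(target)
    if bs == "a":
        # Architecture-specific: only the exact same variant.
        return (bx, by, bs) == (tx, ty, ts)
    if bs == "f":
        # sm_XY{a} > sm_XY{f}: same X and Y, target carries the a suffix.
        if ts == "a":
            return (bx, by) == (tx, ty)
        # sm_X{Y2}f > sm_X{Y1}f iff Y2 > Y1: same family X, same or later Y.
        return ts == "f" and bx == tx and ty >= by
    # No suffix. ASSUMPTION: plain numeric ordering, as in the
    # "sm_90 is compatible with sm_103f" example.
    return (tx, ty) >= (bx, by)


if __name__ == "__main__":
    # The example pairs from the quoted text.
    for built, target in [("sm_100f", "sm_103f"),
                          ("sm_90", "sm_103f"),
                          ("sm_100f", "sm_120f")]:
        print(built, "->", target, runs_on(built, target))
```

This only encodes the three example pairs from the quote plus the two
assumptions named above, so it is no statement about what the hardware or
either compiler actually does.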