================
@@ -80,8 +85,10 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
   bool allowFP16Math() const;
   bool hasMaskOperator() const { return PTXVersion >= 71; }
   bool hasNoReturn() const { return SmVersion >= 30 && PTXVersion >= 64; }
-  unsigned int getSmVersion() const { return SmVersion; }
+  unsigned int getSmVersion() const { return FullSmVersion / 10; }
+  unsigned int getFullSmVersion() const { return FullSmVersion; }
   std::string getTargetName() const { return TargetName; }
+  bool isSm90a() const { return getFullSmVersion() == 901; }
----------------
Artem-B wrote:

According to the [CUDA docs](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=sm_90a#ptx-module-directives-target):

> Target architectures with suffix “a”, such as sm_90a, include 
> architecture-accelerated features that are supported on the specified 
> architecture only, hence such targets do not follow the onion layer model. 
> Therefore, PTX code generated for such targets cannot be run on later 
> generation devices. Architecture-accelerated features can only be used with 
> targets that support these features.

It's not clear where NVIDIA is going with this approach.

I can make it a more generic `int hasAAFeatures() { return FullSmVersion % 10; }` if that's what you're looking for.
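
For illustration, here is a minimal standalone sketch of how the two checks would relate. It assumes `FullSmVersion` is encoded as the SM version times 10 plus a suffix digit (so `sm_90` is 900 and `sm_90a` is 901), which is what the diff above implies. `SubtargetSketch` is a hypothetical stand-in rather than the real `NVPTXSubtarget`, and `hasAAFeatures()` returns `bool` here instead of the `int` suggested above.

```cpp
// Hypothetical stand-in for the subtarget; not the actual NVPTXSubtarget class.
struct SubtargetSketch {
  // Assumed encoding: SmVersion * 10 + suffix digit (sm_90 -> 900, sm_90a -> 901).
  unsigned FullSmVersion;

  constexpr unsigned getSmVersion() const { return FullSmVersion / 10; }
  constexpr unsigned getFullSmVersion() const { return FullSmVersion; }

  // Specific check, as in the patch.
  constexpr bool isSm90a() const { return getFullSmVersion() == 901; }

  // Generic check: any non-zero suffix digit marks an "a"-suffixed target.
  constexpr bool hasAAFeatures() const { return FullSmVersion % 10 != 0; }
};

constexpr SubtargetSketch Sm90{900};
constexpr SubtargetSketch Sm90a{901};

static_assert(Sm90.getSmVersion() == 90, "suffix digit is dropped");
static_assert(Sm90a.getSmVersion() == 90, "suffix digit is dropped");
static_assert(!Sm90.hasAAFeatures(), "plain sm_90 has no suffix");
static_assert(Sm90a.isSm90a() && Sm90a.hasAAFeatures(), "sm_90a has a suffix");
```

With this encoding, `getSmVersion()` keeps returning 90 for both targets, so existing `SmVersion >= N` checks are unaffected; only the suffix-aware predicates tell the two apart.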


https://github.com/llvm/llvm-project/pull/74895
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
