[PATCH] D139398: [AMDGPU] Add bf16 storage support

Matt Arsenault via Phabricator via cfe-commits Thu, 08 Dec 2022 07:43:50 -0800

arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td:49
+  CCIfType<[bf16], CCBitConvertToType<i16>>,
+  CCIfType<[v2bf16], CCBitConvertToType<i32>>,
   CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Without being added to a register class, all the tablegen changes should 
> > not do anything
> bf16 ones seem to not be needed but if I don't have the v2bf16 ones I get 
> "cannot allocate arguments" in "test_arg_store_v2bf16"
Sounds like a type legality logic bug which I don't want to spend time fighting


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:913
+      else
+        RegisterVT = (ScalarVT == MVT::bf16 ? MVT::v2bf16 : MVT::v2f16);
       IntermediateVT = RegisterVT;
----------------
If you wanted the promote to i32, you could have done it here instead of in the 
tablegen cc handling


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5558
+  // This lowers them into (i16 (bitconvert (f32 (fptrunc x))))
+  if (Op->getValueType(0) != MVT::i16)
+    return SDValue();
----------------
Op.getValueType()


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5563
+  return DAG.getNode(ISD::BITCAST, SL, MVT::i16,
+                     DAG.getFPExtendOrRound(Op->getOperand(0), SL, MVT::f16));
+}
----------------
Should be specific cast, not FPExtOrRound. I don't think the FP_ROUND case 
would be correct


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5569
+  // (bitconvert x))))
+  if (!Op->getValueType(0).isFloatingPoint() ||
+      Op->getOperand(0).getValueType() != MVT::i16)
----------------
Should use Op.getValueType() instead of Op->getValueType(0)


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5573-5576
+  SDLoc SL(Op);
+  return DAG.getNode(
+      ISD::FP_EXTEND, SL, MVT::f32,
+      DAG.getNode(ISD::BITCAST, SL, MVT::f16, Op->getOperand(0)));
----------------
ExpandNode covers lowering BF16_TO_FP. It also has a shift by 16-bits into the 
high bits. Is this correct?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+    // When we don't have 16 bit instructions, bf16 is illegal and gets
+    // softened to i16 for storage, with float being used for arithmetic.
+    //
+    // After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+    // Lower those to (f32 (fp_extend (f16 (bitconvert x))))
+    if (!Op->getValueType(0).isFloatingPoint() ||
+        Op->getOperand(0).getValueType() != MVT::i16)
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Pierre-vh wrote:
> > > arsenm wrote:
> > > > Pierre-vh wrote:
> > > > > arsenm wrote:
> > > > > > The generic legalizer should have handled this?
> > > > > It looks like those operations are not implemented in the generic 
> > > > > legalizer, e.g. I get 
> > > > > ``` 
> > > > > Do not know how to promote this operator's operand!
> > > > > ```
> > > > Right, this is the code that would go there
> > > Do I just copy/paste this code in that PromoteInt function, and keep a 
> > > copy here too in LowerOperation? (not really a fan of copy-pasting code 
> > > in different files, I'd rather keep it all here)
> > > We need to have the lowering too AFAIK, it didn't go well when I tried to 
> > > remove it
> > I'm not following why you need to handle it here
> IIRC:
>  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the 
> Integer Legalizer calls (through CustomLowerNode)
>  - I need to handle both opcodes in LowerOperation because otherwise they'll 
> fail selection. They can be left over from expanding/legalizing other 
> operations.
But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 
there, and they aren't custom lowered. They have the same basic properties. We 
have this:


```
    setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
    AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
    setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
    AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
```

I'd expect the same basic pattern


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D139398: [AMDGPU] Add bf16 storage support

Reply via email to