liangfu commented on a change in pull request #4392: [VTA] Enable streamlined 
GEMM execution
URL: https://github.com/apache/incubator-tvm/pull/4392#discussion_r350549356
 
 

 ##########
 File path: vta/hardware/chisel/src/main/scala/core/TensorGemm.scala
 ##########
 @@ -126,8 +145,7 @@ class MatrixVectorMultiplication(implicit p: Parameters) extends Module {
   })
   val dot = Seq.fill(size)(
     Module(new DotProduct(aBits = inpBits, bBits = wgtBits, size)))
-  val acc = Seq.fill(size)(
-    Module(new Pipe(UInt(accBits.W), latency = log2Ceil(size) + 1)))
+  val acc = Seq.fill(size)(Module(new Pipe(UInt(accBits.W), latency = 2)))
 
 Review comment:
   One cycle is for the MAC module (a fused multiply-adder, FMA), and the 
other cycle is for one `PipeAdder` in the first layer of the accumulator. 
This total of 1+1=2 cycles should be smaller than 4, the number of states the 
TensorGemm module iterates over 
(`sReadUop :: sComputeIdx :: sReadTensor :: sExe`), to ensure that the 
accumulated `acc` is available by the sReadTensor stage. 
Does this make sense?
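
The latency budget described above can be sketched in plain Scala, without a Chisel dependency. This is only an illustration: the object name, the local `log2Ceil` helper, and the one-cycle-per-state assumption are hypothetical and not taken from `TensorGemm.scala`, and `size = 16` is just an example value.

```scala
// Hypothetical model of the latency budget; not part of the VTA sources.
object AccLatencyBudget {
  // One cycle for the MAC (fused multiply-add), per the review comment.
  val macLatency: Int = 1
  // One cycle for the first PipeAdder layer of the accumulator, per the review comment.
  val firstAdderLatency: Int = 1
  // States the TensorGemm FSM iterates over, as listed in the review comment.
  val states: Seq[String] = Seq("sReadUop", "sComputeIdx", "sReadTensor", "sExe")

  // Latency passed to the acc Pipe in the new code (latency = 2).
  val accPipeLatency: Int = macLatency + firstAdderLatency

  // Assumption: each state takes one cycle, so one iteration takes states.length cycles.
  val cyclesPerIteration: Int = states.length

  // Local stand-in for Chisel's log2Ceil, so the sketch runs without Chisel.
  def log2Ceil(n: Int): Int = {
    require(n > 0, "n must be positive")
    if (n == 1) 0 else 32 - Integer.numberOfLeadingZeros(n - 1)
  }

  // Latency used by the removed line: log2Ceil(size) + 1.
  def oldLatency(size: Int): Int = log2Ceil(size) + 1

  // The condition from the comment: latency must be smaller than the state count.
  def fits(latency: Int): Boolean = latency < cyclesPerIteration

  def main(args: Array[String]): Unit = {
    val size = 16 // example value only
    println(s"new latency = $accPipeLatency, fits = ${fits(accPipeLatency)}")
    println(s"old latency = ${oldLatency(size)}, fits = ${fits(oldLatency(size))}")
  }
}
```

Under the one-cycle-per-state assumption, the new latency of 2 stays below the 4-state iteration, whereas the old expression gives `log2Ceil(16) + 1 = 5` for the example size and would not.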

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
