comaniac commented on pull request #9261:
URL: https://github.com/apache/tvm/pull/9261#issuecomment-942488532


   Ah, sorry I didn't make that clear. The C source codegen interface does 
handle dynamic workloads, because it takes raw pointers, and the buffers they 
point to can be any size at run time.
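
   To illustrate the point about raw pointers, here is a minimal sketch using 
Python's `ctypes`. The function `scale_in_place` is hypothetical and is not 
part of TVM or CUTLASS; it only shows that a function taking a pointer plus an 
element count is not tied to any particular size.

```python
import ctypes

def scale_in_place(data_ptr, num_elements, factor):
    """Hypothetical kernel entry point: a raw pointer plus a run-time size."""
    for i in range(num_elements):
        data_ptr[i] *= factor

def as_float_ptr(buf):
    # Drop the array type (and with it the static length), keeping only a pointer.
    return ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

# The same function serves buffers of different sizes.
small = (ctypes.c_float * 2)(1.0, 2.0)
large = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
scale_in_place(as_float_ptr(small), len(small), 2.0)
scale_in_place(as_float_ptr(large), len(large), 2.0)
```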
   
   What I meant was how to generate CUTLASS kernels that perform well across 
all shapes. In @Laurawly's post, they generate many kernels to cover the 
possible shapes, which results in a 7GB binary. I assume they also generate 
run-time dispatching logic (also in the generated C source code) to determine 
which kernel to use once the shape is known at run time. The binary size is 
clearly an issue for this solution.
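
   As a rough sketch of what such dispatching logic might look like: the 
kernel names, the shape buckets, and the `dispatch` function below are all 
made up for illustration and are not taken from @Laurawly's implementation or 
from CUTLASS. The idea is only that a table of pre-generated kernels is 
consulted once the shape is known.

```python
from typing import NamedTuple, Tuple

class Kernel(NamedTuple):
    name: str                         # hypothetical kernel symbol name
    max_shape: Tuple[int, int, int]   # largest (M, N, K) the kernel was tuned for

# Hypothetical table emitted alongside the kernels, ordered from small to large.
KERNEL_TABLE = [
    Kernel("gemm_tile_64x64", (128, 128, 128)),
    Kernel("gemm_tile_128x128", (1024, 1024, 1024)),
    Kernel("gemm_tile_256x128", (8192, 8192, 8192)),
]
FALLBACK = Kernel("gemm_generic", (0, 0, 0))  # used when no bucket matches

def dispatch(m: int, n: int, k: int) -> Kernel:
    """Return the first kernel whose bucket covers the run-time shape."""
    for kernel in KERNEL_TABLE:
        max_m, max_n, max_k = kernel.max_shape
        if m <= max_m and n <= max_n and k <= max_k:
            return kernel
    return FALLBACK
```

   Every entry in such a table corresponds to a compiled kernel shipped in the 
binary, which is why covering more shapes makes the binary grow.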
   
   The JSON codegen/runtime would be similar to TensorRT: we simply dump a 
JSON graph in codegen without doing anything else. Meanwhile, we have a custom 
runtime that JITs and caches CUTLASS kernels based on the known shapes. This 
results in a much smaller binary, but the first execution (or an execution 
with new shapes) may take several seconds or even a minute to JIT all the 
kernels.
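
   A minimal sketch of the JIT-and-cache idea, assuming a cache keyed by the 
run-time shape. `KernelCache` and `fake_compile` are hypothetical stand-ins; 
the real runtime would invoke the CUTLASS compilation flow where 
`fake_compile` is called.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, int, int]  # (M, N, K)

class KernelCache:
    """Compile a kernel the first time a shape is seen, then reuse it."""

    def __init__(self, compile_fn: Callable[[Shape], Callable[[], int]]):
        self._compile_fn = compile_fn
        self._cache: Dict[Shape, Callable[[], int]] = {}
        self.compile_count = 0  # how many (slow) compilations happened

    def get(self, shape: Shape) -> Callable[[], int]:
        kernel = self._cache.get(shape)
        if kernel is None:
            # Cache miss: this is the step that could take seconds or more.
            kernel = self._compile_fn(shape)
            self._cache[shape] = kernel
            self.compile_count += 1
        return kernel

def fake_compile(shape: Shape) -> Callable[[], int]:
    """Stand-in for JIT compilation; the 'kernel' just reports its GEMM FLOPs."""
    m, n, k = shape
    return lambda: 2 * m * n * k

cache = KernelCache(fake_compile)
cache.get((64, 64, 64))    # first call with this shape: compiles
cache.get((64, 64, 64))    # same shape: served from the cache
cache.get((128, 64, 64))   # new shape: compiles again
```

   Only the first call for each distinct shape pays the compilation cost, 
which matches the trade-off described above: a small binary in exchange for a 
slow first run on every new shape.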
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
