vinx13 commented on pull request #39: URL: https://github.com/apache/tvm-rfcs/pull/39#issuecomment-935305808
Thanks @Lunderberg for the RFC. Logical-physical mapping is definitely an important feature. I also implemented something similar for warp memory to support tensor core instructions on GPU, and I'm happy to collaborate to arrive at a unified design. Some preliminary comments:

The current representation of the logical-physical layout mapping is an array of axis/factor pairs that defines how the logical axes are split, reordered, and fused to form the physical axes. This works for packed layouts like `NCHW4c`, but we should consider whether it is a generic enough way to represent the mapping. For example, an alternative is to use a mapping function: `(n, c, h, w) -> (n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4))`. This would allow arbitrary mappings (though we could add restrictions, such as requiring the mapping to be affine, to make analysis easier). A possible use case for a more complex mapping is the [permuted layout](https://github.com/NVIDIA/cutlass/blob/master/media/docs/implicit_gemm_convolution.md#shared-memory-layouts) used for shared memory on CUDA; a sketch of the mapping-function idea follows below.

Also, there is related [affine analysis infrastructure](https://github.com/apache/tvm/blob/main/include/tvm/arith/iter_affine_map.h) already available; it would be great if we could reuse it for loop analysis and rewriting.
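To make the mapping-function alternative concrete, here is a minimal Python sketch of the `NCHW` to `NCHW4c` example above. It only uses the existing `tvm.tir` expression primitives; the `nchw_to_nchw4c` helper itself is illustrative, not an existing TVM API:

```python
import tvm
from tvm import tir

def nchw_to_nchw4c(n, c, h, w):
    # NCHW -> NCHW4c: split the channel axis by a factor of 4 and move
    # the inner chunk of 4 channels to the innermost physical axis.
    return [n, tir.floordiv(c, 4), h, w, tir.floormod(c, 4)]

# Apply the mapping to symbolic logical indices.
n, c, h, w = [tir.Var(name, "int32") for name in ("n", "c", "h", "w")]
print(nchw_to_nchw4c(n, c, h, w))
# [n, floordiv(c, 4), h, w, floormod(c, 4)]
```

Because the mapping is an ordinary function from logical index expressions to physical ones, it is not limited to split/reorder/fuse compositions, which is what would make layouts like CUTLASS's permuted shared-memory layout expressible in the same framework.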
