comaniac opened a new pull request #8457: URL: https://github.com/apache/tvm/pull/8457
Per the discussion in https://discuss.tvm.apache.org/t/cuda-enable-half2-in-cuda-injective-schedule/10441, this PR improves the CUDA injective schedule to benefit more from `half2` when working on `float16`.

Background: although the CUDA injective schedule does vectorize the innermost loop when working on `float16`, the vectorization may fail due to the if-conditions introduced when the workload is not divisible by the block/thread sizes. Formally, vectorization requires `prod(output_shape) % block % thread % vector_width == 0`. To make sure vectorization is effective, this PR adjusts the block and thread sizes accordingly (see the code change for details).

On the other hand, when the output shape is awkward (e.g., composed of prime numbers), the selected block and thread sizes may be too small. For example, if the output shape is `(311, 3814)`, then the factors are `(1, 2, 311, 1907, 3814)`. As a result, we may select `(block, thread) = (2, 311)` under the maximum `(block, thread) = (256, 1024)`. In this case, we do not utilize the compute resources well even when `half2` is enabled. Ideally, we should pad the output so that the factors are always powers of two, but that is too complicated and may introduce other issues. Accordingly, another heuristic introduced by this PR is: when `(select_block * select_thread) / (max_block * max_thread) < R`, we do not apply the change and simply let vectorization fail.

Here are the evaluation results with `R=0.7`:
* Workloads: FP32Mul_FP16Add, FP16Mul_FP16Add, FP16Mul, Cast, FP32Mul_FP32Add, FP32Mul.
* Output shapes: I manually assigned two shapes, (768, 3072) and (1, 1000), and randomly generated an additional 100 shapes with dimensions ranging from 1 to 4096.
* Platform: NVIDIA T4 and V100. For each platform, I report the worst, the best, and the average speedup of all workloads over the current upstream.
  * T4: Worst 0.98x, Best 1.41x, Average 1.12x.
  * V100: Worst 0.97x, Best 1.33x, Average 1.15x.
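To make the two checks above concrete, here is a minimal Python sketch. The function and parameter names (`can_vectorize`, `utilization_ok`, `ratio`) are assumptions for illustration, not the identifiers used in the actual TVM schedule:

```python
def can_vectorize(num_elements, block, thread, vector_width=2):
    # The divisibility condition stated above:
    # prod(output_shape) % block % thread % vector_width == 0.
    return num_elements % block % thread % vector_width == 0

def utilization_ok(block, thread, max_block=256, max_thread=1024, ratio=0.7):
    # The utilization heuristic: skip the adjustment (and let
    # vectorization fail) when the selected block/thread sizes
    # would leave most of the GPU idle.
    return (block * thread) / (max_block * max_thread) >= ratio

# (768, 3072) divides evenly, so vectorization applies at the maximum sizes.
print(can_vectorize(768 * 3072, 256, 1024))  # → True
# (311, 3814) forces (block, thread) = (2, 311), far below the 0.7 ratio,
# so this workload is left unvectorized.
print(utilization_ok(2, 311))                # → False
```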
cc @vinx13 @wpan11nv @Laurawly @masahi
