banach-space wrote: **UPDATE: 30/7/25**
* This [commit](https://github.com/llvm/llvm-project/pull/149293/commits/56108b1df69e150c475adc58880ca7dce5355b21) addresses the remaining comments from @hanhanW.
* I have rebased this PR on top of https://github.com/llvm/llvm-project/pull/151334. This rebase addresses this [comment](https://github.com/llvm/llvm-project/pull/149293#discussion_r2237499014) from @egebeysel.

**GENERAL OBSERVATIONS + FUTURE STEPS**

Having implemented #151334, I now realise that we don't require separate vector sizes for the _write_ operation (there's a small twist though). To illustrate, take this example:

```mlir
func.func @example(%source: tensor<8x4x16x16xf32>, %dest: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %0 = linalg.unpack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %dest : tensor<8x4x16x16xf32> -> tensor<64x127xf32>
  return %0 : tensor<64x127xf32>
}
```

It will be vectorized as:

```mlir
func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  // This is key - vec Op 1 !!!
  %0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
  // This is key - vec Op 2 !!!
  %1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
  // This is key - vec Op 3 !!!
  %2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
  %c0_0 = arith.constant 0 : index
  // This is key - vec Op 4 !!!
  %3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
  return %3 : tensor<64x127xf32>
}
```

Now, once we vectorize the read operation, the remaining sizes (i.e. the sizes for the _write_ operation) are already pre-determined:

* For `vector.transpose`, the sizes must match the sizes from `vector.transfer_read` (modulo the permutation).
* For `vector.shape_cast`, the input must match the output of `vector.transpose`. The output is uniquely determined, e.g. by applying `outer_dims_perm` from `linalg.unpack` to the output of `vector.transpose`.
* For `vector.transfer_write`, we have to use the output shape from `vector.shape_cast`.

TL;DR: We should only require vector sizes for the _read_ operation.

**TWIST**

While we should be able to infer the scalable flags, there is some logic still missing. This should not be a problem though.

**NEXT STEPS**

While we could land this as is (IREE integration looks fine: https://github.com/iree-org/iree/pull/21514, thanks @hanhanW) and then iterate in-tree, it might be "healthier" if there's one self-contained change. Let me refine this and then integrate it into IREE (to make sure that the integration works).

Also, @hanhanW, let's sync offline and make sure that switching to "only vector sizes for the read Op" is going to work for IREE. WDYT?

https://github.com/llvm/llvm-project/pull/149293
_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
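P.S. The shape inference argued for above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the upstream implementation; the transpose permutation `[1, 2, 0, 3]` is taken directly from the vectorized IR of the example rather than derived from the `linalg.unpack` attributes:

```python
# Sketch: starting from ONLY the vector sizes of the transfer_read, derive the
# shapes of the remaining vector ops for the example above.
read_shape = [8, 4, 16, 16]   # vector.transfer_read result type
perm = [1, 2, 0, 3]           # vector.transpose permutation (from the example IR)
dest_shape = [64, 127]        # linalg.unpack destination tensor

# vector.transpose: sizes are the read sizes, modulo the permutation.
transposed = [read_shape[i] for i in perm]   # -> [4, 16, 8, 16]

# vector.shape_cast: collapse the adjacent (outer, inner) dim pairs.
cast = [transposed[0] * transposed[1],
        transposed[2] * transposed[3]]       # -> [64, 128]

# vector.transfer_write: the vector shape comes from the shape_cast; the
# in_bounds flags fall out of comparing it against the destination tensor.
in_bounds = [v <= d for v, d in zip(cast, dest_shape)]  # -> [True, False]

print(transposed, cast, in_bounds)
```

Running this reproduces `vector<4x16x8x16xf32>`, `vector<64x128xf32>`, and `in_bounds = [true, false]` from the vectorized IR, i.e. nothing beyond the read sizes needs to be supplied by the user.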