| Issue | 164853 |
| --- | --- |
| Summary | [DAG] Attempt to narrow loads of non-constant aligned offsets |
| Labels | missed-optimization, llvm:SelectionDAG |
| Assignees | |
| Reporter | RKSimon |
DAGCombiner::reduceLoadWidth handles cases where we are shifting + truncating wide loads, but only for constant shift amounts.
We should be able to do something similar for cases such as the one below, where we're extracting aligned i64 blocks from an i512 load:
```ll
define i64 @load512_extract64(ptr %word, i32 %idx) {
%ld = load i512, ptr %word, align 8
%rem = and i32 %idx, 511 ; idx in bounds
%rem2 = and i32 %rem, -64 ; idx aligned
%sh_prom = zext nneg i32 %rem2 to i512
%sub = lshr i512 %ld, %sh_prom
%res = trunc i512 %sub to i64
ret i64 %res
}
```
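In effect, the combine should be able to rewrite this into a narrow load at a variable byte offset, along the lines of the hand-written equivalent below (illustrative only; the function and value names are mine):
```ll
define i64 @load512_extract64_narrowed(ptr %word, i32 %idx) {
%rem = and i32 %idx, 511          ; idx in bounds
%rem2 = and i32 %rem, -64         ; idx aligned to 64-bit blocks
%byteoff = lshr i32 %rem2, 3      ; bit offset -> byte offset (multiple of 8, max 56)
%gep = getelementptr inbounds i8, ptr %word, i32 %byteoff
%res = load i64, ptr %gep, align 8
ret i64 %res
}
```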
By the looks of the current codegen, we're not managing this prior to legalisation, which spills the i512 to the stack; some cleanup does happen later, but we still end up loading from the stack copy:
```asm
load512_extract64: # @load512_extract64
pushq %rax
vmovups (%rdi), %ymm0
vmovups 32(%rdi), %ymm1
vxorps %xmm2, %xmm2, %xmm2
vmovups %ymm2, -32(%rsp)
vmovups %ymm2, -64(%rsp)
vmovups %ymm1, -96(%rsp)
vmovups %ymm0, -128(%rsp)
shrl $3, %esi
andl $56, %esi
movq -128(%rsp,%rsi), %rax
popq %rcx
vzeroupper
retq
```
Instead, it should be possible to do this:
```asm
shrl $3, %esi
andl $56, %esi
movq (%rdi,%rsi), %rax
retq
```
https://zig.godbolt.org/z/YYGaonx5M
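For reference, a rough sketch of the shape a variable-shift case in reduceLoadWidth could take. This is purely illustrative, not a patch: the helper name and the surrounding plumbing (chain updates, profitability/legality checks) are assumptions, and it only covers the `(trunc (srl (load wide), shamt))` pattern from the example above.
```cpp
// Illustrative sketch only -- not actual LLVM code. Fold
//   (trunc (srl (load wide), shamt)) -> (load narrow, base + shamt/8)
// when known-bits proves shamt is a multiple of the narrow width and keeps
// the extracted block inside the wide load.
static SDValue tryNarrowVariableExtract(SelectionDAG &DAG, LoadSDNode *Ld,
                                        SDValue ShAmt, EVT NarrowVT,
                                        const SDLoc &DL) {
  unsigned NarrowBits = NarrowVT.getFixedSizeInBits();        // e.g. 64
  unsigned WideBits = Ld->getMemoryVT().getFixedSizeInBits(); // e.g. 512

  // The shift amount must be a known multiple of the narrow width so the
  // extracted block starts on an element boundary...
  KnownBits Known = DAG.computeKnownBits(ShAmt);
  if (Known.countMinTrailingZeros() < Log2_32(NarrowBits))
    return SDValue();
  // ...and its maximum value must keep the block inside the wide load
  // (448 for the i512/i64 example, which the and-masks guarantee).
  if (Known.getMaxValue().ugt(WideBits - NarrowBits))
    return SDValue();

  // Byte offset = bit offset / 8 (the shrl $3 in the desired asm above).
  EVT ShVT = ShAmt.getValueType();
  SDValue ByteOff = DAG.getNode(ISD::SRL, DL, ShVT, ShAmt,
                                DAG.getShiftAmountConstant(3, ShVT, DL));
  EVT PtrVT = Ld->getBasePtr().getValueType();
  SDValue NewPtr = DAG.getMemBasePlusOffset(
      Ld->getBasePtr(), DAG.getZExtOrTrunc(ByteOff, DL, PtrVT), DL);

  // The narrow load keeps whatever alignment the aligned offset preserves;
  // here align 8 on the i512 load carries over to the i64 load.
  Align NewAlign = commonAlignment(Ld->getOriginalAlign(), NarrowBits / 8);
  return DAG.getLoad(NarrowVT, DL, Ld->getChain(), NewPtr,
                     MachinePointerInfo(), NewAlign);
}
```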