ritter-x2a wrote:

One drawback of this patch is that the lowering for memsets with large
statically known sizes wastes registers under SelectionDAG ISel. In this case,
the IR lowering uses a `<64xi32>` store in the main memset loop. We correctly
legalize this into 16 dwordx4 stores, but the huge `<64xi32>` splat value that
is stored there (256 copies of the same byte) lives in a different basic block
than the stores. Because SDAG ISel works on one basic block at a time, it
doesn't know that those 64 32-bit registers holding the same value are not
needed at the same time and that 4 would be enough. GlobalISel can look across
BBs and doesn't have this problem.

You can see this for example in `@memset_p0_sz1055_align_4_varsetval` in 
`memset-param-combinations.ll`.

I tried adjusting the IR lowering to put the splat values in the same basic 
block as the accesses, but then they are LICM-ed out again.

I also tried adjusting the lowering to use N(=16) `<4xi32>` stores (with only a
single `<4xi32>` splat). That fixed the register wastage, but it made code
generation worse in a different way: the SCEV-based strength reduction
(loop-reduce) then replaces the address computations with new computations
that don't use `inbounds` and `nuw`, so offsets cannot be folded into store
instructions in various cases. This happens even if I change the memset
lowering to produce the form that loop-reduce would generate: it still
re-generates the address computation without the poison-generating flags.
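As a rough picture of that alternative store pattern, the Python sketch below
models one 16-byte splat reused for 16 stores per loop iteration, each at a
constant offset from the iteration's base address. It is a behavioral model on
a `bytearray`, not the actual IR lowering; the function name and the bytewise
handling of the leftover tail are assumptions made for the example.

```python
# Behavioral model only: names and the residual handling are assumptions.
STORE_BYTES = 16       # one <4xi32> store
STORES_PER_ITER = 16   # N in the text
CHUNK_BYTES = STORE_BYTES * STORES_PER_ITER  # 256 bytes per iteration


def memset_model(buf, value, size):
    """Fill buf[0:size] with value; return the per-iteration store offsets."""
    splat = bytes([value & 0xFF]) * STORE_BYTES  # the single 16-byte splat
    offsets = [i * STORE_BYTES for i in range(STORES_PER_ITER)]

    base = 0
    while base + CHUNK_BYTES <= size:
        # 16 stores at base + constant offset, all reusing the same splat.
        for off in offsets:
            buf[base + off:base + off + STORE_BYTES] = splat
        base += CHUNK_BYTES

    # Leftover bytes written one at a time (assumed for this model only).
    for i in range(base, size):
        buf[i] = value & 0xFF
    return offsets
```

The constant offsets (0, 16, ..., 240) are the ones that would ideally be
folded into the store instructions, which is what gets lost once the address
computation is re-generated without `inbounds` and `nuw`.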

In practice, the effect of the register wastage is probably not very dramatic,
because it only happens for quite large memsets that will take some time
anyway, but do let me know if you have suggestions on how to avoid it.

https://github.com/llvm/llvm-project/pull/169040
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
