Issue 179607
Summary [X86] Missed Fold: `vgf2p8affineqb(vgf2p8affineqb(x, C1), C2)` => `vgf2p8affineqb(x, C3)`
Labels new issue
Assignees
Reporter WalterKruger
    Given that `vgf2p8affineqb` performs an affine transformation of the input over GF(2) (each output bit is the XOR of a selected subset of input bits, optionally flipped by the immediate), and that such transformations compose, a chain of multiple `vgf2p8affineqb` operations can be folded into a single one:

```asm
revRunningParity_clang:
        gf2p8affineqb   xmm0, xmmword ptr [rip + .LCPI0_0], 0
        gf2p8affineqb   xmm0, xmmword ptr [rip + .LCPI0_1], 0
        ret
```

```asm
revRunningParity_tgt:
 gf2p8affineqb   xmm0, xmmword ptr [rip + .LCPI1_0], 0
 ret
```

https://godbolt.org/z/536MK4PTq

The matrix for such a fold can be calculated by applying the super-matrix to the sub-matrix at byte, rather than bit, granularity. Because a matrix's byte order corresponds to the output bits in reverse order, this "byte affine" must produce the output bytes in order while selecting the source bytes in reverse order, unlike `vgf2p8affineqb` itself. See the following pseudocode:

```
u64 byteGranularAffine(u64 subMatrix, u64 superMatrix) {
	u64 res = 0

	for i FROM 0...7 {
		u8 matrixRow = superMatrix.byte[i]

		for j FROM 0...7 {
			res.byte[i] ^= matrixRow.bit[j]? subMatrix.byte[7-j] : 0
		}
	}

	return res
}
```
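
As a sanity check, the fold above can be modeled in Python against a software model of the per-qword `gf2p8affineqb` semantics (following the Intel SDM bit ordering, where output bit `i` of each byte uses matrix byte `7-i` as its row). The helper names below are illustrative, not part of any proposed patch:

```python
def gf2p8affine_qword(x, matrix, imm8=0):
    """Software model of gf2p8affineqb on one 64-bit lane (Intel SDM:
    output bit i of each byte uses matrix byte 7-i as its row)."""
    out = 0
    for b in range(8):                       # each source byte
        src = (x >> (8 * b)) & 0xFF
        res = 0
        for i in range(8):                   # each output bit
            row = (matrix >> (8 * (7 - i))) & 0xFF
            parity = bin(row & src).count("1") & 1
            res |= (parity ^ ((imm8 >> i) & 1)) << i
        out |= res << (8 * b)
    return out

def byte_granular_affine(sub_matrix, super_matrix):
    """The pseudocode above: output bytes in order, source bytes reversed."""
    res = 0
    for i in range(8):
        row = (super_matrix >> (8 * i)) & 0xFF
        acc = 0
        for j in range(8):
            if (row >> j) & 1:
                acc ^= (sub_matrix >> (8 * (7 - j))) & 0xFF
        res |= acc << (8 * i)
    return res

# Arbitrary constants (not the matrices from the reproducer above):
C1, C2, x = 0x123456789ABCDEF0, 0xFEDCBA9876543210, 0x0F1E2D3C4B5A6978
twice = gf2p8affine_qword(gf2p8affine_qword(x, C1), C2)
once = gf2p8affine_qword(x, byte_granular_affine(C1, C2))
assert twice == once
```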

The inner (sub) operation's immediate also needs to be folded, which can be done by performing a normal `vgf2p8affineqb` on it using the super operation's matrix and immediate. See #179606.
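
A sketch of the complete fold, including the immediate, again against a software model of the instruction (helper names are illustrative). The folded immediate is obtained by running the inner immediate through the outer matrix and immediate:

```python
def affine_byte(matrix, src, imm8):
    """One byte of the gf2p8affineqb result (Intel SDM bit ordering)."""
    res = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF
        res |= ((bin(row & src).count("1") & 1) ^ ((imm8 >> i) & 1)) << i
    return res

def gf2p8affine_qword(x, matrix, imm8):
    """Apply affine_byte to each byte of a 64-bit lane."""
    return sum(affine_byte(matrix, (x >> (8 * b)) & 0xFF, imm8) << (8 * b)
               for b in range(8))

def fold_matrices(sub_matrix, super_matrix):
    """Byte-granular affine from the pseudocode above."""
    res = 0
    for i in range(8):
        row = (super_matrix >> (8 * i)) & 0xFF
        acc = 0
        for j in range(8):
            if (row >> j) & 1:
                acc ^= (sub_matrix >> (8 * (7 - j))) & 0xFF
        res |= acc << (8 * i)
    return res

# Arbitrary constants: inner op (C1, imm1) followed by outer op (C2, imm2).
C1, imm1, C2, imm2 = 0x8040201008040201, 0xA5, 0x123456789ABCDEF0, 0x3C
x = 0x0123456789ABCDEF
C3 = fold_matrices(C1, C2)
imm3 = affine_byte(C2, imm1, imm2)   # fold imm1 through the super op
chained = gf2p8affine_qword(gf2p8affine_qword(x, C1, imm1), C2, imm2)
folded = gf2p8affine_qword(x, C3, imm3)
assert chained == folded
```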
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs