Forwarded to the list from a PM:
> So you now use m128i registers and do instructions on 128 bit wide
> values?
yes
> I assume that what you are doing is what is usually called bit
> slicing?
yes
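For readers new to the term, here is a minimal sketch of what bit slicing means, using plain 64-bit words instead of 128-bit registers. The names `Slice`, `pack_bit` and `xor_all_lanes` are made up for this illustration and are not taken from the actual code: each word holds the same bit position of 64 independent instances, so one bitwise instruction works on all instances at once.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 64-bit word holds the same bit position of 64 independent instances.
using Slice = std::uint64_t;

// Hypothetical helper: gather bit `bit` of every instance state into one slice.
// Lane k of the result is bit `bit` of states[k].
inline Slice pack_bit(const std::array<std::uint32_t, 64>& states, int bit) {
    assert(bit >= 0 && bit < 32);
    Slice s = 0;
    for (int k = 0; k < 64; ++k) {
        s |= static_cast<Slice>((states[k] >> bit) & 1u) << k;
    }
    return s;
}

// One word-wide XOR performs the 1-bit XOR for all 64 instances at once.
inline Slice xor_all_lanes(Slice a, Slice b) { return a ^ b; }
```

With a 128-bit register the same idea covers 128 instances per instruction.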
> For both Cell and SSE2, do you use interlacing of instructions to fill
> up the pipeline (and usually improve speed by a factor 2 or 3)? (not
> sure if that really helps on bit sliced calculations though)
I did not pay special attention to instruction scheduling.
The heavy-load loop is unrolled, so gcc has plenty of opportunity
to reorder the instructions.
Also, the loop body is executed ~650M times per second on a 2 GHz core,
effectively using about 3 clocks per iteration for the 7 instructions (unrolled).
On the Cell SPU it is half as fast despite the higher clock, so I should try to
optimize there.
(But the SPU is also not fully superscalar, so being half as fast is roughly expected.)
With the SPU being in-order, instruction reordering and scheduling
may pay off.
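The clock count above is just the core frequency divided by the iteration rate. A small sketch of that arithmetic follows; the function names are made up for illustration, and the inputs are the rounded figures quoted in this mail, not new measurements.

```cpp
#include <cassert>

// Average clock cycles spent per loop iteration.
inline double cycles_per_iteration(double clock_hz, double iterations_per_second) {
    assert(iterations_per_second > 0.0);
    return clock_hz / iterations_per_second;
}

// Average instructions retired per clock for a loop body of `instructions` instructions.
inline double instructions_per_cycle(int instructions, double cycles) {
    assert(cycles > 0.0);
    return instructions / cycles;
}
```

With 2 GHz and 650M iterations per second this gives roughly 3.08 cycles per iteration, or a bit over 2 instructions per cycle for a 7-instruction body.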
This function does 90% of the work
(nshift == 0, length = 19 or 22 or 23, RT = ssevector(uint64, uint64)):
template <int base, int length, int nshift, typename RT>
void lsh_reg(RT * regs, RT clock3) {
    RT clock2 = ~clock3;
    // Per bit lane: clock3 set -> take regs[i + nshift + 1], else regs[i + nshift].
    for (int i = base - length + 1; i < base; ++i) {
        regs[i] = (regs[i + nshift] & clock2) | (regs[i + nshift + 1] & clock3);
    }
}
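To make the effect of the mask visible, here is a self-contained sketch that can be compiled without SSE2. `Vec128` is a hypothetical stand-in for the real vector type (two 64-bit halves with bitwise operators), and the template parameters in the example are small illustrative values, not the real register lengths.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the 128-bit vector type: two 64-bit halves.
struct Vec128 {
    std::uint64_t lo, hi;
};
inline Vec128 operator&(Vec128 a, Vec128 b) { return {a.lo & b.lo, a.hi & b.hi}; }
inline Vec128 operator|(Vec128 a, Vec128 b) { return {a.lo | b.lo, a.hi | b.hi}; }
inline Vec128 operator~(Vec128 a) { return {~a.lo, ~a.hi}; }
inline bool operator==(Vec128 a, Vec128 b) { return a.lo == b.lo && a.hi == b.hi; }

// Same loop as in the mail, with explicit parentheses.
template <int base, int length, int nshift, typename RT>
void lsh_reg(RT* regs, RT clock3) {
    const RT clock2 = ~clock3;
    for (int i = base - length + 1; i < base; ++i) {
        regs[i] = (regs[i + nshift] & clock2) | (regs[i + nshift + 1] & clock3);
    }
}

// Example: the low 64 lanes are clocked, the high 64 lanes hold their value.
inline void lsh_reg_example() {
    Vec128 regs[5] = {{10, 10}, {1, 1}, {2, 2}, {3, 3}, {4, 4}};
    const Vec128 clock3 = {~std::uint64_t{0}, 0};
    lsh_reg<4, 4, 0>(regs, clock3);      // touches regs[1..3], reads up to regs[4]
    assert(regs[1] == (Vec128{2, 1}));   // low half took regs[2], high half kept regs[1]
    assert(regs[0] == (Vec128{10, 10})); // outside the range, untouched
}
```

Note that the last iteration reads `regs[base]`, so the array must extend at least that far.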
Before loop unrolling:
xmm0: clock2
xmm6: ~clock2 (i.e. clock3)
720(...): regs[i + nshift]
736(...): regs[i + nshift + 1]
L41:
.loc 2 110 0
movdqa 720(%ecx,%eax), %xmm1
movdqa 736(%ecx,%eax), %xmm2
L18:
.loc 2 111 0
pand %xmm0, %xmm1
pand %xmm6, %xmm2
por %xmm2, %xmm1
movdqa %xmm1, 720(%ecx,%eax)
addl $16, %eax
.loc 2 110 0
cmpl $288, %eax
jne L41
After loop unrolling this pattern repeats:
xmm2: clock2
xmm6: ~clock2 (i.e. clock3)
720(...): regs[i + nshift]
736(...): regs[i + nshift + 1]
movdqa %xmm2, %xmm0
movdqa %xmm6, %xmm1
pand 720(%ebx,%eax), %xmm0
pand 736(%ebx,%eax), %xmm1
por %xmm1, %xmm0
movdqa %xmm0, 720(%ebx,%eax)
leal 32(%edx), %eax
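For comparison, one iteration of the loop body can be written by hand with SSE2 intrinsics roughly as follows. This is only a sketch of how the instructions map, assuming nshift == 0 and a 16-byte aligned array; `select_step` and `equal128` are names invented here, and the real code goes through the vector wrapper type instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// regs[i] = (regs[i] & clock2) | (regs[i + 1] & clock3)
inline void select_step(__m128i* regs, std::size_t i, __m128i clock2, __m128i clock3) {
    assert(reinterpret_cast<std::uintptr_t>(regs) % 16 == 0);  // movdqa needs alignment
    __m128i keep = _mm_and_si128(_mm_load_si128(&regs[i]), clock2);      // pand
    __m128i take = _mm_and_si128(_mm_load_si128(&regs[i + 1]), clock3);  // pand
    _mm_store_si128(&regs[i], _mm_or_si128(keep, take));                 // por, movdqa
}

// Helper for checking results: true when all 128 bits match.
inline bool equal128(__m128i a, __m128i b) {
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

In the unrolled output the compiler folds the two loads into the memory operands of `pand`, which is why the masks are first copied into scratch registers.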
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51