http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089
--- Comment #29 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-02-16 11:36:37 UTC ---
Another case taken from CSiBE / bzip2, where reusing the intermediate shift
result would be better:
void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{
   n->b[7] = (UChar)((hi32 >> 24) & 0xFF);
   n->b[6] = (UChar)((hi32 >> 16) & 0xFF);
   n->b[5] = (UChar)((hi32 >>  8) & 0xFF);
   n->b[4] = (UChar) (hi32        & 0xFF);
/*
   n->b[3] = (UChar)((lo32 >> 24) & 0xFF);
   n->b[2] = (UChar)((lo32 >> 16) & 0xFF);
   n->b[1] = (UChar)((lo32 >>  8) & 0xFF);
   n->b[0] = (UChar) (lo32        & 0xFF);
*/
}
on rev 196091 with -O2 -m4 compiles to:
        mov     r6,r0
        shlr16  r0
        shlr8   r0
        mov.b   r0,@(7,r4)
        mov     r6,r0
        shlr16  r0
        mov.b   r0,@(6,r4)
        mov     r6,r0
        shlr8   r0
        mov.b   r0,@(5,r4)
        mov     r6,r0
        mov.b   r0,@(4,r4)
which would be better as:
        mov     r6,r0
        mov.b   r0,@(4,r4)
        shlr8   r0
        mov.b   r0,@(5,r4)
        shlr8   r0
        mov.b   r0,@(6,r4)
        shlr8   r0
        mov.b   r0,@(7,r4)
This would require reordering the memory stores, which should be safe to do as long as the accesses are not volatile.
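At the source level, the desired sequence amounts to shifting the value in place by 8 each time and reusing it for the next store, instead of recomputing each byte from the original hi32. A minimal sketch of that equivalent (using plain unsigned types and a raw byte pointer in place of the bzip2 UInt32/UChar/UInt64 typedefs, which aren't shown here; the function name is made up for illustration):

```c
/* Each store reuses the previously shifted value; only three
   shift-by-8 operations are needed instead of a fresh
   shift sequence per byte. */
void uInt64_hi_bytes_reuse (unsigned char* b, unsigned int hi32)
{
   b[4] = (unsigned char)(hi32 & 0xFF);
   hi32 >>= 8;                          /* maps to one shlr8 */
   b[5] = (unsigned char)(hi32 & 0xFF);
   hi32 >>= 8;
   b[6] = (unsigned char)(hi32 & 0xFF);
   hi32 >>= 8;
   b[7] = (unsigned char)(hi32 & 0xFF);
}
```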
Reordering the stores manually:
void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{
   n->b[4] = (UChar) (hi32        & 0xFF);
   n->b[5] = (UChar)((hi32 >>  8) & 0xFF);
   n->b[6] = (UChar)((hi32 >> 16) & 0xFF);
   n->b[7] = (UChar)((hi32 >> 24) & 0xFF);
}
still results in:
        mov     r6,r0
        mov.b   r0,@(4,r4)
        mov     r6,r0
        shlr8   r0
        mov.b   r0,@(5,r4)
        mov     r6,r0
        shlr16  r0
        mov.b   r0,@(6,r4)
        mov     r6,r0
        shlr16  r0
        shlr8   r0
        mov.b   r0,@(7,r4)
... at least this case should be handled, I think.