On 05/07/16 13:57, Kyrill Tkachov wrote:
Hi Bernd,
On 04/07/16 19:02, Bernd Schmidt wrote:
On 07/01/2016 11:18 AM, Kyrill Tkachov wrote:
In this arm wrong-code PR the struct assignment goes wrong when
expanding constructor elements to a register destination
when the constructor elements are signed bitfields less than a word wide.
In this testcase we're intialising a struct with a 16-bit signed
bitfield to -1 followed by a 1-bit bitfield to 0.
Before it starts storing the elements it zeroes out the register.
The code in store_constructor extends the first field to word size
because it appears at the beginning of a word.
It sign-extends the -1 to word size. However, when it later tries to
store the 0 to bitposition 16 it has some logic
to avoid redundant zeroing since the destination was originally cleared,
so it doesn't emit the zero store.
But the previous sign-extended -1 took up the whole word, so the
position of the second bitfield contains a set bit.
This patch fixes the problem by zeroing out the bits of the widened
field that did not appear in the original value,
so that we can safely avoid storing the second zero in the constructor.
[...]
Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is
gated on WORD_REGISTER_OPERATIONS I didn't
expect any effect on aarch64 and x86_64 anyway.
So - that code path starts with this comment:
/* If this initializes a field that is smaller than a
word, at the start of a word, try to widen it to a full
word. This special case allows us to output C++ member
function initializations in a form that the optimizers
can understand. */
Doesn't your patch completely defeat the purpose of this? Would you get
better/identical code by just deleting this block? It seems unfortunate to have
two different code generation approaches like this.
It would be interesting to know the effects of your patch, and the effects of removing this code entirely, on generated code. Try to find the motivating C++ member function example perhaps? Maybe another possibility is to ensure this
doesn't happen if the value would be interpreted as signed.
Doing some archaeology shows this code was added back in 1998 with no testcase
(r22567) so I'd have to do more digging.
My interpretation of that comment was that for WORD_REGISTER_OPERATIONS targets
it's more beneficial to have word-size
operations, so the expansion code tries to emit as many of the operations in
word_mode as it safely can.
With my patch it still emits a word_mode operation, it's just that the
immediate that is moved in word_mode has it's
top bits zeroed out instead of sign-extended.
I'll build SPEC2006 on arm (a WORD_REGISTER_OPERATIONS target) and inspect the
assembly to see if I see any interesting
effects, but I suspect there won't be much change.
Ok, I found a bit of time to investigate.
On SPEC2006 I didn't see a difference in codegen with or without that whole
block
(or with and without this patch).
The differences in codegen can be observed on the testcase in the patch.
Currently (without this patch) we generate the wrong-code:
main:
strd r3, lr, [sp, #-8]!
movw r3, #:lower16:d
mov r2, #-1
movt r3, #:upper16:d
str r2, [r3]
bl abort
With this patch that performs the zero extension on the loaded immediate but
still
widens the operation to word_mode we generate (the correct):
main:
strd r3, lr, [sp, #-8]!
movw r3, #:lower16:d
movw r2, #65535
movt r3, #:upper16:d
mov r0, #0
str r2, [r3]
cbnz r0, .L5
pop {r3, pc}
.L5:
bl abort
If I remove that whole block of code we end up emitting a short operation with
a bitfield
move operation, so we generate:
main:
strd r3, lr, [sp, #-8]!
mov r2, #0
mov r1, #-1
movw r3, #:lower16:d
bfi r2, r1, #0, #16
movt r3, #:upper16:d
mov r0, #0
str r2, [r3]
cbnz r0, .L5
pop {r3, pc}
.L5:
bl abort
which is correct, but suboptimal compared to the codegen with this patch.
Based on that, I think that code block is a useful optimisation, we just need
to take care with immediates.
What do you think?
Thanks,
Kyrill