https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Bitfield extraction on ppc64le goes the

      /* Try loading part of OP0 into a register and extracting the
         bitfield from that.  */
      unsigned HOST_WIDE_INT bitpos;
      rtx xop0 = adjust_bit_field_mem_for_reg (pattern, op0, bitsize, bitnum,
                                               0, 0, tmode, &bitpos);

way, which ends up generating the DImode load using the fact that the
struct alignment adds padding after csum_level.

The store path OTOH ends up honoring the C++ mem model which says to access
the bitfield in its declared type (IIRC?) and the bit region via
DECL_BIT_FIELD_REPRESENTATIVE is of size 8 (because with C++ inheritance that
tail padding can be re-used).

It looks like we didn't adjust the bitfield read paths for the mem model
because in practice it doesn't matter and it may generate larger/slower code
not to do loads in larger types on some archs.

This leads to the observed load-store / store-load issues: the load is done
in DImode while the store is done as a byte access.

Note that we conservatively compute the extent for
DECL_BIT_FIELD_REPRESENTATIVE by preferring smaller modes.  There's some ???
in finish_bitfield_representative and the above remark about tail padding
re-use is only implemented via preferring smaller modes.  Thus when adding a
'long foo' after csum_level the representative doesn't change to 64bit width
but stays at 8bits (both are valid from the C++ memory model).

Note that the proposed simple lowering of bitfield accesses on GIMPLE would
do accesses of DECL_BIT_FIELD_REPRESENTATIVE and thus in this case use byte
accesses.

I suppose we want to be less conservative about DECL_BIT_FIELD_REPRESENTATIVE
and leave it up to the target how to do the actual accesses.

Widening the representative generates

__skb_decr_checksum_unnecessary:
        ld 9,8(3)
        addi 10,9,3
        rldicr 9,9,0,61
        rldicl 10,10,0,62
        or 9,9,10
        std 9,8(3)
        blr
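
For readers who do not read Power assembly, here is a rough C sketch of what the widened-representative sequence computes, placed next to the byte-sized variant that the 8-bit representative implies. It assumes, reading off the rldicr/rldicl masks, that the field being decremented occupies the low 2 bits of the doubleword at offset 8. The function names decr_wide and decr_byte are made up for illustration and are not part of GCC or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit read-modify-write, mirroring the instruction sequence:
     addi   10,9,3      -> w + 3 (same as -1 modulo 4 in the low 2 bits)
     rldicr 9,9,0,61    -> w with the low 2 bits cleared
     rldicl 10,10,0,62  -> keep only the low 2 bits of w + 3
     or     9,9,10      -> merge the new field into the untouched bits  */
uint64_t decr_wide(uint64_t w)
{
    uint64_t rest  = w & ~UINT64_C(3);        /* bits outside the field */
    uint64_t field = (w + 3) & UINT64_C(3);   /* decremented 2-bit field */
    return rest | field;
}

/* The same update done on a single byte, which is the access width an
   8-bit DECL_BIT_FIELD_REPRESENTATIVE corresponds to.  */
uint8_t decr_byte(uint8_t b)
{
    return (uint8_t)((b & ~3u) | ((b + 3u) & 3u));
}
```

Both variants leave every bit outside the 2-bit field unchanged. They differ only in how much surrounding memory is loaded and stored, which is the point where the load and store paths currently disagree.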