On 10/10/16 12:06, Kyrill Tkachov wrote:

On 10/10/16 11:22, Richard Biener wrote:
On Mon, 10 Oct 2016, Kyrill Tkachov wrote:

Hi Richard,

As I mentioned, here is the patch applying to the main store merging patch to
re-implement encode_tree_to_bitpos
to operate on the bytes directly.

This works fine on little-endian but breaks on big-endian, even for merging
bitfields within a single byte.
Consider the code snippet from gcc.dg/store_merging_6.c:

struct bar {
   int a : 3;
   unsigned char b : 4;
   unsigned char c : 1;
   char d;
   char e;
   char f;
   char g;
};

void
foo1 (struct bar *p)
{
   p->b = 3;
   p->a = 2;
   p->c = 1;
   p->d = 4;
   p->e = 5;
}

The correct GIMPLE for these merged stores on big-endian is:
   MEM[(voidD.49 *)p_2(D)] = 18180;
   MEM[(charD.8 *)p_2(D) + 2B] = 5;

whereas with this patch we emit:
   MEM[(voidD.49 *)p_2(D)] = 39428;
   MEM[(charD.8 *)p_2(D) + 2B] = 5;

The dump for merging the individual stores without this patch (using the
correct but costly wide_int approach in the base patch) is:
After writing 3 of size 4 at position 3 the merged region contains:
6 0 0 0 0 0
After writing 2 of size 3 at position 0 the merged region contains:
46 0 0 0 0 0
After writing 1 of size 1 at position 7 the merged region contains:
47 0 0 0 0 0
After writing 4 of size 8 at position 8 the merged region contains:
47 4 0 0 0 0
After writing 5 of size 8 at position 16 the merged region contains:
47 4 5 0 0 0


And with this patch it is:
After writing 3 of size 4 at position 3 the merged region contains:
18 0 0 0 0 0
After writing 2 of size 3 at position 0 the merged region contains:
1a 0 0 0 0 0
After writing 1 of size 1 at position 7 the merged region contains:
9a 0 0 0 0 0
After writing 4 of size 8 at position 8 the merged region contains:
9a 4 0 0 0 0
After writing 5 of size 8 at position 16 the merged region contains:
9a 4 5 0 0 0

(Note the dump just dumps the byte array from index 0 to <len> so the first
thing printed is the lowest numbered byte.
Also, each byte is dumped in hex.)

The code as included here doesn't do any byte swapping for big-endian but as
seen from the dump even writing a sub-byte
bitfield goes wrong so it would be nice to resolve that before going forward.
Any help with debugging this is hugely appreciated. I've included an ASCII
diagram of the steps in the algorithm
in the patch itself.
Ah, I think you need to account for BITS_BIG_ENDIAN in
shift_bytes_in_array.  You have to shift towards MSB which means changing
left to right shifts for BITS_BIG_ENDIAN.

Thanks, I'll try it out. But this is on aarch64 where
BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1
so there's something else bad here.

You also seem to miss to account for amnt / BITS_PER_UNIT != 0.
Independently of BYTES_BIG_ENDIAN it would be

   ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt;
...

doh, yes. I'll fix that.


Scratch that, just read your other reply.
The precondition for that function is that the shift amount is less than 
BITS_PER_UNIT.
I'll clarify that in the comment.

Kyril

(so best use a single load / store and operate on a temporary).

Thanks,
Kyrill

Richard.

Thanks,
Kyrill



Reply via email to