> On 17 Aug 2017, at 10:41 AM, Andrew Pinski <pins...@gmail.com> wrote:
> 
> On Wed, Aug 16, 2017 at 3:29 PM, Michael Clark <michaeljcl...@mac.com> wrote:
>> Hi,
>> 
>> Is there any reason for 3 loads being issued for these bitfield accesses, 
>> given two of the loads are bytes, and one is a half; the compiler appears to 
>> know the structure is aligned at a half word boundary. Secondly, the riscv 
>> code is using a mixture of 32-bit and 64-bit adds and shifts. Thirdly, with 
>> -Os the riscv code size is the same, but the schedule is less than optimal. 
>> i.e. the 3rd load is issued much later.
> 
> 
> Well one thing is most likely SLOW_BYTE_ACCESS is set to 0.  This
> forces byte access for bit-field accesses.  The macro is misnamed now
> as it only controls bit-field accesses right now (and one thing in
> dojump dealing with comparisons with and and a constant but that might
> be dead code).  This should allow for you to get the code in hand
> written form.
> I suspect SLOW_BYTE_ACCESS support should be removed and be assumed to
> be 1 but I have not had time to look into each backend to see if it is
> correct to do or not.  Maybe it is wrong for AVR.

Thanks, that’s interesting.

So I should try building the riscv backend with SLOW_BYTE_ACCESS set to 1?
That seems less risky than making a change to x86.

This is clearly distinct from slow unaligned access. It seems odd that -O3
doesn’t coalesce the loads even when byte access is not considered slow. One
would expect the cost of the additional loads to outweigh any benefit from
byte accesses being fast, unless something weird is happening with the costs
of loads of different widths.

x86 could be helped here too. I guess subsequent loads will be served from
L1, but that’s not really an excuse for this codegen when the element is
32-bit aligned (unsigned int).
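
To illustrate the coalescing being discussed, here is a minimal sketch. The
struct and both function names are hypothetical (this is not the struct from
the original report), and the shift-and-mask version assumes GCC's LSB-first
bit-field allocation on a little-endian target such as x86-64 or RISC-V.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only. */
struct fields {
    unsigned int a : 3;   /* bits  0..2  (assuming LSB-first allocation) */
    unsigned int b : 7;   /* bits  3..9  */
    unsigned int c : 10;  /* bits 10..19 */
    unsigned int d : 12;  /* bits 20..31 */
};

static_assert(sizeof(struct fields) == sizeof(uint32_t),
              "fields must pack into one 32-bit word");

/* Plain bit-field access: the compiler picks the load widths, and may
   issue several narrow loads. */
unsigned sum_bitfields(const struct fields *f)
{
    return f->a + f->b + f->c + f->d;
}

/* Hand-written form: one 32-bit load, then shifts and masks.
   Only valid under the layout assumption stated above. */
unsigned sum_single_load(const struct fields *f)
{
    uint32_t w;
    memcpy(&w, f, sizeof w);  /* typically compiles to a single word load */
    return (w & 0x7u) + ((w >> 3) & 0x7fu) + ((w >> 10) & 0x3ffu) + (w >> 20);
}
```

Comparing the assembly for the two functions at -O3 and -Os should show
whether the backend merges the narrow loads by itself.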
