Hopefully I am not going to drag this discussion on too long, but I like this
stuff so…
The Cortex-M processors have byte, half-word and word load/store instructions.
The compiler will use the appropriate instruction when you access bytes,
half-words or words.
For example, here is an excerpt of disassembled code. In this example, req and
cp point to non-packed structures. The timeout elements are 16 bits wide in
their respective structures.
/* Copy timeout from cp to req */
req->timeout = cp->timeout;
deb8: 88fa ldrh r2, [r7, #6]
deba: 811a strh r2, [r3, #8]
This is the same code but with req being packed. Note that cp is not packed:
/* Copy timeout from cp to req */
req->timeout = cp->timeout;
df2a: 7998 ldrb r0, [r3, #6]
df2c: 7010 strb r0, [r2, #0]
df2e: 79db ldrb r3, [r3, #7]
df30: 7053 strb r3, [r2, #1]
Since the compiler cannot assume that the 16-bit value is aligned within the
req structure, it has to use byte instructions to store the bytes. This gets
even worse for unaligned 32-bit values. This is what I was trying to point out
as one of the pitfalls of using packed structures re: code size.
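
To make the comparison reproducible, here is a minimal sketch of the two cases.
The struct and function names are hypothetical (they are not the actual req/cp
definitions from the code above), and it assumes gcc's
__attribute__((packed)) syntax. Compiling it for a target without unaligned
access support (a Cortex-M0, or gcc with -mno-unaligned-access) and
disassembling should show the half-word versus byte-wise difference; on cores
where gcc is allowed to emit unaligned accesses, the packed version may still
use ldrh/strh.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct req_natural {
    uint8_t  opcode;
    uint16_t timeout;   /* compiler inserts one pad byte before this member */
};

struct req_packed {
    uint8_t  opcode;
    uint16_t timeout;   /* sits at offset 1, so it may be misaligned */
} __attribute__((packed));

static_assert(offsetof(struct req_natural, timeout) == 2, "padded to offset 2");
static_assert(offsetof(struct req_packed, timeout) == 1, "no padding");

/* Both members are known to be aligned: typically one ldrh/strh pair. */
void copy_timeout_natural(struct req_natural *req, const struct req_natural *cp)
{
    req->timeout = cp->timeout;
}

/* The destination may be misaligned, so the store may be split into bytes. */
void copy_timeout_packed(struct req_packed *req, const struct req_natural *cp)
{
    req->timeout = cp->timeout;
}
```

Both functions do the same thing at the C level; only the generated
instructions differ.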
>
> Maybe I wasn't clear enough, but those are suppose to be used *only*
> for mapping them to/from memory buffer ie only accessed as pointers.
> So above mentioned mydata.e32 =50 is not suppose to happen ever.
I think I understand what you are getting at here. I agree; you should limit
the use of the packed structures to this.
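
As I understand the intended usage, it would look roughly like the sketch
below: the packed type exists only to describe the layout of a memory buffer,
and the rest of the code works on a naturally aligned copy. All names here are
made up for illustration, and the sketch assumes the buffer's byte order
matches the host's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer layout; only ever accessed through a pointer. */
struct wire_cmd {
    uint8_t  opcode;
    uint16_t timeout;
    uint32_t handle;
} __attribute__((packed));

static_assert(sizeof(struct wire_cmd) == 7, "buffer layout must have no padding");

/* Naturally aligned form used by the rest of the code. */
struct cmd {
    uint32_t handle;
    uint16_t timeout;
    uint8_t  opcode;
};

/* Read the fields out of a (possibly unaligned) buffer. */
void cmd_from_buf(struct cmd *dst, const uint8_t *buf)
{
    const struct wire_cmd *w = (const struct wire_cmd *)buf;

    dst->opcode  = w->opcode;
    dst->timeout = w->timeout;
    dst->handle  = w->handle;
}

/* Write the fields into a (possibly unaligned) buffer. */
void cmd_to_buf(uint8_t *buf, const struct cmd *src)
{
    struct wire_cmd *w = (struct wire_cmd *)buf;

    w->opcode  = src->opcode;
    w->timeout = src->timeout;
    w->handle  = src->handle;
}
```

The byte-wise accesses are then confined to these two conversion functions,
and everything else can use aligned half-word and word instructions.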
In the end, I really think it depends on what you do with these packed
structures as to whether or not you will save space, but with your caveat above
(how you use them), I agree that we will probably see code savings given the
way the current code is written.
All good! Fun discussion.
> On Jan 20, 2017, at 11:45 AM, Szymon Janc wrote:
>
> Hi,
>
> On 20 January 2017 at 19:14, Christopher Collins wrote:
>> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>>> I was referring to C code that accesses a packed structure, not necessarily
>>> the construction part of it. For example: (and in this example I am
>>> assuming the processor can access bytes anywhere, 16-bit values on 16-bit
>>> boundaries and 32-bit values on 32-bit boundaries).
>>>
>>> struct my_struct
>>> {
>>> uint8_t e8;
>>> uint16_t e16;
>>> uint32_t e32;
>>> } __packed__ /* I know this syntax is wrong, just an example */
>>> struct my_struct my_data
>>>
>>> In your C code when you do this: my_data.e32 = 50, what is the
>>> compiler going to do? If the structure is not packed, it knows it can
>>> use an instruction that accesses words. If the structure is packed,
>>> well, I guess it is up to the compiler what to do. In the past, I have
>>> seen compilers that add code or call functions that will check the
>>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>>> will use a word instruction; if it happens to reside on a byte
>>> boundary it needs to access the bytes individually to put them in a
>>> register for use.
>
> Maybe I'm confusing something but isn't it that read from memory is always
> word sized? Even if one access single byte?
>
>>
>> I'm not really adding anything here, but here is something I realized
>> recently. When you tell gcc to pack a struct, it has two effects:
>>
>> 1. Eliminates padding.
>> 2. Assumes instances of the struct are not properly aligned.
>
> Yes, that is main reason to use packed structures - to eliminate padding
> and assume unaligned access.
>
> Maybe I wasn't clear enough, but those are suppose to be used *only*
> for mapping them to/from memory buffer ie only accessed as pointers.
> So above mentioned mydata.e32 =50 is not suppose to happen ever.
>
>> For MCUs which don't support unaligned accesses, the second effect may
>> carry some hidden costs. Even if the struct is defined such that it
>> wouldn't contain any padding, and even if all instances of the struct
>> are properly aligned, adding the __packed__ attribute will result in an
>> increase in code size. The increase occurs because gcc can no longer
>> assume that the struct or any of its members are aligned.
>
> But how is that worse then reading/writing byte by byte? You need to read
> whole word to access byte, right?
>
> I did quick test for nrf51dk (bletiny with SM enabled):
> development branch region `FLASH' overflowed by 26860 bytes
> sm branch region `FLASH' overflowed by 25968 bytes
>
> :-)
>
> --
> Regards,
> Szymon K. Janc