Hi,
On 20 January 2017 at 21:42, will sanfilippo <[email protected]> wrote:
> Hopefully I am not going to drag this discussion on too long, but I like this
> stuff so…
>
> The Cortex-M processors have byte, half-word and word instructions. The
> compiler will use the appropriate instruction when you access bytes,
> half-words or words.
>
> For example, here is an excerpt of disassembled code. In this example, req
> and cp point to non-packed structures. These elements are 16-bits in their
> respective structures.
>
> /* Copy timeout from cp to req */
> req->timeout = cp->timeout;
> deb8: 88fa ldrh r2, [r7, #6]
> deba: 811a strh r2, [r3, #8]
>
> This is the same code but with req being packed. Note that cp is not packed:
>
> /* Copy timeout from cp to req */
> req->timeout = cp->timeout;
> df2a: 7998 ldrb r0, [r3, #6]
> df2c: 7010 strb r0, [r2, #0]
> df2e: 79db ldrb r3, [r3, #7]
> df30: 7053 strb r3, [r2, #1]
>
> Since the compiler cannot assume that the 16-bit value is aligned within the
> req structure, it has to use byte instructions to store the bytes. This gets
> even worse for unaligned 32-bit values. This is what I was trying to point
> out as one of the pitfalls of using packed structures re: code size.
Right, but this is pretty much the same as the result of the following
(x is a 32-bit int):
u8ptr[0] = (uint8_t)x;
u8ptr[1] = (uint8_t)(x >> 8);
u8ptr[2] = (uint8_t)(x >> 16);
u8ptr[3] = (uint8_t)(x >> 24);
which you cannot avoid when constructing a PDU to be sent over the wire
if you are not using packed structures for those.
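
As a rough sketch of that comparison: the struct pdu_hdr layout and the
function names below are made up for illustration, and the code assumes
GCC's packed attribute and a little-endian target. Both functions build
the same 5-byte PDU, and without unaligned access enabled the packed
store should come out as byte stores much like the manual version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PDU header, for illustration only. */
struct pdu_hdr {
    uint8_t  opcode;
    uint32_t value;   /* lands at offset 1, so it is never 4-byte aligned */
} __attribute__((packed));

/* Packed means no padding: 1 + 4 bytes. */
static_assert(sizeof(struct pdu_hdr) == 5, "pdu_hdr must have no padding");

/* Manual serialization: four explicit byte stores, little-endian order. */
static void put_le32(uint8_t *dst, uint32_t x)
{
    dst[0] = (uint8_t)x;
    dst[1] = (uint8_t)(x >> 8);
    dst[2] = (uint8_t)(x >> 16);
    dst[3] = (uint8_t)(x >> 24);
}

/* Build the PDU by hand, one byte at a time. */
void build_manual(uint8_t *buf, uint8_t opcode, uint32_t value)
{
    buf[0] = opcode;
    put_le32(&buf[1], value);
}

/* Build the PDU by mapping the packed struct onto the buffer.
 * The value is stored in native byte order, so this only matches
 * build_manual() on a little-endian target (assumed here). */
void build_packed(uint8_t *buf, uint8_t opcode, uint32_t value)
{
    struct pdu_hdr *hdr = (struct pdu_hdr *)buf;

    hdr->opcode = opcode;
    hdr->value = value;
}
```

The packed version leaves the byte splitting to the compiler instead of
spelling it out at every call site.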
For the record:
https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/ARM-Options.html
-munaligned-access
-mno-unaligned-access
Enables (or disables) reading and writing of 16- and 32- bit values
from addresses that are not 16- or 32- bit aligned. By default
unaligned access is disabled for all pre-ARMv6 and all ARMv6-M
architectures, and enabled for all other architectures. If unaligned
access is not enabled then words in packed data structures are
accessed a byte at a time.
So I think we can safely assume that GCC won't do anything crazy, like
taking a different code path depending on whether the address is aligned.
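
To make the intended usage concrete, here is a minimal sketch of mapping
a packed struct onto a received buffer through a pointer. The struct
names, fields and the rsp_timeout() helper are hypothetical, and GCC's
packed attribute and a little-endian target are assumed. The static
assertions show the two effects of packing discussed in this thread: no
padding, and an assumed alignment of 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response layout, unpacked: the compiler may add padding. */
struct rsp_plain {
    uint8_t  status;
    uint16_t handle;
    uint32_t timeout;
};

/* Same fields, packed: matches the on-the-wire layout byte for byte. */
struct rsp_packed {
    uint8_t  status;
    uint16_t handle;
    uint32_t timeout;
} __attribute__((packed));

/* Effect 1: padding is eliminated. */
static_assert(sizeof(struct rsp_packed) == 7, "no padding expected");
static_assert(offsetof(struct rsp_packed, timeout) == 3, "timeout at offset 3");
/* Effect 2: the compiler assumes no alignment for the struct. */
static_assert(_Alignof(struct rsp_packed) == 1, "alignment of 1 expected");

/* Read a member through a pointer into the buffer. The buffer may start
 * at any address; the compiler must emit code that is safe for that. */
uint32_t rsp_timeout(const uint8_t *buf)
{
    const struct rsp_packed *rsp = (const struct rsp_packed *)buf;

    return rsp->timeout;
}
```

The struct is only ever used as a view onto the buffer, never as a
standalone variable that gets assigned field by field.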
>>
>> Maybe I wasn't clear enough, but those are supposed to be used *only*
>> for mapping them to/from a memory buffer, i.e. only accessed as pointers.
>> So the above-mentioned my_data.e32 = 50 is not supposed to happen ever.
>
>
> I think I understand what you are getting at here. I agree; you should limit
> the use of the packed structures to this.
>
> In the end, I really think it depends on what you do with these packed
> structures as to whether or not you will save space, but with your caveat
> above (how you use them), I agree that we will probably see code savings
> given the way the current code is written.
>
> All good! Fun discussion.
Indeed! :)
>
>> On Jan 20, 2017, at 11:45 AM, Szymon Janc <[email protected]> wrote:
>>
>> Hi,
>>
>> On 20 January 2017 at 19:14, Christopher Collins <[email protected]> wrote:
>>> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>>>> I was referring to C code that accesses a packed structure, not
>>>> necessarily the construction part of it. For example: (and in this example
>>>> I am assuming the processor can access bytes anywhere, 16-bit values on
>>>> 16-bit boundaries and 32-bit values on 32-bit boundaries).
>>>>
>>>> struct my_struct
>>>> {
>>>> uint8_t e8;
>>>> uint16_t e16;
>>>> uint32_t e32;
>>>> } __packed__ /* I know this syntax is wrong, just an example */
>>>> struct my_struct my_data
>>>>
>>>> In your C code when you do this: my_data.e32 = 50, what is the
>>>> compiler going to do? If the structure is not packed, it knows it can
>>>> use an instruction that accesses words. If the structure is packed,
>>>> well, I guess it is up to the compiler what to do. In the past, I have
>>>> seen compilers that add code or call functions that will check the
>>>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>>>> will use a word instruction; if it happens to reside on a byte
>>>> boundary it needs to access the bytes individually to put them in a
>>>> register for use.
>>
>> Maybe I'm confusing something, but isn't a read from memory always
>> word-sized, even if one accesses a single byte?
>>
>>>
>>> I'm not really adding anything here, but here is something I realized
>>> recently. When you tell gcc to pack a struct, it has two effects:
>>>
>>> 1. Eliminates padding.
>>> 2. Assumes instances of the struct are not properly aligned.
>>
>> Yes, that is the main reason to use packed structures: to eliminate
>> padding and to assume unaligned access.
>>
>> Maybe I wasn't clear enough, but those are supposed to be used *only*
>> for mapping them to/from a memory buffer, i.e. only accessed as pointers.
>> So the above-mentioned my_data.e32 = 50 is not supposed to happen ever.
>>
>>> For MCUs which don't support unaligned accesses, the second effect may
>>> carry some hidden costs. Even if the struct is defined such that it
>>> wouldn't contain any padding, and even if all instances of the struct
>>> are properly aligned, adding the __packed__ attribute will result in an
>>> increase in code size. The increase occurs because gcc can no longer
>>> assume that the struct or any of its members are aligned.
>>
>> But how is that worse than reading/writing byte by byte? You need to read
>> the whole word to access a byte, right?
>>
>> I did a quick test for nrf51dk (bletiny with SM enabled):
>> development branch region `FLASH' overflowed by 26860 bytes
>> sm branch region `FLASH' overflowed by 25968 bytes
>>
>> :-)
>>
>> --
>> pozdrawiam
>> Szymon K. Janc
>
--
pozdrawiam
Szymon K. Janc