Re: [RFC] Reducing size of BLE Security Manager

will sanfilippo Fri, 20 Jan 2017 12:43:23 -0800

Hopefully I am not going to drag this discussion on too long, but I like this 
stuff so…


The Cortex-M processors have byte, half-word and word load/store instructions. 
The compiler will use the appropriate instruction when you access bytes, 
half-words or words.

For example, here is an excerpt of disassembled code. In this example, req and 
cp point to non-packed structures. These elements are 16-bits in their 
respective structures.

/* Copy timeout from cp to req */
req->timeout = cp->timeout;
    deb8:       88fa            ldrh    r2, [r7, #6]
    deba:       811a            strh    r2, [r3, #8]

This is the same code but with req being packed. Note that cp is not packed:

/* Copy timeout from cp to req */
req->timeout = cp->timeout;
    df2a:       7998            ldrb    r0, [r3, #6]
    df2c:       7010            strb    r0, [r2, #0]
    df2e:       79db            ldrb    r3, [r3, #7]
    df30:       7053            strb    r3, [r2, #1]

Since the compiler cannot assume that the 16-bit value is aligned within the 
req structure, it has to use byte instructions to store the bytes. This gets 
even worse for unaligned 32-bit values. This is what I was trying to point out 
as one of the pitfalls of using packed structures re: code size.
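
If you want to reproduce this yourself, here is a minimal sketch. The structure 
and function names are made up for illustration (they are not from the Mynewt 
source), and I am assuming gcc or clang since it uses __attribute__((packed)). 
Compile it for your target and compare the disassembly of the two setters.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical request layouts; names are illustrative only. */
struct req_plain {
    uint8_t  opcode;
    uint16_t timeout;   /* compiler pads so this is 2-byte aligned */
    uint32_t handle;
};

struct req_packed {
    uint8_t  opcode;
    uint16_t timeout;   /* offset 1: not 2-byte aligned */
    uint32_t handle;    /* offset 3: not 4-byte aligned */
} __attribute__((packed));

/* Packing removes the padding and drops the assumed alignment to 1. */
static_assert(sizeof(struct req_packed) == 7, "no padding when packed");
static_assert(offsetof(struct req_packed, timeout) == 1, "timeout unaligned");
static_assert(_Alignof(struct req_packed) == 1, "packed struct aligns to 1");

/* Aligned 16-bit store: can be a single strh. */
void set_timeout_plain(struct req_plain *req, uint16_t timeout)
{
    req->timeout = timeout;
}

/*
 * The compiler cannot assume alignment here. On a core without unaligned
 * access (e.g. Cortex-M0) expect byte stores rather than one strh.
 */
void set_timeout_packed(struct req_packed *req, uint16_t timeout)
{
    req->timeout = timeout;
}
```

Both setters store the same value; only the instructions the compiler is 
allowed to pick differ.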

> 
> Maybe I wasn't clear enough, but those are supposed to be used *only*
> for mapping them to/from a memory buffer, i.e. only accessed as pointers.
> So the above-mentioned my_data.e32 = 50 is not supposed to happen ever.


I think I understand what you are getting at here. I agree; you should limit 
the use of the packed structures to this.
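
As I understand it, that usage looks roughly like the sketch below. The PDU 
layout and function names are hypothetical, and the may_alias attribute is a 
gcc/clang extension I have added so that reading a byte buffer through the 
struct pointer does not fall foul of strict aliasing. Fields are read in host 
byte order, so this assumes the wire format and the CPU agree on endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout; not taken from the actual SM code. */
struct sm_example_pdu {
    uint8_t  opcode;
    uint16_t timeout;
    uint32_t handle;
} __attribute__((packed, may_alias));

/* Read a field straight out of a received buffer of any alignment. */
uint16_t pdu_get_timeout(const void *buf)
{
    const struct sm_example_pdu *pdu = buf;

    return pdu->timeout;
}

/* Write a field straight into an outgoing buffer of any alignment. */
void pdu_set_timeout(void *buf, uint16_t timeout)
{
    struct sm_example_pdu *pdu = buf;

    pdu->timeout = timeout;
}
```

The struct is only ever reached through a pointer into the buffer, so there is 
no separate copy to build or parse field by field.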

In the end, I really think it depends on what you do with these packed 
structures as to whether or not you will save space, but with your caveat above 
(how you use them), I agree that we will probably see code savings given the 
way the current code is written.

All good! Fun discussion.

> On Jan 20, 2017, at 11:45 AM, Szymon Janc <[email protected]> wrote:
> 
> Hi,
> 
> On 20 January 2017 at 19:14, Christopher Collins <[email protected]> wrote:
>> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>>> I was referring to C code that accesses a packed structure, not necessarily 
>>> the construction part of it. For example: (and in this example I am 
>>> assuming the processor can access bytes anywhere, 16-bit values on 16-bit 
>>> boundaries and 32-bit values on 32-bit boundaries).
>>> 
>>> struct my_struct
>>> {
>>>      uint8_t e8;
>>>      uint16_t e16;
>>>      uint32_t e32;
>>> } __packed__          /* I know this syntax is wrong, just an example */
>>> struct my_struct my_data
>>> 
>>> In your C code when you do this: my_data.e32 = 50, what is the
>>> compiler going to do? If the structure is not packed, it knows it can
>>> use an instruction that accesses words. If the structure is packed,
>>> well, I guess it is up to the compiler what to do. In the past, I have
>>> seen compilers that add code or call functions that will check the
>>> alignment of e32. If e32 happens to reside on a 4-byte boundary it
>>> will use a word instruction; if it happens to reside on a byte
>>> boundary it needs to access the bytes individually to put them in a
>>> register for use.
> 
> Maybe I'm confusing something, but isn't a read from memory always
> word-sized, even if one accesses a single byte?
> 
>> 
>> I'm not really adding anything here, but here is something I realized
>> recently.  When you tell gcc to pack a struct, it has two effects:
>> 
>>    1. Eliminates padding.
>>    2. Assumes instances of the struct are not properly aligned.
> 
> Yes, that is the main reason to use packed structures - to eliminate padding
> and assume unaligned access.
> 
> Maybe I wasn't clear enough, but those are supposed to be used *only*
> for mapping them to/from a memory buffer, i.e. only accessed as pointers.
> So the above-mentioned my_data.e32 = 50 is not supposed to happen ever.
> 
>> For MCUs which don't support unaligned accesses, the second effect may
>> carry some hidden costs. Even if the struct is defined such that it
>> wouldn't contain any padding, and even if all instances of the struct
>> are properly aligned, adding the __packed__ attribute will result in an
>> increase in code size.  The increase occurs because gcc can no longer
>> assume that the struct or any of its members are aligned.
> 
> But how is that worse than reading/writing byte by byte? You need to read
> a whole word to access a byte, right?
> 
> I did a quick test for nrf51dk (bletiny with SM enabled):
> development branch      region `FLASH' overflowed by 26860 bytes
> sm branch                    region `FLASH' overflowed by 25968 bytes
> 
> :-)
> 
> -- 
> pozdrawiam
> Szymon K. Janc
