Re: [RFC] Reducing size of BLE Security Manager

Szymon Janc Fri, 20 Jan 2017 13:22:23 -0800

Hi,

On 20 January 2017 at 21:42, will sanfilippo <[email protected]> wrote:
> Hopefully I am not going to drag this discussion on too long, but I like this 
> stuff so…
>
> The Cortex-M processors have byte, half-word and word instructions, and the
> compiler will use the appropriate one when you access bytes, half-words or words.
>
> For example, here is an excerpt of disassembled code. In this example, req
> and cp point to non-packed structures. The timeout members are 16 bits wide
> in their respective structures.
>
> /* Copy timeout from cp to req */
> req->timeout = cp->timeout;
>     deb8:       88fa            ldrh    r2, [r7, #6]
>     deba:       811a            strh    r2, [r3, #8]
>
> This is the same code but with req being packed. Note that cp is not packed:
>
> /* Copy timeout from cp to req */
> req->timeout = cp->timeout;
>     df2a:       7998            ldrb    r0, [r3, #6]
>     df2c:       7010            strb    r0, [r2, #0]
>     df2e:       79db            ldrb    r3, [r3, #7]
>     df30:       7053            strb    r3, [r2, #1]
>
> Since the compiler cannot assume that the 16-bit value is aligned within the
> req structure, it has to use byte instructions to store it (and, as the listing
> shows, it also switches to byte loads for cp->timeout). This gets even worse
> for unaligned 32-bit values, which need four byte stores instead of one word
> store. This is what I was trying to point out as one of the pitfalls of using
> packed structures with respect to code size.
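For anyone wanting to reproduce this kind of comparison, here is a minimal sketch. The names and layouts (`cmd`, `req_plain`, `req_packed`) are hypothetical stand-ins, not the structures from the disassembled code. Compiled for a core without unaligned access (for example `arm-none-eabi-gcc -mthumb -mcpu=cortex-m0 -Os -c`) and inspected with `arm-none-eabi-objdump -d`, `copy_plain` should use halfword instructions while `copy_packed` should fall back to byte loads and stores, because `timeout` sits at offset 1 in the packed layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command as received from the host: naturally aligned. */
struct cmd {
    uint8_t  opcode;
    uint16_t timeout;   /* offset 2 on common ABIs */
};

/* Hypothetical request, not packed: timeout is 16-bit aligned. */
struct req_plain {
    uint8_t  opcode;
    uint16_t timeout;
};

/* Same request, packed: timeout lands at offset 1 (unaligned). */
struct req_packed {
    uint8_t  opcode;
    uint16_t timeout;
} __attribute__((packed));

_Static_assert(offsetof(struct req_packed, timeout) == 1,
               "packed layout puts timeout at an odd offset");

/* Can be compiled to one halfword load plus one halfword store. */
void copy_plain(struct req_plain *req, const struct cmd *cp)
{
    req->timeout = cp->timeout;
}

/* Without unaligned access support this becomes byte loads and stores. */
void copy_packed(struct req_packed *req, const struct cmd *cp)
{
    req->timeout = cp->timeout;
}
```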


Right, but this is pretty much the same result as the following (x is a 32-bit int):

    u8ptr[0] = (uint8_t)x;
    u8ptr[1] = (uint8_t)(x >> 8);
    u8ptr[2] = (uint8_t)(x >> 16);
    u8ptr[3] = (uint8_t)(x >> 24);

which you cannot avoid when constructing a PDU to be sent over the wire
if you are not using packed structures for it.
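A minimal sketch of byte-wise serialization helpers built on that pattern. The names `put_le16`, `put_le32` and `get_le32` are hypothetical, not necessarily what the Mynewt tree provides. Because every access is a single byte, the helpers work at any alignment and on either host endianness, and they write BLE's little-endian byte order explicitly; on a little-endian core without unaligned access they should compile to roughly the same byte stores as a store to a packed member.

```c
#include <stdint.h>

/* Write a 16-bit value in little-endian (BLE wire) order. */
static void put_le16(uint8_t *buf, uint16_t x)
{
    buf[0] = (uint8_t)x;
    buf[1] = (uint8_t)(x >> 8);
}

/* Write a 32-bit value in little-endian order, one byte at a time. */
static void put_le32(uint8_t *buf, uint32_t x)
{
    buf[0] = (uint8_t)x;
    buf[1] = (uint8_t)(x >> 8);
    buf[2] = (uint8_t)(x >> 16);
    buf[3] = (uint8_t)(x >> 24);
}

/* Read a little-endian 32-bit value from any address. */
static uint32_t get_le32(const uint8_t *buf)
{
    return (uint32_t)buf[0] |
           ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 24);
}
```

Writing the 32-bit value at an odd offset such as `&pdu[3]` is fine here, since no multi-byte access is ever issued.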

For the record:
 https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/ARM-Options.html
-munaligned-access
-mno-unaligned-access
Enables (or disables) reading and writing of 16- and 32- bit values
from addresses that are not 16- or 32- bit aligned. By default
unaligned access is disabled for all pre-ARMv6 and all ARMv6-M
architectures, and enabled for all other architectures. If unaligned
access is not enabled then words in packed data structures are
accessed a byte at a time.

So I think we can be confident that GCC won't do anything crazy like
taking a different code path depending on whether the address is aligned.

>>
>> Maybe I wasn't clear enough, but those are supposed to be used *only*
>> for mapping to/from a memory buffer, i.e. only accessed through pointers.
>> So the above-mentioned my_data.e32 = 50 is never supposed to happen.
>
>
> I think I understand what you are getting at here. I agree; you should limit 
> the use of the packed structures to this.
>
> In the end, whether or not you save space really depends on what you do
> with these packed structures, but with your caveat above (how you use them),
> I agree that we will probably see code savings given the way the current code
> is written.
>
> All good! Fun discussion.

Indeed! :)

>
>> On Jan 20, 2017, at 11:45 AM, Szymon Janc <[email protected]> wrote:
>>
>> Hi,
>>
>> On 20 January 2017 at 19:14, Christopher Collins <[email protected]> wrote:
>>> On Fri, Jan 20, 2017 at 09:45:07AM -0800, will sanfilippo wrote:
>>>> I was referring to C code that accesses a packed structure, not
>>>> necessarily the code that constructs it. For example (here I am
>>>> assuming the processor can access bytes anywhere, 16-bit values only on
>>>> 16-bit boundaries, and 32-bit values only on 32-bit boundaries):
>>>>
>>>> struct my_struct
>>>> {
>>>>      uint8_t e8;
>>>>      uint16_t e16;
>>>>      uint32_t e32;
>>>> } __attribute__((packed));   /* GCC syntax for a packed struct */
>>>> struct my_struct my_data;
>>>>
>>>> In your C code, when you write my_data.e32 = 50, what is the
>>>> compiler going to do? If the structure is not packed, it knows it can
>>>> use an instruction that accesses words. If the structure is packed,
>>>> well, I guess it is up to the compiler. In the past, I have
>>>> seen compilers that add code or call functions that check the
>>>> alignment of e32 at run time. If e32 happens to reside on a 4-byte
>>>> boundary, that code uses a word instruction; if it is not word-aligned,
>>>> it needs to access the bytes individually to put them in a
>>>> register for use.
>>
>> Maybe I'm confusing something, but isn't every read from memory
>> word-sized, even when one accesses a single byte?
>>
>>>
>>> I'm not really adding anything here, but here is something I realized
>>> recently. When you tell gcc to pack a struct, it has two effects:
>>>
>>>    1. It eliminates padding.
>>>    2. It assumes instances of the struct may not be properly aligned.
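Both effects can be checked on Will's example struct. A minimal sketch, with the struct renamed into hypothetical `my_struct_plain` and `my_struct_packed` variants; the sizes and alignments in the comments assume a common ABI such as ARM EABI or x86-64, since the C standard does not fix them.

```c
#include <stddef.h>
#include <stdint.h>

/* Not packed: e16 is padded to offset 2 and e32 to offset 4 (size 8). */
struct my_struct_plain {
    uint8_t  e8;
    uint16_t e16;
    uint32_t e32;
};

/* Packed: no padding (size 7), and the struct's alignment drops to 1. */
struct my_struct_packed {
    uint8_t  e8;
    uint16_t e16;
    uint32_t e32;
} __attribute__((packed));

/* e32 now sits at offset 3, so on a core without unaligned access
 * this store has to be split into byte stores. */
void set_e32(struct my_struct_packed *p, uint32_t v)
{
    p->e32 = v;
}
```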
>>
>> Yes, those are the main reasons to use packed structures: to eliminate
>> padding and to allow unaligned access.
>>
>> Maybe I wasn't clear enough, but those are supposed to be used *only*
>> for mapping to/from a memory buffer, i.e. only accessed through pointers.
>> So the above-mentioned my_data.e32 = 50 is never supposed to happen.
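A minimal sketch of that usage: the packed struct is only ever overlaid on a raw buffer through a pointer and is never used as a local variable. The PDU layout and the names `hypothetical_pdu`, `pdu_build` and `pdu_handle` are hypothetical. Multi-byte fields are stored in host byte order, which matches BLE's little-endian wire order only on a little-endian core such as Cortex-M. This cast-overlay idiom is common in embedded stacks, although strictly conforming ISO C would prefer byte-wise helpers or memcpy because of aliasing rules.

```c
#include <stdint.h>

/* Hypothetical PDU layout; fields are illustrative only. */
struct hypothetical_pdu {
    uint8_t  opcode;
    uint16_t handle;
    uint32_t value;
} __attribute__((packed));

/* Build a PDU in place by overlaying the packed struct on the buffer.
 * The packed struct has alignment 1, so any buffer offset is valid. */
static void pdu_build(uint8_t *buf, uint8_t opcode, uint16_t handle,
                      uint32_t value)
{
    struct hypothetical_pdu *pdu = (struct hypothetical_pdu *)buf;

    pdu->opcode = opcode;
    pdu->handle = handle;
    pdu->value = value;
}

/* Parse a field back out of a received buffer the same way. */
static uint16_t pdu_handle(const uint8_t *buf)
{
    const struct hypothetical_pdu *pdu = (const struct hypothetical_pdu *)buf;

    return pdu->handle;
}
```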
>>
>>> For MCUs which don't support unaligned accesses, the second effect may
>>> carry some hidden costs. Even if the struct is defined such that it
>>> wouldn't contain any padding, and even if all instances of the struct
>>> are properly aligned, adding the __packed__ attribute increases code
>>> size. The increase occurs because gcc can no longer
>>> assume that the struct or any of its members are aligned.
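A minimal sketch of that hidden cost, using a hypothetical header struct (`hdr_plain`, `hdr_packed`) whose members are all naturally aligned, so packing removes no padding and only the alignment assumption changes. Compiling for an ARMv6-M core (for example `arm-none-eabi-gcc -mthumb -mcpu=cortex-m0 -Os`) and comparing the disassembly of the two readers should show a single word load versus byte loads, consistent with the GCC documentation quoted above.

```c
#include <stdint.h>

/* No padding either way: 2 + 2 + 4 bytes, each member naturally aligned. */
struct hdr_plain {
    uint16_t len;
    uint16_t cid;
    uint32_t seq;
};

/* Identical layout, but packed: the compiler may only assume alignment 1. */
struct hdr_packed {
    uint16_t len;
    uint16_t cid;
    uint32_t seq;
} __attribute__((packed));

/* Can be a single word load. */
uint32_t read_seq_plain(const struct hdr_plain *h)
{
    return h->seq;
}

/* Without unaligned access: byte loads combined with shifts and ORs. */
uint32_t read_seq_packed(const struct hdr_packed *h)
{
    return h->seq;
}
```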
>>
>> But how is that worse than reading/writing byte by byte? You need to read
>> the whole word to access a byte, right?
>>
>> I did a quick test for nrf51dk (bletiny with SM enabled):
>> development branch   region `FLASH' overflowed by 26860 bytes
>> sm branch            region `FLASH' overflowed by 25968 bytes
>>
>> :-)
>>
>> --
>> Regards
>> Szymon K. Janc
>



-- 
Regards
Szymon K. Janc
