JannePeltonen replied on github web page:

platform/linux-generic/odp_crypto.c
@@ -137,15 +139,56 @@ struct odp_crypto_global_s {
        odp_spinlock_t                lock;
        odp_crypto_generic_session_t *free;
        odp_crypto_generic_session_t  sessions[MAX_SESSIONS];
+
+       /* These bitfields are cleared at alloc_session()
+        * together with the rest of data */
+       bitset_t hmac_valid[ODP_THREAD_COUNT_MAX]


Comment:
I think the benefits of using a bitset may not outweigh the drawbacks. In 
addition to making the code slightly more complicated, using a bitset forces us 
to use atomic operations (since, as you pointed out, we still have to handle 
the case where thread A allocates session X while thread B is initializing 
session Y in the same bit set, both updating the same memory location, although 
different bits).

By using a full byte for the validity flag would simplify the code and avoid 
the need for atomics since different threads would no longer write and read the 
same memory location at the same time (there must be some synchronization in 
the application side to ensure that a session is not destroyed when still in 
use and that creation of a session happens before its use).

The atomics can slow down ARM because the bitset tries to provide atomicity 
over a 16 byte range, which causes additional overhead (e.g. reading a valid 
bit is not just a plain memory load) even though not required in this context. 
A weaker atomic bitset would suffice, but I think we could do without atomics 
altogether.

A benefit of the bit set is reduced memory use (and possibly some improvement 
in memory access locality), but I doubt it is significant enough. The savings 
are less than MAX_SESSIONS * ODP_THREAD_COUNT_MAX bytes (after combining the 
crypto/auth valid bits as in the last path). At the same time the thread local 
crypto_local_t structs are using 12 or 24 times MAX_SESSIONS bytes per thread, 
depending on pointer size, plus the memory reserved by OpenSSL for all the 
pre-created contexts.

So, how about just using a full byte for the validity flag?


> Dmitry Eremin-Solenikov(lumag) wrote:
> @JannePeltonen even with your suggestion there can be congestion between 
> set-as-valid session1 and clear-valid session2 (which can happen in 
> completely different thread). In fact I'm balancing between two possible 
> solutions:
> 
> - always init all contexts on session init/destroy them on session destroy
> - just add a per-thread lock guarding writing to per-thread bits
> 
> Any suggestions? 


>> Dmitry Eremin-Solenikov(lumag) wrote:
>> Ok, I will add some cleaning to _create stage. At lease it is not a fast 
>> path operation, so performance should not matter that much.


>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>> Hmm. I like this idea. I will implement it, thank you.


>>>> JannePeltonen wrote
>>>> I think the atomic cmp-xchg is too heavyweight here.
>>>> 
>>>> AFAICS, it is needed only because of the way the valid bits for different 
>>>> threads are packed in the same memory locations. The need for the syncing 
>>>> atomic op could be avoided by just using a whole uint8_t for one thread. I 
>>>> do not think the increase in memory use would be so big that it would be a 
>>>> problem.
>>>> 
>>>> If you really want to keep packing the valid bits, then you could do so 
>>>> also by having per-thread arrays of valid bits for all sessions in the 
>>>> global SHM instead of the current approach of per-session arrays of valid 
>>>> bits for all threads. That would avoid the syncing needs since the valid 
>>>> bits would not share memory locations with other threads.
>>>> 
>>>> IOW, you would have something like this in the global SHM:
>>>> 
>>>> struct openssl_ctx_status {
>>>>    crypto_valid cipher_valid[(MAX_SESSIONS + CV_BITS - 1) / CV_BITS];
>>>>    crypto_valid   hmac_valid[(MAX_SESSIONS + CV_BITS - 1) / CV_BITS];
>>>> } [ODP_THREAD_COUNT_MAX];


>>>>> JannePeltonen wrote
>>>>> I did not say the keys should not be cleared at destroy() but that it 
>>>>> would be better (as in, more difficult to introduce a bug later) if the 
>>>>> alloc did not rely on the clearing of the valid bits etc having happened 
>>>>> earlier (in destroy or in global init). But this is not a sticking point 
>>>>> for me.
>>>>> 
>>>>> Btw, if you are concerned about storing keys in memory for too long, then 
>>>>> this patch does just that by keeping the openssl contexts alive until 
>>>>> local_term(). 


>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>> Checkpatch wants this enclosed in parentheses. Since `IS_VALID` follows 
>>>>>> that model, no harm in doing so with `SET_VALID` as well.
>>>>>> ```
>>>>>> ERROR: Macros with complex values should be enclosed in parentheses
>>>>>> #63: FILE: platform/linux-generic/odp_crypto.c:131:
>>>>>> +#define SET_VALID(session, type, id)    \
>>>>>> +        session->type ## _valid[id / CV_SIZE] |= (1 << (id % CV_SIZE))
>>>>>> ```


>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>> No, keys should be deleted ASAP - during `odp_crypto_session_destroy()`.


>>>>>>>> JannePeltonen wrote
>>>>>>>> Yes, the hmac reset of course (what was I thinking?). But for that 
>>>>>>>> this should be enough (I have used it in my test patch with OpenSSL 
>>>>>>>> 1.1.0 and 1.0.2): HMAC_Init_ex(ctx, NULL, 0, NULL, NULL) and be more 
>>>>>>>> clear in that key len is not being changed.
>>>>>>>> 
>>>>>>>> Looking at the man page I do not think HMAC_Init_Ex() is going to even 
>>>>>>>> look at key_len when key is NULL.


>>>>>>>>> JannePeltonen wrote
>>>>>>>>> ok, I can see it now. I think it would be less fragile to clear the 
>>>>>>>>> session at alloc() instead of relying it to be cleared at 
>>>>>>>>> init_global() and destroy().


>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>> OK, I will add a comment. This is necessary to restart (reset) HMAC 
>>>>>>>>>> calculation without resetting the key and auth length.


>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>> Whole `odp_crypto_generic_session_t` data is cleared during 
>>>>>>>>>>> `odp_crypto_session_destroy()`


>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>> Why is this needed here? The key length was already set. If this 
>>>>>>>>>>>> is because of some OpenSSL weirdness, an explanatory comment would 
>>>>>>>>>>>> be nice.


>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>> The cipher_valid and hmac_valid bits are not cleared at session 
>>>>>>>>>>>>> allocation, so they can be left to what they were in an old and 
>>>>>>>>>>>>> freed session that occupied the same memory slot. 


>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>> Then I'd follow "Maxim's rule" and do the renames as a separate 
>>>>>>>>>>>>>> follow-on PR.


>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>> I prefer the rule that new code goes in in clean way. Even if 
>>>>>>>>>>>>>>> rest of the code used old rules.


>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>> That makes sense to me.


>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>> Then I would prefer to use names w/o underscore for now and 
>>>>>>>>>>>>>>>>> to change all of them later.


>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>> I don't see the need to do that at this point as these are 
>>>>>>>>>>>>>>>>>> internal routines anyway. That type of restructure might be 
>>>>>>>>>>>>>>>>>> something to consider for Caterpillar. 


>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>> Hmm. The rest of init/term functions do not have such 
>>>>>>>>>>>>>>>>>>> prefix. Should we change them?


>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>> No, `sizeof uint32_t`


>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_term_local`


>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_init_local`


>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>> is 32 everywhere MAX_SESSIONS?


https://github.com/Linaro/odp/pull/342#discussion_r160913698
updated_at 2018-01-11 10:20:28

Reply via email to