Dmitry Eremin-Solenikov(lumag) replied on github web page:

platform/linux-generic/odp_crypto.c
@@ -137,15 +139,56 @@ struct odp_crypto_global_s {
        odp_spinlock_t                lock;
        odp_crypto_generic_session_t *free;
        odp_crypto_generic_session_t  sessions[MAX_SESSIONS];
+
+       /* These bitfields are cleared at alloc_session()
+        * together with the rest of data */
+       bitset_t hmac_valid[ODP_THREAD_COUNT_MAX]


Comment:
I feel uneasy for propagation of clear flags to other threads. What about using 
`atomic_flag` and `atomic_flag_test_and_set`/`atomic_flag_clear` from 
stdatomics?

> Dmitry Eremin-Solenikov(lumag) wrote:
> Consider a case of encryption + AES-GMAC.


>> JannePeltonen wrote
>> mac_cipher_ctx does not seem to be needed. There can be only cipher and mac 
>> in a session, so hmac_ctx and cipher_ctx should already be able to record 
>> all the contexts needed for a session, also for AES-GCM.


>>> JannePeltonen wrote
>>> I think the benefits of using a bitset may not outweigh the drawbacks. In 
>>> addition to making the code slightly more complicated, using a bitset 
>>> forces us to use atomic operations (since, as you pointed out, we still 
>>> have to handle the case where thread A allocates session X while thread B 
>>> is initializing session Y in the same bit set, both updating the same 
>>> memory location, although different bits).
>>> 
>>> By using a full byte for the validity flag would simplify the code and 
>>> avoid the need for atomics since different threads would no longer write 
>>> and read the same memory location at the same time (there must be some 
>>> synchronization in the application side to ensure that a session is not 
>>> destroyed when still in use and that creation of a session happens before 
>>> its use).
>>> 
>>> The atomics can slow down ARM because the bitset tries to provide atomicity 
>>> over a 16 byte range, which causes additional overhead (e.g. reading a 
>>> valid bit is not just a plain memory load) even though not required in this 
>>> context. A weaker atomic bitset would suffice, but I think we could do 
>>> without atomics altogether.
>>> 
>>> A benefit of the bit set is reduced memory use (and possibly some 
>>> improvement in memory access locality), but I doubt it is significant 
>>> enough. The savings are less than MAX_SESSIONS * ODP_THREAD_COUNT_MAX bytes 
>>> (after combining the crypto/auth valid bits as in the last path). At the 
>>> same time the thread local crypto_local_t structs are using 12 or 24 times 
>>> MAX_SESSIONS bytes per thread, depending on pointer size, plus the memory 
>>> reserved by OpenSSL for all the pre-created contexts.
>>> 
>>> So, how about just using a full byte for the validity flag?


>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>> @JannePeltonen even with your suggestion there can be congestion between 
>>>> set-as-valid session1 and clear-valid session2 (which can happen in 
>>>> completely different thread). In fact I'm balancing between two possible 
>>>> solutions:
>>>> 
>>>> - always init all contexts on session init/destroy them on session destroy
>>>> - just add a per-thread lock guarding writing to per-thread bits
>>>> 
>>>> Any suggestions? 


>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>> Ok, I will add some cleaning to _create stage. At lease it is not a fast 
>>>>> path operation, so performance should not matter that much.


>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>> Hmm. I like this idea. I will implement it, thank you.


>>>>>>> JannePeltonen wrote
>>>>>>> I think the atomic cmp-xchg is too heavyweight here.
>>>>>>> 
>>>>>>> AFAICS, it is needed only because of the way the valid bits for 
>>>>>>> different threads are packed in the same memory locations. The need for 
>>>>>>> the syncing atomic op could be avoided by just using a whole uint8_t 
>>>>>>> for one thread. I do not think the increase in memory use would be so 
>>>>>>> big that it would be a problem.
>>>>>>> 
>>>>>>> If you really want to keep packing the valid bits, then you could do so 
>>>>>>> also by having per-thread arrays of valid bits for all sessions in the 
>>>>>>> global SHM instead of the current approach of per-session arrays of 
>>>>>>> valid bits for all threads. That would avoid the syncing needs since 
>>>>>>> the valid bits would not share memory locations with other threads.
>>>>>>> 
>>>>>>> IOW, you would have something like this in the global SHM:
>>>>>>> 
>>>>>>> struct openssl_ctx_status {
>>>>>>>         crypto_valid cipher_valid[(MAX_SESSIONS + CV_BITS - 1) / 
>>>>>>> CV_BITS];
>>>>>>>         crypto_valid   hmac_valid[(MAX_SESSIONS + CV_BITS - 1) / 
>>>>>>> CV_BITS];
>>>>>>> } [ODP_THREAD_COUNT_MAX];


>>>>>>>> JannePeltonen wrote
>>>>>>>> I did not say the keys should not be cleared at destroy() but that it 
>>>>>>>> would be better (as in, more difficult to introduce a bug later) if 
>>>>>>>> the alloc did not rely on the clearing of the valid bits etc having 
>>>>>>>> happened earlier (in destroy or in global init). But this is not a 
>>>>>>>> sticking point for me.
>>>>>>>> 
>>>>>>>> Btw, if you are concerned about storing keys in memory for too long, 
>>>>>>>> then this patch does just that by keeping the openssl contexts alive 
>>>>>>>> until local_term(). 


>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>> Checkpatch wants this enclosed in parentheses. Since `IS_VALID` 
>>>>>>>>> follows that model, no harm in doing so with `SET_VALID` as well.
>>>>>>>>> ```
>>>>>>>>> ERROR: Macros with complex values should be enclosed in parentheses
>>>>>>>>> #63: FILE: platform/linux-generic/odp_crypto.c:131:
>>>>>>>>> +#define SET_VALID(session, type, id) \
>>>>>>>>> +     session->type ## _valid[id / CV_SIZE] |= (1 << (id % CV_SIZE))
>>>>>>>>> ```


>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>> No, keys should be deleted ASAP - during 
>>>>>>>>>> `odp_crypto_session_destroy()`.


>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>> Yes, the hmac reset of course (what was I thinking?). But for that 
>>>>>>>>>>> this should be enough (I have used it in my test patch with OpenSSL 
>>>>>>>>>>> 1.1.0 and 1.0.2): HMAC_Init_ex(ctx, NULL, 0, NULL, NULL) and be 
>>>>>>>>>>> more clear in that key len is not being changed.
>>>>>>>>>>> 
>>>>>>>>>>> Looking at the man page I do not think HMAC_Init_Ex() is going to 
>>>>>>>>>>> even look at key_len when key is NULL.


>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>> ok, I can see it now. I think it would be less fragile to clear 
>>>>>>>>>>>> the session at alloc() instead of relying it to be cleared at 
>>>>>>>>>>>> init_global() and destroy().


>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>> OK, I will add a comment. This is necessary to restart (reset) 
>>>>>>>>>>>>> HMAC calculation without resetting the key and auth length.


>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>> Whole `odp_crypto_generic_session_t` data is cleared during 
>>>>>>>>>>>>>> `odp_crypto_session_destroy()`


>>>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>>>> Why is this needed here? The key length was already set. If 
>>>>>>>>>>>>>>> this is because of some OpenSSL weirdness, an explanatory 
>>>>>>>>>>>>>>> comment would be nice.


>>>>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>>>>> The cipher_valid and hmac_valid bits are not cleared at 
>>>>>>>>>>>>>>>> session allocation, so they can be left to what they were in 
>>>>>>>>>>>>>>>> an old and freed session that occupied the same memory slot. 


>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>> Then I'd follow "Maxim's rule" and do the renames as a 
>>>>>>>>>>>>>>>>> separate follow-on PR.


>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>> I prefer the rule that new code goes in in clean way. Even 
>>>>>>>>>>>>>>>>>> if rest of the code used old rules.


>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>> That makes sense to me.


>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>> Then I would prefer to use names w/o underscore for now 
>>>>>>>>>>>>>>>>>>>> and to change all of them later.


>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>> I don't see the need to do that at this point as these 
>>>>>>>>>>>>>>>>>>>>> are internal routines anyway. That type of restructure 
>>>>>>>>>>>>>>>>>>>>> might be something to consider for Caterpillar. 


>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>>>> Hmm. The rest of init/term functions do not have such 
>>>>>>>>>>>>>>>>>>>>>> prefix. Should we change them?


>>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>>>>> No, `sizeof uint32_t`


>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_term_local`


>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_init_local`


>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>>>>> is 32 everywhere MAX_SESSIONS?


https://github.com/Linaro/odp/pull/342#discussion_r160961927
updated_at 2018-01-11 13:47:59

Reply via email to