Dmitry Eremin-Solenikov(lumag) replied on github web page:

platform/linux-generic/odp_crypto.c
@@ -137,15 +139,56 @@ struct odp_crypto_global_s {
        odp_spinlock_t                lock;
        odp_crypto_generic_session_t *free;
        odp_crypto_generic_session_t  sessions[MAX_SESSIONS];
+
+       /* These bitfields are cleared at alloc_session()
+        * together with the rest of data */
+       bitset_t hmac_valid[ODP_THREAD_COUNT_MAX]


Comment:
@JannePeltonen this sounds logical, but not 101% convincing, sorry. When we are 
talking about security, I'll try to be on a safe side. Maybe I have to give it 
another thought.

The rest of linux-generic code heavily uses atomic operation (even worse than 
stdatomic, it uses GCC builtins). We do not have (even internal) API for memory 
barriers, so I assumed that atomic flags are best fit for this case (and each 
flags uses just one byte).

> JannePeltonen wrote
> ok, now I can see it. Ignore the previous comment.


>> JannePeltonen wrote
>> After some more pondering I still think additional synchronization for the 
>> valid flag is not necessary.
>> 
>> When an application creates a crypto session and gets a handle back, the 
>> side effects (memory stores) of the creation are not yet necessarily visible 
>> in other threads (according to Petri, this is an intended property of the 
>> ODP API). If the application passes the handle to another thread it must use 
>> some synchronization mechanism to ensure that creation is fully visible in 
>> the other thread before it tries to use the handle. The valid flag is just 
>> one more item in shared memory that needs to be visible before a crypto 
>> operation is started. Right?
>> 
>> Similarly, when a crypto session is destroyed, the crypto operations must 
>> have fully completed before session destroy can be called in another thread. 
>> This again requires some synchronization by the application between the 
>> threads, but that also ensures that setting of the valid bit inside a crypto 
>> operation happens-before session destroy which happens-before (due to the 
>> locking in session alloc and free) the storage of the session is reused 
>> which happens-before clearing the valid bit. So this case should also be ok, 
>> right?
>> 
>> And if the valid flags are all in their own memory locations (i.e. not 
>> bitfields of the same location), then there are no other inter-thread 
>> synchronization concerns regarding the valid flag.
>> 
>> But this requires more than just my word. If you cannot get convinced about 
>> not needing additional synchronization for the valid flag, then you could 
>> also consider memory barriers after each write to the valid flag instead.
>> 
>> If you want to stick with atomic ops, then I am not sure if using stdatomic 
>> is ok as it seems to create an additional dependency to a tool chain that 
>> supports it. Maybe ODP's own atomic primitives should be used.
>> 
>> I do not see why test-and-set would be needed.


>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>> I feel uneasy for propagation of clear flags to other threads. What about 
>>> using `atomic_flag` and `atomic_flag_test_and_set`/`atomic_flag_clear` from 
>>> stdatomics?


>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>> Consider a case of encryption + AES-GMAC.


>>>>> JannePeltonen wrote
>>>>> mac_cipher_ctx does not seem to be needed. There can be only cipher and 
>>>>> mac in a session, so hmac_ctx and cipher_ctx should already be able to 
>>>>> record all the contexts needed for a session, also for AES-GCM.


>>>>>> JannePeltonen wrote
>>>>>> I think the benefits of using a bitset may not outweigh the drawbacks. 
>>>>>> In addition to making the code slightly more complicated, using a bitset 
>>>>>> forces us to use atomic operations (since, as you pointed out, we still 
>>>>>> have to handle the case where thread A allocates session X while thread 
>>>>>> B is initializing session Y in the same bit set, both updating the same 
>>>>>> memory location, although different bits).
>>>>>> 
>>>>>> By using a full byte for the validity flag would simplify the code and 
>>>>>> avoid the need for atomics since different threads would no longer write 
>>>>>> and read the same memory location at the same time (there must be some 
>>>>>> synchronization in the application side to ensure that a session is not 
>>>>>> destroyed when still in use and that creation of a session happens 
>>>>>> before its use).
>>>>>> 
>>>>>> The atomics can slow down ARM because the bitset tries to provide 
>>>>>> atomicity over a 16 byte range, which causes additional overhead (e.g. 
>>>>>> reading a valid bit is not just a plain memory load) even though not 
>>>>>> required in this context. A weaker atomic bitset would suffice, but I 
>>>>>> think we could do without atomics altogether.
>>>>>> 
>>>>>> A benefit of the bit set is reduced memory use (and possibly some 
>>>>>> improvement in memory access locality), but I doubt it is significant 
>>>>>> enough. The savings are less than MAX_SESSIONS * ODP_THREAD_COUNT_MAX 
>>>>>> bytes (after combining the crypto/auth valid bits as in the last path). 
>>>>>> At the same time the thread local crypto_local_t structs are using 12 or 
>>>>>> 24 times MAX_SESSIONS bytes per thread, depending on pointer size, plus 
>>>>>> the memory reserved by OpenSSL for all the pre-created contexts.
>>>>>> 
>>>>>> So, how about just using a full byte for the validity flag?


>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>> @JannePeltonen even with your suggestion there can be congestion 
>>>>>>> between set-as-valid session1 and clear-valid session2 (which can 
>>>>>>> happen in completely different thread). In fact I'm balancing between 
>>>>>>> two possible solutions:
>>>>>>> 
>>>>>>> - always init all contexts on session init/destroy them on session 
>>>>>>> destroy
>>>>>>> - just add a per-thread lock guarding writing to per-thread bits
>>>>>>> 
>>>>>>> Any suggestions? 


>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>> Ok, I will add some cleaning to _create stage. At lease it is not a 
>>>>>>>> fast path operation, so performance should not matter that much.


>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>> Hmm. I like this idea. I will implement it, thank you.


>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>> I think the atomic cmp-xchg is too heavyweight here.
>>>>>>>>>> 
>>>>>>>>>> AFAICS, it is needed only because of the way the valid bits for 
>>>>>>>>>> different threads are packed in the same memory locations. The need 
>>>>>>>>>> for the syncing atomic op could be avoided by just using a whole 
>>>>>>>>>> uint8_t for one thread. I do not think the increase in memory use 
>>>>>>>>>> would be so big that it would be a problem.
>>>>>>>>>> 
>>>>>>>>>> If you really want to keep packing the valid bits, then you could do 
>>>>>>>>>> so also by having per-thread arrays of valid bits for all sessions 
>>>>>>>>>> in the global SHM instead of the current approach of per-session 
>>>>>>>>>> arrays of valid bits for all threads. That would avoid the syncing 
>>>>>>>>>> needs since the valid bits would not share memory locations with 
>>>>>>>>>> other threads.
>>>>>>>>>> 
>>>>>>>>>> IOW, you would have something like this in the global SHM:
>>>>>>>>>> 
>>>>>>>>>> struct openssl_ctx_status {
>>>>>>>>>>      crypto_valid cipher_valid[(MAX_SESSIONS + CV_BITS - 1) / 
>>>>>>>>>> CV_BITS];
>>>>>>>>>>      crypto_valid   hmac_valid[(MAX_SESSIONS + CV_BITS - 1) / 
>>>>>>>>>> CV_BITS];
>>>>>>>>>> } [ODP_THREAD_COUNT_MAX];


>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>> I did not say the keys should not be cleared at destroy() but that 
>>>>>>>>>>> it would be better (as in, more difficult to introduce a bug later) 
>>>>>>>>>>> if the alloc did not rely on the clearing of the valid bits etc 
>>>>>>>>>>> having happened earlier (in destroy or in global init). But this is 
>>>>>>>>>>> not a sticking point for me.
>>>>>>>>>>> 
>>>>>>>>>>> Btw, if you are concerned about storing keys in memory for too 
>>>>>>>>>>> long, then this patch does just that by keeping the openssl 
>>>>>>>>>>> contexts alive until local_term(). 


>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>> Checkpatch wants this enclosed in parentheses. Since `IS_VALID` 
>>>>>>>>>>>> follows that model, no harm in doing so with `SET_VALID` as well.
>>>>>>>>>>>> ```
>>>>>>>>>>>> ERROR: Macros with complex values should be enclosed in parentheses
>>>>>>>>>>>> #63: FILE: platform/linux-generic/odp_crypto.c:131:
>>>>>>>>>>>> +#define SET_VALID(session, type, id)      \
>>>>>>>>>>>> +  session->type ## _valid[id / CV_SIZE] |= (1 << (id % CV_SIZE))
>>>>>>>>>>>> ```


>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>> No, keys should be deleted ASAP - during 
>>>>>>>>>>>>> `odp_crypto_session_destroy()`.


>>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>>> Yes, the hmac reset of course (what was I thinking?). But for 
>>>>>>>>>>>>>> that this should be enough (I have used it in my test patch with 
>>>>>>>>>>>>>> OpenSSL 1.1.0 and 1.0.2): HMAC_Init_ex(ctx, NULL, 0, NULL, NULL) 
>>>>>>>>>>>>>> and be more clear in that key len is not being changed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looking at the man page I do not think HMAC_Init_Ex() is going 
>>>>>>>>>>>>>> to even look at key_len when key is NULL.


>>>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>>>> ok, I can see it now. I think it would be less fragile to clear 
>>>>>>>>>>>>>>> the session at alloc() instead of relying it to be cleared at 
>>>>>>>>>>>>>>> init_global() and destroy().


>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>> OK, I will add a comment. This is necessary to restart (reset) 
>>>>>>>>>>>>>>>> HMAC calculation without resetting the key and auth length.


>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>> Whole `odp_crypto_generic_session_t` data is cleared during 
>>>>>>>>>>>>>>>>> `odp_crypto_session_destroy()`


>>>>>>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>>>>>>> Why is this needed here? The key length was already set. If 
>>>>>>>>>>>>>>>>>> this is because of some OpenSSL weirdness, an explanatory 
>>>>>>>>>>>>>>>>>> comment would be nice.


>>>>>>>>>>>>>>>>>>> JannePeltonen wrote
>>>>>>>>>>>>>>>>>>> The cipher_valid and hmac_valid bits are not cleared at 
>>>>>>>>>>>>>>>>>>> session allocation, so they can be left to what they were 
>>>>>>>>>>>>>>>>>>> in an old and freed session that occupied the same memory 
>>>>>>>>>>>>>>>>>>> slot. 


>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>> Then I'd follow "Maxim's rule" and do the renames as a 
>>>>>>>>>>>>>>>>>>>> separate follow-on PR.


>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>> I prefer the rule that new code goes in in clean way. 
>>>>>>>>>>>>>>>>>>>>> Even if rest of the code used old rules.


>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>> That makes sense to me.


>>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>>>>> Then I would prefer to use names w/o underscore for now 
>>>>>>>>>>>>>>>>>>>>>>> and to change all of them later.


>>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote:
>>>>>>>>>>>>>>>>>>>>>>>> I don't see the need to do that at this point as these 
>>>>>>>>>>>>>>>>>>>>>>>> are internal routines anyway. That type of restructure 
>>>>>>>>>>>>>>>>>>>>>>>> might be something to consider for Caterpillar. 


>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hmm. The rest of init/term functions do not have such 
>>>>>>>>>>>>>>>>>>>>>>>>> prefix. Should we change them?


>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> No, `sizeof uint32_t`


>>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_term_local`


>>>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_init_local`


>>>>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is 32 everywhere MAX_SESSIONS?


https://github.com/Linaro/odp/pull/342#discussion_r161362164
updated_at 2018-01-13 03:18:25

Reply via email to