Dmitry Eremin-Solenikov(lumag) replied on github web page: platform/linux-generic/odp_crypto.c @@ -137,15 +139,56 @@ struct odp_crypto_global_s { odp_spinlock_t lock; odp_crypto_generic_session_t *free; odp_crypto_generic_session_t sessions[MAX_SESSIONS]; + + /* These bitfields are cleared at alloc_session() + * together with the rest of data */ + bitset_t hmac_valid[ODP_THREAD_COUNT_MAX]
Comment: @JannePeltonen this sounds logical, but not 101% convincing, sorry. When we are talking about security, I'll try to be on a safe side. Maybe I have to give it another thought. The rest of linux-generic code heavily uses atomic operation (even worse than stdatomic, it uses GCC builtins). We do not have (even internal) API for memory barriers, so I assumed that atomic flags are best fit for this case (and each flags uses just one byte). > JannePeltonen wrote > ok, now I can see it. Ignore the previous comment. >> JannePeltonen wrote >> After some more pondering I still think additional synchronization for the >> valid flag is not necessary. >> >> When an application creates a crypto session and gets a handle back, the >> side effects (memory stores) of the creation are not yet necessarily visible >> in other threads (according to Petri, this is an intended property of the >> ODP API). If the application passes the handle to another thread it must use >> some synchronization mechanism to ensure that creation is fully visible in >> the other thread before it tries to use the handle. The valid flag is just >> one more item in shared memory that needs to be visible before a crypto >> operation is started. Right? >> >> Similarly, when a crypto session is destroyed, the crypto operations must >> have fully completed before session destroy can be called in another thread. >> This again requires some synchronization by the application between the >> threads, but that also ensures that setting of the valid bit inside a crypto >> operation happens-before session destroy which happens-before (due to the >> locking in session alloc and free) the storage of the session is reused >> which happens-before clearing the valid bit. So this case should also be ok, >> right? >> >> And if the valid flags are all in their own memory locations (i.e. not >> bitfields of the same location), then there are no other inter-thread >> synchronization concerns regarding the valid flag. >> >> But this requires more than just my word. If you cannot get convinced about >> not needing additional synchronization for the valid flag, then you could >> also consider memory barriers after each write to the valid flag instead. >> >> If you want to stick with atomic ops, then I am not sure if using stdatomic >> is ok as it seems to create an additional dependency to a tool chain that >> supports it. Maybe ODP's own atomic primitives should be used. >> >> I do not see why test-and-set would be needed. >>> Dmitry Eremin-Solenikov(lumag) wrote: >>> I feel uneasy for propagation of clear flags to other threads. What about >>> using `atomic_flag` and `atomic_flag_test_and_set`/`atomic_flag_clear` from >>> stdatomics? >>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>> Consider a case of encryption + AES-GMAC. >>>>> JannePeltonen wrote >>>>> mac_cipher_ctx does not seem to be needed. There can be only cipher and >>>>> mac in a session, so hmac_ctx and cipher_ctx should already be able to >>>>> record all the contexts needed for a session, also for AES-GCM. >>>>>> JannePeltonen wrote >>>>>> I think the benefits of using a bitset may not outweigh the drawbacks. >>>>>> In addition to making the code slightly more complicated, using a bitset >>>>>> forces us to use atomic operations (since, as you pointed out, we still >>>>>> have to handle the case where thread A allocates session X while thread >>>>>> B is initializing session Y in the same bit set, both updating the same >>>>>> memory location, although different bits). >>>>>> >>>>>> By using a full byte for the validity flag would simplify the code and >>>>>> avoid the need for atomics since different threads would no longer write >>>>>> and read the same memory location at the same time (there must be some >>>>>> synchronization in the application side to ensure that a session is not >>>>>> destroyed when still in use and that creation of a session happens >>>>>> before its use). >>>>>> >>>>>> The atomics can slow down ARM because the bitset tries to provide >>>>>> atomicity over a 16 byte range, which causes additional overhead (e.g. >>>>>> reading a valid bit is not just a plain memory load) even though not >>>>>> required in this context. A weaker atomic bitset would suffice, but I >>>>>> think we could do without atomics altogether. >>>>>> >>>>>> A benefit of the bit set is reduced memory use (and possibly some >>>>>> improvement in memory access locality), but I doubt it is significant >>>>>> enough. The savings are less than MAX_SESSIONS * ODP_THREAD_COUNT_MAX >>>>>> bytes (after combining the crypto/auth valid bits as in the last path). >>>>>> At the same time the thread local crypto_local_t structs are using 12 or >>>>>> 24 times MAX_SESSIONS bytes per thread, depending on pointer size, plus >>>>>> the memory reserved by OpenSSL for all the pre-created contexts. >>>>>> >>>>>> So, how about just using a full byte for the validity flag? >>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>> @JannePeltonen even with your suggestion there can be congestion >>>>>>> between set-as-valid session1 and clear-valid session2 (which can >>>>>>> happen in completely different thread). In fact I'm balancing between >>>>>>> two possible solutions: >>>>>>> >>>>>>> - always init all contexts on session init/destroy them on session >>>>>>> destroy >>>>>>> - just add a per-thread lock guarding writing to per-thread bits >>>>>>> >>>>>>> Any suggestions? >>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>> Ok, I will add some cleaning to _create stage. At lease it is not a >>>>>>>> fast path operation, so performance should not matter that much. >>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>> Hmm. I like this idea. I will implement it, thank you. >>>>>>>>>> JannePeltonen wrote >>>>>>>>>> I think the atomic cmp-xchg is too heavyweight here. >>>>>>>>>> >>>>>>>>>> AFAICS, it is needed only because of the way the valid bits for >>>>>>>>>> different threads are packed in the same memory locations. The need >>>>>>>>>> for the syncing atomic op could be avoided by just using a whole >>>>>>>>>> uint8_t for one thread. I do not think the increase in memory use >>>>>>>>>> would be so big that it would be a problem. >>>>>>>>>> >>>>>>>>>> If you really want to keep packing the valid bits, then you could do >>>>>>>>>> so also by having per-thread arrays of valid bits for all sessions >>>>>>>>>> in the global SHM instead of the current approach of per-session >>>>>>>>>> arrays of valid bits for all threads. That would avoid the syncing >>>>>>>>>> needs since the valid bits would not share memory locations with >>>>>>>>>> other threads. >>>>>>>>>> >>>>>>>>>> IOW, you would have something like this in the global SHM: >>>>>>>>>> >>>>>>>>>> struct openssl_ctx_status { >>>>>>>>>> crypto_valid cipher_valid[(MAX_SESSIONS + CV_BITS - 1) / >>>>>>>>>> CV_BITS]; >>>>>>>>>> crypto_valid hmac_valid[(MAX_SESSIONS + CV_BITS - 1) / >>>>>>>>>> CV_BITS]; >>>>>>>>>> } [ODP_THREAD_COUNT_MAX]; >>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>> I did not say the keys should not be cleared at destroy() but that >>>>>>>>>>> it would be better (as in, more difficult to introduce a bug later) >>>>>>>>>>> if the alloc did not rely on the clearing of the valid bits etc >>>>>>>>>>> having happened earlier (in destroy or in global init). But this is >>>>>>>>>>> not a sticking point for me. >>>>>>>>>>> >>>>>>>>>>> Btw, if you are concerned about storing keys in memory for too >>>>>>>>>>> long, then this patch does just that by keeping the openssl >>>>>>>>>>> contexts alive until local_term(). >>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>> Checkpatch wants this enclosed in parentheses. Since `IS_VALID` >>>>>>>>>>>> follows that model, no harm in doing so with `SET_VALID` as well. >>>>>>>>>>>> ``` >>>>>>>>>>>> ERROR: Macros with complex values should be enclosed in parentheses >>>>>>>>>>>> #63: FILE: platform/linux-generic/odp_crypto.c:131: >>>>>>>>>>>> +#define SET_VALID(session, type, id) \ >>>>>>>>>>>> + session->type ## _valid[id / CV_SIZE] |= (1 << (id % CV_SIZE)) >>>>>>>>>>>> ``` >>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>> No, keys should be deleted ASAP - during >>>>>>>>>>>>> `odp_crypto_session_destroy()`. >>>>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>>>> Yes, the hmac reset of course (what was I thinking?). But for >>>>>>>>>>>>>> that this should be enough (I have used it in my test patch with >>>>>>>>>>>>>> OpenSSL 1.1.0 and 1.0.2): HMAC_Init_ex(ctx, NULL, 0, NULL, NULL) >>>>>>>>>>>>>> and be more clear in that key len is not being changed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at the man page I do not think HMAC_Init_Ex() is going >>>>>>>>>>>>>> to even look at key_len when key is NULL. >>>>>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>>>>> ok, I can see it now. I think it would be less fragile to clear >>>>>>>>>>>>>>> the session at alloc() instead of relying it to be cleared at >>>>>>>>>>>>>>> init_global() and destroy(). >>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>> OK, I will add a comment. This is necessary to restart (reset) >>>>>>>>>>>>>>>> HMAC calculation without resetting the key and auth length. >>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>> Whole `odp_crypto_generic_session_t` data is cleared during >>>>>>>>>>>>>>>>> `odp_crypto_session_destroy()` >>>>>>>>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>>>>>>>> Why is this needed here? The key length was already set. If >>>>>>>>>>>>>>>>>> this is because of some OpenSSL weirdness, an explanatory >>>>>>>>>>>>>>>>>> comment would be nice. >>>>>>>>>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>>>>>>>>> The cipher_valid and hmac_valid bits are not cleared at >>>>>>>>>>>>>>>>>>> session allocation, so they can be left to what they were >>>>>>>>>>>>>>>>>>> in an old and freed session that occupied the same memory >>>>>>>>>>>>>>>>>>> slot. >>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>> Then I'd follow "Maxim's rule" and do the renames as a >>>>>>>>>>>>>>>>>>>> separate follow-on PR. >>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>> I prefer the rule that new code goes in in clean way. >>>>>>>>>>>>>>>>>>>>> Even if rest of the code used old rules. >>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>> That makes sense to me. >>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>>>>>>>> Then I would prefer to use names w/o underscore for now >>>>>>>>>>>>>>>>>>>>>>> and to change all of them later. >>>>>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>>>>> I don't see the need to do that at this point as these >>>>>>>>>>>>>>>>>>>>>>>> are internal routines anyway. That type of restructure >>>>>>>>>>>>>>>>>>>>>>>> might be something to consider for Caterpillar. >>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hmm. The rest of init/term functions do not have such >>>>>>>>>>>>>>>>>>>>>>>>> prefix. Should we change them? >>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> No, `sizeof uint32_t` >>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_term_local` >>>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_init_local` >>>>>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>>>>>>>>>> is 32 everywhere MAX_SESSIONS? https://github.com/Linaro/odp/pull/342#discussion_r161362164 updated_at 2018-01-13 03:18:25