Dmitry Eremin-Solenikov(lumag) replied on github web page: platform/linux-generic/odp_crypto.c @@ -137,15 +139,56 @@ struct odp_crypto_global_s { odp_spinlock_t lock; odp_crypto_generic_session_t *free; odp_crypto_generic_session_t sessions[MAX_SESSIONS]; + + /* These bitfields are cleared at alloc_session() + * together with the rest of data */ + bitset_t hmac_valid[ODP_THREAD_COUNT_MAX]
Comment: I feel uneasy for propagation of clear flags to other threads. What about using `atomic_flag` and `atomic_flag_test_and_set`/`atomic_flag_clear` from stdatomics? > Dmitry Eremin-Solenikov(lumag) wrote: > Consider a case of encryption + AES-GMAC. >> JannePeltonen wrote >> mac_cipher_ctx does not seem to be needed. There can be only cipher and mac >> in a session, so hmac_ctx and cipher_ctx should already be able to record >> all the contexts needed for a session, also for AES-GCM. >>> JannePeltonen wrote >>> I think the benefits of using a bitset may not outweigh the drawbacks. In >>> addition to making the code slightly more complicated, using a bitset >>> forces us to use atomic operations (since, as you pointed out, we still >>> have to handle the case where thread A allocates session X while thread B >>> is initializing session Y in the same bit set, both updating the same >>> memory location, although different bits). >>> >>> By using a full byte for the validity flag would simplify the code and >>> avoid the need for atomics since different threads would no longer write >>> and read the same memory location at the same time (there must be some >>> synchronization in the application side to ensure that a session is not >>> destroyed when still in use and that creation of a session happens before >>> its use). >>> >>> The atomics can slow down ARM because the bitset tries to provide atomicity >>> over a 16 byte range, which causes additional overhead (e.g. reading a >>> valid bit is not just a plain memory load) even though not required in this >>> context. A weaker atomic bitset would suffice, but I think we could do >>> without atomics altogether. >>> >>> A benefit of the bit set is reduced memory use (and possibly some >>> improvement in memory access locality), but I doubt it is significant >>> enough. The savings are less than MAX_SESSIONS * ODP_THREAD_COUNT_MAX bytes >>> (after combining the crypto/auth valid bits as in the last path). At the >>> same time the thread local crypto_local_t structs are using 12 or 24 times >>> MAX_SESSIONS bytes per thread, depending on pointer size, plus the memory >>> reserved by OpenSSL for all the pre-created contexts. >>> >>> So, how about just using a full byte for the validity flag? >>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>> @JannePeltonen even with your suggestion there can be congestion between >>>> set-as-valid session1 and clear-valid session2 (which can happen in >>>> completely different thread). In fact I'm balancing between two possible >>>> solutions: >>>> >>>> - always init all contexts on session init/destroy them on session destroy >>>> - just add a per-thread lock guarding writing to per-thread bits >>>> >>>> Any suggestions? >>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>> Ok, I will add some cleaning to _create stage. At lease it is not a fast >>>>> path operation, so performance should not matter that much. >>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>> Hmm. I like this idea. I will implement it, thank you. >>>>>>> JannePeltonen wrote >>>>>>> I think the atomic cmp-xchg is too heavyweight here. >>>>>>> >>>>>>> AFAICS, it is needed only because of the way the valid bits for >>>>>>> different threads are packed in the same memory locations. The need for >>>>>>> the syncing atomic op could be avoided by just using a whole uint8_t >>>>>>> for one thread. I do not think the increase in memory use would be so >>>>>>> big that it would be a problem. >>>>>>> >>>>>>> If you really want to keep packing the valid bits, then you could do so >>>>>>> also by having per-thread arrays of valid bits for all sessions in the >>>>>>> global SHM instead of the current approach of per-session arrays of >>>>>>> valid bits for all threads. That would avoid the syncing needs since >>>>>>> the valid bits would not share memory locations with other threads. >>>>>>> >>>>>>> IOW, you would have something like this in the global SHM: >>>>>>> >>>>>>> struct openssl_ctx_status { >>>>>>> crypto_valid cipher_valid[(MAX_SESSIONS + CV_BITS - 1) / >>>>>>> CV_BITS]; >>>>>>> crypto_valid hmac_valid[(MAX_SESSIONS + CV_BITS - 1) / >>>>>>> CV_BITS]; >>>>>>> } [ODP_THREAD_COUNT_MAX]; >>>>>>>> JannePeltonen wrote >>>>>>>> I did not say the keys should not be cleared at destroy() but that it >>>>>>>> would be better (as in, more difficult to introduce a bug later) if >>>>>>>> the alloc did not rely on the clearing of the valid bits etc having >>>>>>>> happened earlier (in destroy or in global init). But this is not a >>>>>>>> sticking point for me. >>>>>>>> >>>>>>>> Btw, if you are concerned about storing keys in memory for too long, >>>>>>>> then this patch does just that by keeping the openssl contexts alive >>>>>>>> until local_term(). >>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>> Checkpatch wants this enclosed in parentheses. Since `IS_VALID` >>>>>>>>> follows that model, no harm in doing so with `SET_VALID` as well. >>>>>>>>> ``` >>>>>>>>> ERROR: Macros with complex values should be enclosed in parentheses >>>>>>>>> #63: FILE: platform/linux-generic/odp_crypto.c:131: >>>>>>>>> +#define SET_VALID(session, type, id) \ >>>>>>>>> + session->type ## _valid[id / CV_SIZE] |= (1 << (id % CV_SIZE)) >>>>>>>>> ``` >>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>> No, keys should be deleted ASAP - during >>>>>>>>>> `odp_crypto_session_destroy()`. >>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>> Yes, the hmac reset of course (what was I thinking?). But for that >>>>>>>>>>> this should be enough (I have used it in my test patch with OpenSSL >>>>>>>>>>> 1.1.0 and 1.0.2): HMAC_Init_ex(ctx, NULL, 0, NULL, NULL) and be >>>>>>>>>>> more clear in that key len is not being changed. >>>>>>>>>>> >>>>>>>>>>> Looking at the man page I do not think HMAC_Init_Ex() is going to >>>>>>>>>>> even look at key_len when key is NULL. >>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>> ok, I can see it now. I think it would be less fragile to clear >>>>>>>>>>>> the session at alloc() instead of relying it to be cleared at >>>>>>>>>>>> init_global() and destroy(). >>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>> OK, I will add a comment. This is necessary to restart (reset) >>>>>>>>>>>>> HMAC calculation without resetting the key and auth length. >>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>> Whole `odp_crypto_generic_session_t` data is cleared during >>>>>>>>>>>>>> `odp_crypto_session_destroy()` >>>>>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>>>>> Why is this needed here? The key length was already set. If >>>>>>>>>>>>>>> this is because of some OpenSSL weirdness, an explanatory >>>>>>>>>>>>>>> comment would be nice. >>>>>>>>>>>>>>>> JannePeltonen wrote >>>>>>>>>>>>>>>> The cipher_valid and hmac_valid bits are not cleared at >>>>>>>>>>>>>>>> session allocation, so they can be left to what they were in >>>>>>>>>>>>>>>> an old and freed session that occupied the same memory slot. >>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>> Then I'd follow "Maxim's rule" and do the renames as a >>>>>>>>>>>>>>>>> separate follow-on PR. >>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>> I prefer the rule that new code goes in in clean way. Even >>>>>>>>>>>>>>>>>> if rest of the code used old rules. >>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>> That makes sense to me. >>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>>>>> Then I would prefer to use names w/o underscore for now >>>>>>>>>>>>>>>>>>>> and to change all of them later. >>>>>>>>>>>>>>>>>>>>> Bill Fischofer(Bill-Fischofer-Linaro) wrote: >>>>>>>>>>>>>>>>>>>>> I don't see the need to do that at this point as these >>>>>>>>>>>>>>>>>>>>> are internal routines anyway. That type of restructure >>>>>>>>>>>>>>>>>>>>> might be something to consider for Caterpillar. >>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>>>>>>> Hmm. The rest of init/term functions do not have such >>>>>>>>>>>>>>>>>>>>>> prefix. Should we change them? >>>>>>>>>>>>>>>>>>>>>>> Dmitry Eremin-Solenikov(lumag) wrote: >>>>>>>>>>>>>>>>>>>>>>> No, `sizeof uint32_t` >>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_term_local` >>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>>>>>> `_odp_crypto_init_local` >>>>>>>>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>>>>>>>> is 32 everywhere MAX_SESSIONS? https://github.com/Linaro/odp/pull/342#discussion_r160961927 updated_at 2018-01-11 13:47:59