On 11/01/2016 10:35 AM, Pádraig Brady wrote:
In your testing below showing that the blake2bp digest is dependent on OMP_DYNAMIC env variable, that's just wrong, whether it's a bug in the implementation or assumptions, it's a bug.
I suspect it's a wrong assumption about the implementation. The code has: #define PARALLELISM_DEGREE 4 #if defined(_OPENMP) #pragma omp parallel shared(S), num_threads(PARALLELISM_DEGREE) #else for( i = 0; i < PARALLELISM_DEGREE; ++i ) #endif { #if defined(_OPENMP) size_t i = omp_get_thread_num(); #endif [[DO SOMETHING WITH i INSIDE THE BLOCK]] } Which seems to imply that 'num_threads(X)' is equivalent to 'for(i=0;<X;++i)' - you either get 4 threads or do 4 iterations serially. But IMHO the above code is incorrect: OpenMP doesn't have to give you 4 threads (as evident by OMP_DYNAMIC=TRUE). The equivalent OpenMP code should be something like this (not tested): #pragma omp parallel for shared(S) for( i = 0; i < PARALLELISM_DEGREE; ++i ) { [[DO SOMETHING WITH i INSIDE THE BLOCK]] } Which then executes the block correctly, regardless of number of threads (and also respects OMP_NUM_THREADS). However, the current blake2 implementation might need to be adjusted for this modification to work efficiently, and ensure too many threads don't cause problems (if the algorithm can handle only 4). But regardless, I still think we should not include different variants of the same algorithm - it will cause too much confusion. -assaf