On 11/01/2016 10:35 AM, Pádraig Brady wrote:
In your testing below showing that the blake2bp digest is
dependent on OMP_DYNAMIC env variable, that's just wrong,
whether it's a bug in the implementation or assumptions,
it's a bug.


I suspect it's a wrong assumption about the implementation.
The code has:

    #define PARALLELISM_DEGREE 4

    #if defined(_OPENMP)
      #pragma omp parallel shared(S), num_threads(PARALLELISM_DEGREE)
    #else
      for( i = 0; i < PARALLELISM_DEGREE; ++i )
    #endif
      {
    #if defined(_OPENMP)
        size_t      i = omp_get_thread_num();
    #endif

        [[DO SOMETHING WITH i INSIDE THE BLOCK]]
      }


Which seems to imply that 'num_threads(X)' is equivalent to 'for(i=0;<X;++i)' - 
you either get 4 threads or do 4 iterations serially.
But IMHO the above code is incorrect: OpenMP doesn't have to give you 4 threads 
(as evident by OMP_DYNAMIC=TRUE).


The equivalent OpenMP code should be something like this (not tested):

      #pragma omp parallel for shared(S)
      for( i = 0; i < PARALLELISM_DEGREE; ++i )
      {
        [[DO SOMETHING WITH i INSIDE THE BLOCK]]
      }

Which then executes the block correctly, regardless of number of threads (and 
also respects OMP_NUM_THREADS).
However, the current blake2 implementation might need to be adjusted for this 
modification to work efficiently,
and ensure too many threads don't cause problems (if the algorithm can handle 
only 4).


But regardless, I still think we should not include different variants of the 
same algorithm - it will cause too much confusion.

-assaf


Reply via email to