A little guidance on threading needed.

mratsim Fri, 19 Jan 2024 12:45:18 -0800

Looking at the algorithm on Wikipedia: <https://en.wikipedia.org/wiki/Argon2>
    
    
    Function Argon2
       Inputs:
          password (P):       Bytes (0..2^32-1)    Password (or message) to be hashed
          salt (S):           Bytes (8..2^32-1)    Salt (16 bytes recommended for password hashing)
          parallelism (p):    Number (1..2^24-1)   Degree of parallelism (i.e. number of threads)
          tagLength (T):      Number (4..2^32-1)   Desired number of returned bytes
          memorySizeKB (m):   Number (8p..2^32-1)  Amount of memory (in kibibytes) to use
          iterations (t):     Number (1..2^32-1)   Number of iterations to perform
          version (v):        Number (0x13)        The current version is 0x13 (19 decimal)
          key (K):            Bytes (0..2^32-1)    Optional key (Errata: PDF says 0..32 bytes, RFC says 0..2^32 bytes)
          associatedData (X): Bytes (0..2^32-1)    Optional arbitrary extra data
          hashType (y):       Number (0=Argon2d, 1=Argon2i, 2=Argon2id)
       Output:
          tag:                Bytes (tagLength)    The resulting generated bytes, tagLength bytes long
       
       Generate initial 64-byte block H0.
        All the input parameters are concatenated and input as a source of additional entropy.
        Errata: RFC says H0 is 64-bits; PDF says H0 is 64-bytes.
        Errata: RFC says the Hash is H^, the PDF says it's ℋ (but doesn't document what ℋ is). It's actually Blake2b.
        Variable length items are prepended with their length as 32-bit little-endian integers.
       buffer ← parallelism ∥ tagLength ∥ memorySizeKB ∥ iterations ∥ version ∥ hashType
             ∥ Length(password)       ∥ Password
             ∥ Length(salt)           ∥ salt
             ∥ Length(key)            ∥ key
             ∥ Length(associatedData) ∥ associatedData
       H0 ← Blake2b(buffer, 64) //default hash size of Blake2b is 64-bytes
       
       Calculate number of 1 KB blocks by rounding down memorySizeKB to the nearest multiple of 4*parallelism kibibytes
       blockCount ← Floor(memorySizeKB, 4*parallelism)
       
       Allocate two-dimensional array of 1 KiB blocks (parallelism rows x columnCount columns)
       columnCount ← blockCount / parallelism;   //In the RFC, columnCount is referred to as q
       
       Compute the first and second block (i.e. column zero and one) of each lane (i.e. row)
       for i ← 0 to parallelism-1 do //for each row
          Bi[0] ← Hash(H0 ∥ 0 ∥ i, 1024) //Generate a 1024-byte digest
          Bi[1] ← Hash(H0 ∥ 1 ∥ i, 1024) //Generate a 1024-byte digest
       
       Compute remaining columns of each lane
       for i ← 0 to parallelism-1 do //for each row
          for j ← 2 to columnCount-1 do //for each subsequent column
             //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4)
             i′, j′ ← GetBlockIndexes(i, j)  //the GetBlockIndexes function is not defined
             Bi[j] = G(Bi[j-1], Bi′[j′]) //the G hash function is not defined
       
       Further passes when iterations > 1
       for nIteration ← 2 to iterations do
          for i ← 0 to parallelism-1 do //for each row
            for j ← 0 to columnCount-1 do //for each subsequent column
               //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4)
               i′, j′ ← GetBlockIndexes(i, j)
               if j == 0 then
                 Bi[0] = Bi[0] xor G(Bi[columnCount-1], Bi′[j′])
               else
                 Bi[j] = Bi[j] xor G(Bi[j-1], Bi′[j′])
       
       Compute final block C as the XOR of the last column of each row
       C ← B0[columnCount-1]
       for i ← 1 to parallelism-1 do
          C ← C xor Bi[columnCount-1]
       
       Compute output tag
       return Hash(C, tagLength)
    
    


You need a threadpool that supports data parallelism / parallel for.

You can get away with OpenMP by using the `||` iterator 
<https://nim-lang.org/docs/system.html#%7C%7C.i%2CS%2CT%2Cstaticstring>

Using <https://github.com/mratsim/laser/blob/master/laser/openmp.nim> for extra 
syntax sugar, you can use
    
    
    for i in 0 || (omp_get_num_threads() - 1):
      myArray[i] = <...>
    
    

For OpenMP to work, you normally need to pass `--passC:-fopenmp` and 
`--passL:-fopenmp`, either on the command line or via the `passC`/`passL` 
pragmas. With the openmp.nim utility you just need to pass `-d:openmp` on the 
command line.

(On a Mac, the default Clang does not support OpenMP, so you have to install 
GCC or Clang from Homebrew.)
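
As a rough illustration of the parallel-for approach, here is a minimal sketch that fills independent lanes with the `||` iterator from the system module. The names `parallelism`, `columnCount`, `mix`, `fillLane` and the file name `lanes.nim` are made up for this example, `mix` is only a stand-in for Argon2's G function, and the `when defined(openmp)` guard is this sketch's own way of attaching the compiler flags via pragma. It assumes every lane only touches its own row; in the pseudocode above a block can reference another lane (`Bi′[j′]`), so a real implementation needs extra synchronization between lanes that is not shown here.

```nim
# Build with OpenMP:  nim c -d:release -d:openmp lanes.nim
# Without -d:openmp the `||` loop runs as an ordinary sequential loop.
when defined(openmp):
  {.passC: "-fopenmp".}
  {.passL: "-fopenmp".}

const
  parallelism = 4   # hypothetical number of lanes (rows)
  columnCount = 8   # hypothetical number of blocks per lane

type Lane = array[columnCount, uint64]

# One row per lane; each loop iteration writes only to its own row.
var memory: array[parallelism, Lane]

proc mix(a, b: uint64): uint64 =
  ## Placeholder for the compression function G. This is NOT the real G.
  (a xor b) * 0x9E3779B97F4A7C15'u64 + 1'u64

proc fillLane(lane: int) =
  ## Fills one lane left to right; reads and writes only `memory[lane]`.
  memory[lane][0] = uint64(lane)
  memory[lane][1] = uint64(lane) + 1'u64
  for j in 2 ..< columnCount:
    memory[lane][j] = mix(memory[lane][j-1], memory[lane][j-2])

proc fillAllLanes() =
  # Iterations are distributed over OpenMP threads when OpenMP is enabled.
  # Keeping the body to a single proc call keeps all temporaries local
  # to the called proc.
  for lane in 0 || (parallelism - 1):
    fillLane(lane)

fillAllLanes()
echo memory[parallelism - 1][columnCount - 1]
```

Looping over lanes rather than over the thread count lets OpenMP decide how many threads to use, and the result is the same whether the loop runs in parallel or sequentially.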
