Looking at the algorithm on Wikipedia: <https://en.wikipedia.org/wiki/Argon2> Function Argon2 Inputs: password (P): Bytes (0..232-1) Password (or message) to be hashed salt (S): Bytes (8..232-1) Salt (16 bytes recommended for password hashing) parallelism (p): Number (1..224-1) Degree of parallelism (i.e. number of threads) tagLength (T): Number (4..232-1) Desired number of returned bytes memorySizeKB (m): Number (8p..232-1) Amount of memory (in kibibytes) to use iterations (t): Number (1..232-1) Number of iterations to perform version (v): Number (0x13) The current version is 0x13 (19 decimal) key (K): Bytes (0..232-1) Optional key (Errata: PDF says 0..32 bytes, RFC says 0..232 bytes) associatedData (X): Bytes (0..232-1) Optional arbitrary extra data hashType (y): Number (0=Argon2d, 1=Argon2i, 2=Argon2id) Output: tag: Bytes (tagLength) The resulting generated bytes, tagLength bytes long Generate initial 64-byte block H0. All the input parameters are concatenated and input as a source of additional entropy. Errata: RFC says H0 is 64-bits; PDF says H0 is 64-bytes. Errata: RFC says the Hash is H^, the PDF says it's ℋ (but doesn't document what ℋ is). It's actually Blake2b. Variable length items are prepended with their length as 32-bit little-endian integers. buffer ← parallelism ∥ tagLength ∥ memorySizeKB ∥ iterations ∥ version ∥ hashType ∥ Length(password) ∥ Password ∥ Length(salt) ∥ salt ∥ Length(key) ∥ key ∥ Length(associatedData) ∥ associatedData H0 ← Blake2b(buffer, 64) //default hash size of Blake2b is 64-bytes Calculate number of 1 KB blocks by rounding down memorySizeKB to the nearest multiple of 4*parallelism kibibytes blockCount ← Floor(memorySizeKB, 4*parallelism) Allocate two-dimensional array of 1 KiB blocks (parallelism rows x columnCount columns) columnCount ← blockCount / parallelism; //In the RFC, columnCount is referred to as q Compute the first and second block (i.e. column zero and one ) of each lane (i.e. row) for i ← 0 to parallelism-1 do for each row Bi[0] ← Hash(H0 ∥ 0 ∥ i, 1024) //Generate a 1024-byte digest Bi[1] ← Hash(H0 ∥ 1 ∥ i, 1024) //Generate a 1024-byte digest Compute remaining columns of each lane for i ← 0 to parallelism-1 do //for each row for j ← 2 to columnCount-1 do //for each subsequent column //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4) i′, j′ ← GetBlockIndexes(i, j) //the GetBlockIndexes function is not defined Bi[j] = G(Bi[j-1], Bi′[j′]) //the G hash function is not defined Further passes when iterations > 1 for nIteration ← 2 to iterations do for i ← 0 to parallelism-1 do for each row for j ← 0 to columnCount-1 do //for each subsequent column //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4) i′, j′ ← GetBlockIndexes(i, j) if j == 0 then Bi[0] = Bi[0] xor G(Bi[columnCount-1], Bi′[j′]) else Bi[j] = Bi[j] xor G(Bi[j-1], Bi′[j′]) Compute final block C as the XOR of the last column of each row C ← B0[columnCount-1] for i ← 1 to parallelism-1 do C ← C xor Bi[columnCount-1] Compute output tag return Hash(C, tagLength) Run
You need a threadpool that supports data parallelism / parallel for. You can get away with OpenMP by using the `||` operator <https://nim-lang.org/docs/system.html#%7C%7C.i%2CS%2CT%2Cstaticstring> Using <https://github.com/mratsim/laser/blob/master/laser/openmp.nim> for extra syntax sugar, you can use for i in 0 || (omp_get_num_threads() - 1): myArray[i] = <...> Run For OpenMP to work, you normally need to pass `passc:-fopenmp` and `passl:-fopenmp` either in the command line or via pragma. With the openmp.nim utility you just need to pass `-d:openmp` on the command-line. (On a Mac, the default Clang deos not support OpenMP you have to install GCC or Clang from Homebrew).