A couple of embarrassing cases of email incompetence later, let's see if I can manage send this one without breaking something! (I'm voting no.) ;)
On Tue, Aug 7, 2012 at 2:36 AM, Solar Designer <[email protected]> wrote: > Please do write about this. Will do. > It is curious how you deliberately use a narrow pipe for the expensive > component of the KDF (but not for the KDF as a whole). You'd need to > include some explanation/proof that this is safe (given certain threat > models). Here is the reasoning. After the initial hashing, we have k = H(input). Now we must compute H(k || S(k)); one way is to actually compute S(k). The second way is to just compute the outer H for all possible values of S(k), as you mention: On Tue, Aug 7, 2012 at 4:34 AM, Solar Designer <[email protected]> wrote: > Here's an attack: [snip] > This puts an upper bound on the cost, regardless of the (possibly > configurable) cost of the expensive component. It also lets an attacker > avoid the memory cost altogether (but pay for this with the processing > cost of the algorithm above). Indeed, and the function is specified with the intention of reaching this bound. This why I claim the construction's AT cost as being "similar to 2**32 iterations of SHA-1" on the webpage. Notice that the upper bound in fact is 2**32, and not 2**31. You did about (2**32 / 2) * input_space hash function calls. Compare to a hash that is iterated 2**32 times: you would need to do about 2**32 * (input_space / 2) calls. No difference. > Overall, I think that having a narrower-pipe expensive component in a > KDF might be OK (especially if there are multiple invocations of such > components per a full KDF computation), but 32 bits for the only > expensive component is likely too narrow for some otherwise reasonable > uses (cost settings). It is true that 32 bits is quite limiting. This structure works fine with 64-bit words as well, but I want to avoid uint64_t in this program. Going back. On Tue, Aug 7, 2012 at 2:36 AM, Solar Designer <[email protected]> wrote: > Also, I am curious how you derive your area*time cost estimate, and Here are some back-of-the-envelope numbers. SHA-1 in ASIC is 8k GEs and hashes a short input in 85 cycles.[1] 2**32 iterations of this would be about 2**51.4 GEs*cycles, plus unknown overhead. One bit of memory is 1.5 GEs in eDRAM. zen32 uses 2**23 * 32 bits of memory. Let's be pessimistic and assume that the attacker has zero memory latency. A straightforward implementation would take about 2**23 * 4 cycles. This comes down to about 2**53.6 GEs*cycles, plus unknown overhead. There are obvious ways to reduce the latter cost slightly, at least if we stick to the "zero memory latency" story, but it probably cannot be much cheaper than the iterated SHA-1. > whether you considered other metrics of cost (those more relevant for > non-ASIC attacks). Your zen32() does use RAM, but it only uses a tiny > fraction of the CPU's logic (little more than a 32-bit ALU - no SIMD), > which is suboptimal against attacks with CPUs. It looks like CPUs and other general-purpose hardware are only useful against passphrases that are too weak for this purpose in the first place. Granted, some users will use stupid passphrases despite being told how to do it securely, but being harder on the CPU is, to me, not a goal in itself if the attacker isn't constrained to general-purpose hardware. I try to go for cryptography that not even the best-equipped attackers can break. > SIMD-enhanced attacks won't even require scatter/gather addressing > (which, by the way, might become widely available, such as with Intel > MIC) since your array indices are not data-dependent. Also, such > attacks would use the available memory bandwidth more fully. When you > make a 32-bit memory access, an entire cache line is fetched anyway, but > your second loop only uses the 32-bit value. With SIMD or/and similar > grouping of memory accesses for multiple keys, the entire cache line > would be actually made use of. I considered data-dependent array lookups in the manner of ISAAC. It's a good idea in some circumstances, and definitely forces one to face the issue of latency. I decided against it because CPUs leak information via side channels about those lookups. Consider this, for example: attacker gathers a small amount of noisy information about memory lookups near the beginning of this type of random access function. He then runs brute force on the full construction; he can compare the known right memory lookup pattern to the one he ends up doing near the beginning of the random access function, detect gross mismatches and skip to the next candidate passphrase without computing the entire function. > In general, your algorithm appears to be memory bandwidth bound, whereas > scrypt avoids that (by including some non-trivial computation) - it is > memory-hard, but not memory bandwidth bound. > > With your algorithm, an attacker with the same amount of RAM, but with > higher memory bandwidth for random 32-bit accesses (such as with many > independent 32-bit buses), would have an advantage. This is probably > not what you wanted. Specialized hardware is indeed advantageous. One generally wants to increase attack cost per user inconvenience in deriving a key from a passphrase; using longer blocks of memory may indeed help slightly. It would also make the program more complex. > It is entirely possible that I am missing something or/and that I missed > a more serious attack. In short, I mostly care about the asymptotically best brute force attacks, which work by special-purpose hardware. I'm intending to write more formally about this structure when I have the time. [1]: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1428517 _______________________________________________ cryptography mailing list [email protected] http://lists.randombit.net/mailman/listinfo/cryptography
