Hal Finney wrote:

>> Steven M. Bellovin writes:
>>> Dan Bernstein has a new cache timing attack on AES:
>>> http://cr.yp.to/antiforgery/cachetiming-20050414.pdf
>>
>> This is a pretty alarming attack. Bernstein was actually able to
>> recover the AES key using a somewhat artificial server which reported
>> its own timing to do an AES operation. In principle the same attack
>> would be possible on a typical remote server, using a larger number
>> of requests to average out timing variations.
This is a very nice piece of work by Bernstein, but I am not convinced about the practical significance of the attack. And I certainly don't see any reason to abandon design approaches such as table lookup, as he has been suggesting, just because they are exploitable in some situations. In many situations they are not exploitable at all, and in those situations where they might cause problems it is up to system designers to decide whether or not they need to deploy countermeasures.

>> He also had some critical things to say about the AES selection
>> process, which apparently dropped the ball on this issue. Other
>> ciphers got dinged for nonconstant execution time, but no one thought
>> that cache misses would be that significant.
>>
>> Dan has more information at http://cr.yp.to/mac.html, including
>> graphs showing the variability in timings for various implementations
>> at http://cr.yp.to/mac/variability1.html and
>> http://cr.yp.to/mac/variability2.html, and his own attempt at a
>> (nearly) constant-time AES implementation as part of his poly1305-aes
>> MAC function.
>>
>> It would be interesting to see how the speed of his AES implementation
>> compares to that of other widely used versions like Brian Gladman's.
>> How much speed penalty do you pay to get the constant speed? As Dan
>> notes, you can easily make a slow AES that runs at constant speed, but
>> a fast one is far more difficult.

Nevertheless, Bernstein has turned up one issue that I had not been conscious of: on modern (Intel) x86 systems there is no longer a significant speed penalty for unaligned memory accesses to 32-bit words, a feature that allows AES to be implemented with very much less table space than is normally the case. There is almost no penalty in terms of best speed, and the typical speed is likely to be a lot better in most practical situations because the load on the cache is greatly reduced.
And the timing variability of this code is greatly reduced, so it's an all-round win on the x86.

The downside is that, although unaligned accesses are fine on x86, on many other architectures they cause exceptions, and this makes it tedious to build compressed-table operation into portable C code. In fact it is so tedious that I am not going to offer this; instead I have simply published x86 assembler code, which I report on here:

http://fp.gladman.plus.com/AES/index.htm

For those who can live with x86 only, and with an assembler implementation, this code matches the maximum speed of my large-table assembler version on the P3 and P4.

Another issue this raises is that of the crypto API. In those situations where the timing attack matters it is necessary to control the position of the expanded AES key on the stack, and this requires that key expansion and encryption be done as one integrated API call, i.e.:

   encrypt(key[], in[], out[], no_of_blocks)

I hope this helps, but if not I will try to answer any other questions.

   Brian Gladman

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
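Returning to the API point above: the one-shot call shape lets the expanded key live only in the callee's stack frame, so its address (and hence its cache alignment) is under the implementation's control and it can be wiped before the call returns. The sketch below shows that shape; the names follow the signature quoted above, but the body is a trivial XOR placeholder of mine, not real AES.

```c
#include <stdint.h>
#include <string.h>

enum { KEY_BYTES = 16, BLOCK_BYTES = 16, EXP_WORDS = 44 };

/* One integrated call: key expansion and encryption together, so the
   expanded key never leaves this stack frame. */
void encrypt(const uint8_t key[KEY_BYTES], const uint8_t *in,
             uint8_t *out, size_t no_of_blocks)
{
    uint32_t ekey[EXP_WORDS];        /* expanded key, on this frame's stack */

    /* real code would run the AES key schedule here; placeholder copy: */
    memcpy(ekey, key, KEY_BYTES);

    for (size_t b = 0; b < no_of_blocks; b++)
        for (size_t i = 0; i < BLOCK_BYTES; i++)  /* placeholder "rounds" */
            out[b * BLOCK_BYTES + i] =
                in[b * BLOCK_BYTES + i] ^ key[i % KEY_BYTES];

    /* wipe before returning; real code should use a wipe the compiler
       cannot optimize away (e.g. volatile stores), not plain memset */
    memset(ekey, 0, sizeof ekey);
}
```

By contrast, a split expand-then-encrypt API forces the expanded key into caller-owned storage whose placement the implementation cannot control.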
