Frozen Cache
http://frozencache.blogspot.com/

A blog about the development of a general-purpose solution for mitigating cold-boot attacks on Full-Disk-Encryption solutions.

The concept
Cold boot attacks are a major risk to the protection that Full-Disk-Encryption solutions provide. Any powered-on computer is vulnerable to this attack, and until now there has been no general-purpose solution to this problem.
This entry details my solution for mitigating cold boot attacks. Future posts will delve into additional details and describe what other ideas (and problems) were considered on the path to finding a general-purpose solution.

Why is this blog called "Frozen Cache"? Because we're proposing the use of the CPU cache for thwarting cold boot attacks (at least on most x86 systems). In contrast to most blogs, the entries of this blog will appear in normal chronological order on the website (for a more paper-like reading); feeds, however, will be sorted as expected (reverse chronological order).

The concept is simple: by switching the cache into a special mode, one can force data to remain in the cache without being written to the backing RAM locations. Thus, the encryption key can't be extracted from RAM. This technique is actually not new: LinuxBIOS/CoreBoot calls it Cache-as-RAM and uses it to provide "RAM access" even before the memory controller is initialized.

The following (simplified and technically not 100% correct/complete) steps load and maintain a 256 bit encryption key in the CPU cache. The demo assembly code assumes, for simplicity, that the encryption key is stored at linear address X in RAM on a page boundary:
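A minimal sketch of such a freezing sequence (my reconstruction of the idea, not the author's original demo code; this is ring-0 code and cannot run in user space). CR0.CD is bit 30 of CR0; setting it while CR0.NW stays clear puts the cache into no-fill mode, in which existing cache lines are still served but no new lines are allocated:

```asm
; Speculative reconstruction, simplified like the original demo.
; Assumes: 256 bit (32 byte) key at page-aligned linear address X.
        mov     esi, X              ; linear address of the key
        mov     ecx, 32/4           ; 32 bytes = eight 32-bit words
read_key:
        lodsd                       ; touch each word: the reads pull
        loop    read_key            ;   the key's cache lines into the cache
        mov     eax, cr0
        or      eax, 0x40000000     ; CR0.CD = 1: no-fill ("frozen") mode;
        mov     cr0, eax            ;   CR0.NW stays 0, so hits are served
        ; from here on, wbinvd/invd must not run on this CPU
```

The key is first read so that its lines are resident, and only then is the cache frozen; reversing the order would freeze a cache that does not yet hold the key.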
Please note that this post only describes the very basic concept; many aspects haven't been covered. Upcoming posts will address things like multi-CPU/core issues, performance considerations/optimizations and lots of other stuff.

Performance aspects

As previously mentioned, performance is a major concern when the cache is switched into no-fill mode ("frozen"). The broad range of CPU architectures (multi-CPU, multi-core, multi-threaded) and respective cache configurations makes the matter even more complex.
First off: only a single CPU's cache needs to be "frozen" in order to effectively protect the encryption key; all other CPUs can keep operating in normal cache mode. This holds as long as each logical CPU uses its own cache exclusively: CPUs that employ threading technology (like Intel's HyperThreading) appear as two (or potentially more) logical CPUs, but these logical CPUs share the same cache and must therefore all switch into no-fill cache mode. The situation may be different for multi-core CPUs, if the cores each have their own (L1 and L2) caches.

The encryption key resides in only a single CPU's cache, so only this CPU must execute the encryption and decryption routines. The most prevalent architecture among Full-Disk-Encryption solutions is to employ a kernel module which spawns a designated kernel thread for the encryption and decryption logic. Kernel threads are schedulable entities and can therefore be bound onto the CPU which holds the encryption key in its cache.

Back to more "traditional" performance aspects: what can be done to minimize the impact of freezing the CPU cache? Loading the most frequently used memory areas into the cache (before freezing it) is a great start. Among the best candidates are the system call entry point, the timer interrupt routine and its "helper" functions, and the encryption/decryption functions executed by the kernel thread. Current L2 caches are usually large enough to hold all this code, but one also needs to consider the cache's associativity in order not to shoot oneself in the foot. Another good idea is to schedule all other processes onto any of the other available CPUs (which don't use the frozen cache): this allows them to be executed at "normal" speed. There's another reason why this is important, but we'll get to that some other time.
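Inside the kernel the binding would use the kernel's own affinity primitives; as a rough user-space illustration of the same idea (pinning the crypto work onto one specific CPU), Python exposes the Linux scheduler's affinity interface. The function name and the choice of CPU 0 are mine, purely for illustration:

```python
import os

# User-space illustration (not the author's kernel code): pin the
# current process onto one CPU, analogous to binding the FDE kernel
# thread onto the CPU whose frozen cache holds the encryption key.
def bind_to_cpu(cpu):
    previous = os.sched_getaffinity(0)  # remember the old CPU mask
    os.sched_setaffinity(0, {cpu})      # from now on, only `cpu` runs us
    return previous

old_mask = bind_to_cpu(0)               # CPU 0 exists on every system
assert os.sched_getaffinity(0) == {0}
os.sched_setaffinity(0, old_mask)       # undo again, for the demo's sake
```

The same call with a different mask would push all other processes away from the "frozen" CPU, the complementary half of the scheme described above.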
It should be obvious by now that an implementation will have to identify the specific CPU/cache components at runtime and "manage" them accordingly. My proof-of-concept implementation for Linux will be purely for single-CPU systems (for simplicity), but I'll explain the technical details in a future post (like so many other things).

Lack of cache control

Management of the cache contents isn't over once the cache has been "frozen": it is also important that the data in the cache (the encryption key) isn't written back to memory. Unfortunately, the Intel architecture allows only very minimalistic cache control: essentially the CD and NW bits in CR0, the cacheability of memory regions (via the MTRRs and the PAT) and the invd, wbinvd and clflush instructions.
That's it. There are no processor instructions for querying the status of the cache (the RAM locations currently held in the individual cache lines) or any other "advanced" cache management functions. Therefore, it is nearly impossible to verify that the encryption key is really present only in the CPU cache. With the frozen cache setup it's pretty much guaranteed that the key will be present in the cache, but that says nothing about whether the data has also been written back to RAM. In the frozen cache setup this happens whenever the wbinvd instruction is executed, and this instruction can be executed by any code running in ring 0 (the kernel). It is therefore important to minimize the (kernel) code that runs on the CPU which holds the encryption key in its cache. This is why binding the other schedulable entities (at least all other kernel threads) onto other CPUs (if present) is important, too.

One way to minimize the impact of "unintentional" cache flushes (unintentional from our point of view) is to repeat the cache freezing procedure periodically in order to reverse their effects. Fortunately, there's at least a (theoretically) better solution for Linux: modify the function/macro that wraps the invd/wbinvd instructions in the kernel so that it triggers the re-execution of the cache freezing (independent of how realistic the chances of integration of such a patch seem).

Protecting the encryption key
The cryptographic key is not the only data item that needs to be kept
in the CPU cache in order to keep it from prying eyes/spraycans.
Key scheduling is an established paradigm in modern cryptographic ciphers: the encryption and decryption routines don't use the encryption key directly, but rather employ "round keys". These are derived from the encryption key (by a cipher-specific algorithm) and are then used in the various rounds of the encryption/decryption. The AES standard defines that a 128 bit AES key is used to derive 10 round keys of 128 bit each (for 192/256 bit AES keys, 12/14 round keys are calculated respectively).

One would expect these round keys to be calculated from the encryption key by some non-reversible hash-like function. Unfortunately, this isn't true for AES: the encryption key can easily be recomputed from any of the 10/12/14 round keys. Thus: these round keys need to be kept inside the CPU cache as well (at least for AES, which is used quite often). Sounds easy in theory, but especially for Linux it's quite a challenge to find a nicely structured approach instead of just hacking something together.

Locking the screen

One important aspect of my proposition is that the performance impact is only in effect whenever the screen is locked (only then are the keys stored "safely" in the CPU cache). However, there are very likely situations in which one would like to lock the screen but not suffer the performance impact (such as compiling software over lunch).
I foresee two strategies for maintaining native system performance:
The time-window approach could be something like a countdown which starts right when the screen is locked (while the user might still be in front of the computer). Clicking a "don't freeze the cache" button during the countdown would prevent the key protection, while ignoring it would lead to the desired protection (thus addressing the case in which the computer auto-locks the screen; it would only add a small window of additional exposure for the encryption key).

Protecting the encryption key: it is not just the encryption key
I've followed the coverage on Slashdot,
Hack-a-day
and other sites: it's great that my effort is being exposed to such a
broad audience, but it seems that there is a misunderstanding about the
details of my research.
I supposedly suggested to protect only the encryption key by "removing" it from RAM and keeping it in the CPU cache. However, this is not the case, as I've previously stated in the entry "Protecting the encryption key": "Thus: these round keys need to be kept inside the CPU cache as well (at least for AES, which is used quite often)."

Maybe the misunderstanding about what all should be "protected" arose because of the demo code in my first blog entry; it only shows how to "move" 256 bits to the CPU cache and how to then "freeze" the CPU cache. That assembly code was only meant to demonstrate the core concept, nothing more. Anyhow, I would like to take this opportunity to explain a bit more thoroughly what needs to be kept in the CPU cache in order to achieve "perfect" protection against cold-boot attacks.

Obviously, the encryption key itself needs to be protected. Secondly, the "key schedule" (I previously called these "round keys") needs to be protected; the key schedule is derived directly from the encryption key and can be seen as an "expanded" version of it. Thirdly, one should aim to protect the Initialization Vector (IV). Whether the IV really needs to be kept secret depends on how the IV is determined/generated. The "Encrypted Salt-Sector Initialization Vector" (ESSIV) is one example of an IV that should definitely be protected: the ESSIV is derived from a hash of the encryption key and is the default IV used by dm-crypt on Linux. Fourthly, any buffers containing the contents of decrypted sectors should be protected in order to prevent known-plaintext attacks against ciphers that are vulnerable to them (protecting these memory buffers is especially tricky, however). Lastly, any intermediate values calculated during the encryption/decryption should be "securely" stored in the CPU cache in order to prevent key analysis.
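The claim that the key schedule gives the key away can be demonstrated concretely. The sketch below (my illustration; all function and variable names are mine) expands a 128 bit AES key into its round-key words per FIPS-197, then reconstructs the original key from nothing but the last round key by running the schedule backwards:

```python
# Illustration only: AES-128 key expansion and its inversion.
# Given just the last round key, the original key falls right out.

def gmul(a, b):
    # multiply a*b in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def _sbox_entry(x):
    # multiplicative inverse (0 maps to 0), then the affine transform
    inv = next(y for y in range(256) if gmul(x, y) == 1) if x else 0
    rot = lambda b, n: ((b << n) | (b >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63

SBOX = [_sbox_entry(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def g(word, rcon):
    # RotWord + SubWord + Rcon, applied to every fourth word
    w = [SBOX[b] for b in word[1:] + word[:1]]
    return [w[0] ^ rcon] + w[1:]

def expand(key):
    # forward key schedule: 16-byte key -> 44 four-byte words w[0..43]
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    for i in range(4, 44):
        t = g(w[i - 1], RCON[i // 4 - 1]) if i % 4 == 0 else w[i - 1]
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return w

def recover_key(last_round_key):
    # walk the schedule backwards from words w[40..43] alone:
    # w[i] = w[i-4] xor t(w[i-1])  implies  w[i-4] = w[i] xor t(w[i-1])
    w = {40 + j: list(word) for j, word in enumerate(last_round_key)}
    for i in range(43, 3, -1):
        t = g(w[i - 1], RCON[i // 4 - 1]) if i % 4 == 0 else w[i - 1]
        w[i - 4] = [a ^ b for a, b in zip(w[i], t)]
    return bytes(b for i in range(4) for b in w[i])

key = bytes(range(16))            # any 16-byte key works here
assert recover_key(expand(key)[40:44]) == key
```

The inversion needs no search and no secret knowledge, which is exactly why the round keys deserve the same cache protection as the key itself.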
I hinted in the "Protecting the encryption key" entry that designing an elegant implementation is somewhat troublesome on Linux. I would like to explain why: some key-relevant data is kept in data structures maintained by dm-crypt(.c), but other data items are calculated and managed by the crypto API. The proof-of-concept implementation therefore requires changes in multiple parts of the Linux kernel, which makes it a bit more challenging - unless one is willing to resort to ugly hacks (which I am not).

Controlling the uncontrollable cache
I think I found a solution for the last significant challenge, which
I've described in the blog entry "Lack
of cache control":
it's the uncertainty about whether any data in the CPU cache has been
flushed out into RAM. This flushing could be initiated by CPU
instructions like invd, wbinvd and clflush (Thanks, haxwell) or even external events
like signals on a CPU pin (although this is just my personal
speculation).
I've previously suggested this approach to minimize the risk: "One way to minimize the impact of 'unintentional' cache flushes (unintentional from our point of view) is to repeat the cache freezing procedure periodically in order to reverse the effects of 'unintentional' cache flushes (wbinvd)."

My new idea would eliminate this risk altogether. However, I haven't actually verified yet whether this idea can be implemented; keep this in mind while reading the next paragraphs. It is also important to understand the difference between physical/linear and virtual memory addresses; if you don't know what they are, you should read up on them before you read on.

The idea is actually quite simple: keep the data in the cache at physical/linear addresses which aren't backed by RAM on the system. This would guarantee that the data won't leave the CPU cache, even if a cache flush is triggered (a general protection fault would be raised). What I haven't verified is whether it is actually possible to set up this scenario. The setup procedure might look something like this:
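A hedged sketch of how that setup might look (my pseudocode, spelling out the speculative steps; as stated above, none of this has been verified):

```
1. pick a physical address P that no RAM (and no device/MMIO range)
   responds to, e.g. above the top of installed memory
2. map a linear address V onto P with an ordinary, cacheable
   page-table entry
3. while the cache is still in normal mode, place the key in the
   cache lines backing V
4. switch the cache into no-fill mode (CR0.CD = 1) as in the
   original freezing procedure; a later flush of V's lines would
   target P, where no RAM exists to receive the key
```

Step 3 is the unverified part: whether cache lines for an unbacked address can be populated at all is exactly the open question.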
One last note: obviously, if one has 4GB of RAM then there would be no unbacked addresses left - PAE might be a possibility, but that's a problem for much later.