On 06/18/2011 03:08 PM, slinky wrote:

Likewise, for accelerated component functions the hardware will know
what is a key and what is input data - again, it needs this
information in order to operate. Contrast this to a general purpose
processor which can't really deduce what is a key and what isn't
while processing code that happens to be AES.

Why not?

As Peter Gutmann just said "They really have waaaay too much die space
to spare don't they?"

Intel bought McAfee a while ago. From informal conversations with some chip people (not necessarily from Intel, and definitely not under any confidentiality), there's active research into building instruction-stream validation in support of antivirus directly into the processor. Recognizing an intentionally obfuscated virus seems no easier than recognizing AES.

The Intel hardware RNG ("DRBG") is an example of how not to do it. It has weird timings and magic numbers:
There are two approaches to structuring RdRand invocations such that
DRBG reseeding can be guaranteed: Iteratively execute RdRand beyond
the DRBG upper bound by executing more than 1022 64-bit RdRands, and
Iteratively execute 32 RdRand invocations with a 10us wait period
per iteration. The latter approach has the effect of forcing a
reseeding event since the DRBG aggressively reseeds during idle
periods.
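
In code, that guidance presumably boils down to something like this. A sketch only, not Intel's code; rdrand64() here stands for a hypothetical carry-flag-checking wrapper like the one sketched further down in this message:

#include <stdint.h>
#include <unistd.h>

/* Hypothetical wrapper (see further down): returns 1 and fills *out,
 * or 0 if RDRAND reported "no data available" (CF clear). */
extern int rdrand64(uint64_t *out);

/* Second approach from the Intel guide: 32 RDRANDs with ~10us of
 * "idle" between them, so the DRBG decides to reseed. */
static int force_reseed(void)
{
    uint64_t discard;
    int i;

    for (i = 0; i < 32; i++) {
        if (!rdrand64(&discard))
            return 0;            /* underflow: no guarantee after all */
        usleep(10);              /* the magic 10us wait */
    }
    return 1;
}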

But on any kind of networked or multitasking system "idle periods" happen only at the consent of the attacker.

Within the context of virtualization, the DRNG's stateless design
and atomic instruction access means that RdRand can be used freely
by multiple VMs without the need for hypervisor intervention or
resource management.

Hmm, are we sure none of this carries across any security boundaries?

In another place they say:
After invoking the RdRand instruction, the caller must examine the
Carry Flag (CF) to determine whether a random value was available at
the time the RdRand instruction was executed.

So it's not stateless after all because it keeps a FIFO of numbers and emits all zeroes when it "runs out".

To me, this only makes sense if one of the following might be true:
* the entropy pool is so small that it's in danger of being brute-forced,
* the pool's contents can be read out somehow, or
* the extraction process (NIST SP 800-90 CTR_DRBG AES) may not be strongly one-way

But I know there are other opinions :-). More than likely though, they're doing this to follow "best practices".

At the very least this is going to disclose to an attacker on another core how many random numbers you're consuming. Random number consumption can often be driven over the network, so he sends SSL or IPsec handshake requests at varying rates while watching how fast the pool depletes. That tells him whether he's running on the same processor as your crypto thread. It may also create a covert channel for exfiltration. Of course, there are other shared resources that might already offer an easier way to do this.
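
The watching half of that would be trivial to write. Something like the following sketch, again assuming the hypothetical rdrand64() wrapper from further down, and assuming underflows are actually observable under load, which is exactly the open question:

#include <stdint.h>
#include <time.h>

extern int rdrand64(uint64_t *out);   /* hypothetical CF-checking wrapper */

/* Spin on RDRAND for one second and count how often the shared DRNG
 * says "nothing for you".  If this rate tracks the victim's handshake
 * load, we're probably sharing a DRNG with the victim. */
static unsigned long underflows_per_second(void)
{
    unsigned long fails = 0;
    uint64_t discard;
    time_t start = time(NULL);

    while (time(NULL) - start < 1)
        if (!rdrand64(&discard))
            fails++;

    return fails;
}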

Still, we can predict how this story will turn out because we have several examples of what happens in practice when RNGs decide that they have "run out" of numbers: the client code continues running with whatever garbage it gets because it's not a situation that the software developer ever encountered in his debugger, or one which a QA team ever noticed in the lab. At best, it will continue along an untested code path.

The thing _least_ likely to happen is for the operation actually needing the CSRNs to fail, because that would be a conspicuous bug which would have to be "fixed" somehow.

So the Intel "DRNG" has observable internal state and is shared among multiple cores. Even worse, *an attacker running on one core can cause the RDRAND instruction to write zeroes to the destination register*!

Note that the carry flag isn't accessible from C. The RDRAND instruction isn't either, but there will be inline assembler snippets floating around any day now.

Just to pick on Peter Gutmann:
How would you encode, for example, 'RdRand eax'?
I'd like to get the encoded form to implement it as '__asm _emit 0x0F __asm
_emit 0xC7 __asm _emit <something>' (in the case of MSVC).

Note that he's not asking about how to check the carry flag too. I'm sure he of all people wouldn't forget this, but not so for your typical developer.

It's possible to check the carry flag from inline asm:
http://stackoverflow.com/questions/3139772/check-if-carry-flag-is-set

So if you were a C programmer who didn't know x64 assembler *maybe* you could find the right advice in that thread and get the carry flag out reliably. But how would you test it? How would QA test it?
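
For concreteness, here's roughly what such a snippet might look like. This is a sketch only (GCC-style inline asm for x86-64, hand-encoding the instruction since today's assemblers don't know the mnemonic), and the names are mine:

#include <stdint.h>

/* One RDRAND invocation.  Returns 1 and fills *out if CF was set,
 * 0 if the hardware had no random data for us.  The byte sequence
 * 0x48,0x0f,0xc7,0xf0 encodes "rdrand rax"; setc copies CF
 * somewhere C can see it. */
static int rdrand64(uint64_t *out)
{
    uint64_t val;
    unsigned char ok;

    __asm__ __volatile__(
        ".byte 0x48, 0x0f, 0xc7, 0xf0\n\t"   /* rdrand rax */
        "setc %1"
        : "=a" (val), "=qm" (ok)
        :
        : "cc");

    *out = val;
    return ok;       /* caller MUST check this, every single time */
}

And that only covers a single invocation; every caller still needs a retry policy and a sane failure path for the case where ok stays 0.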

It's certainly possible to get everything right, but:
* requires significant skill and effort on the developer's part
* zero observable benefit in the normal case
* proper error handling reduces perceived reliability
* silent vulnerable failure mode
* difficult to repro on the developer's box (masked by debugger)
* difficult to repro on the QA box
* may change with new processor revisions
* relatively easy for the attacker under the right conditions

This is a recipe for fail.

For comparison, look at the RDTSC instruction. Here's a nice helpful page (I've referred to it myself) with code to read the processor timestamp counter:
http://www.mcs.anl.gov/~kazutomo/rdtsc.html
Microsoft's compiler even provides an intrinsic for it:
http://msdn.microsoft.com/en-us/library/twchhe95.aspx

Note that neither of these two sources provides any code to check whether the processor supports this instruction or whether the values it returns are valid! In fact, the numbers it returns can be really wacky on multi-CPU architectures. Intel and AMD behave differently, even across chip models. You should probably set CPU affinity on any thread that's going to use it, and be prepared to handle a certain number of wacky measurements anyway.
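
For what it's worth, the affinity dance looks something like this (a Linux/GCC sketch; on Windows you'd reach for SetThreadAffinityMask and the __rdtsc intrinsic instead):

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>

/* Read the timestamp counter (EDX:EAX). */
static uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Pin the calling thread to one CPU so successive readings come
 * from the same (hopefully monotonic) counter. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = this thread */
}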

Now, put on your tinfoil beanie and suppose the hw accelerator is a
Mallory. Suppose there is some kind of a built-in weakness/backdoor,
for instance as a persistent memory inside the chip, which stores
the last N keys.

Years ago, somebody ran "strings" on a Windows system DLL and found the string "NSAKEY". http://en.wikipedia.org/wiki/NSAKEY There was a lot of speculation and outright accusations that this was a backdoor to Windows.

Ah, those were innocent times. :-)

Since then we've learned more about exploiting ordinary software bugs and there have been *thousands* of remote holes discovered in Microsoft products:
http://web.nvd.nist.gov/view/vuln/search-results?adv_search=true&cves=on&cve_id=&query=microsoft&cwe_id=&pub_date_start_month=-1&pub_date_start_year=-1&pub_date_end_month=-1&pub_date_end_year=-1&mod_date_start_month=-1&mod_date_start_year=-1&mod_date_end_month=-1&mod_date_end_year=-1&cvss_sev_base=&cvss_av=NETWORK&cvss_ac=&cvss_au=&cvss_c=&cvss_i=&cvss_a=

The idea that the NSA (or any skilled attacker) would have needed Microsoft's help to execute code remotely on Windows NT is now simply laughable. The same goes for other commonly-used OSes.

Microsoft's code quality is vastly improved and now far ahead of most of the rest of the industry. But we know there are still hundreds of "trusted" root CAs, many from governments, that will silently install themselves into Windows at the request of any website. Some of these even have code signing capabilities.

I guess the point here is that unless you're talking about hardware constructed specifically for a high-security environment, sneaking special key detection logic into the microarchitecture seems like the world's most overcomplicated way of pwning a commodity box.

Having physical access to the machine would yield
the keys (thus subverting e.g. any disk encryption). And even more
paranoidly, a proper instruction sequence could blurt the key cache
out for convenient remote access by malware crafted by the People
Who Know The Secrets.

It wouldn't need to be anything that obvious. Just a circuit
configuration that leaks enough secret stuff via a side channel (timing,
power, RF, etc.) that it could be captured.

Things like data caches and branch prediction buffers have been shown to
do this as a natural consequence of how they operate. I think when an
incomprehensibly complex black-box system has things like subtle info
leaks happening by accident it's a good sign that intentional behaviors
could easily be hidden.

My questions: 1. How can one ensure this blackbox device really
isn't a Mallory?

Mallory lives on the communications link, which is neither Alice's nor
Bob's CPU by definition.

But it could certainly be malicious and there's probably no way to prove
that it isn't, particularly on a chip as dense as a modern x64 CPU.

Being in your chips is the ultimate in physical access to the hardware,
so you have no choice but to use only hardware that you "trust". But
there may be ways to split up the data such that the attacker would need to have previously compromised multiple, disparate systems.

2. Are there techniques, such as encrypting a lot of useless junk
before/after the real deal to flush out the real key, as a way to
reduce the impact of untrusted hardware, while still being able to
use the hw-accelerated capabilities? And if you know of any good
papers around this subject, feel free to mention them :)

Encrypting a lot of stuff with the same key would probably just make the
side channel easier to read. Switching keys on every block might help, but only if the attacker didn't expect and account for that when he designed the backdoor.

- Marsh
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography
