On 06/18/2011 03:08 PM, slinky wrote:

Likewise, for accelerated component functions the hardware will know
what is a key and what is input data - again, it needs this
information in order to operate. Contrast this to a general purpose
processor which can't really deduce what is a key and what isn't
while processing code that happens to be AES.

Why not?

As Peter Gutmann just said "They really have waaaay too much die space
to spare don't they?"

Intel bought McAfee a while ago. From informal conversations with some chip people (not necessarily from Intel, and definitely not under any confidentiality), there's active research into building instruction-stream validation in support of antivirus directly into the processor. Recognizing an intentionally obfuscated virus seems no easier than recognizing AES.

The Intel hardware RNG ("DRBG") is an example of how not to do it. It has weird timings and magic numbers:
There are two approaches to structuring RdRand invocations such that
DRBG reseeding can be guaranteed: Iteratively execute RdRand beyond
the DRBG upper bound by executing more than 1022 64-bit RdRands, and
Iteratively execute 32 RdRand invocations with a 10us wait period
per iteration. The latter approach has the effect of forcing a
reseeding event since the DRBG aggressively reseeds during idle
periods.
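
In code, that guidance presumably boils down to something like this. A sketch only, not Intel's code; rdrand64() here stands for a hypothetical carry-flag-checking wrapper like the one sketched further down in this message:

#include <stdint.h>
#include <unistd.h>

/* Hypothetical wrapper (see further down): returns 1 and fills *out,
 * or 0 if RDRAND reported "no data available" (CF clear). */
extern int rdrand64(uint64_t *out);

/* Second approach from the Intel guide: 32 RDRANDs with ~10us of
 * "idle" between them, so the DRBG decides to reseed. */
static int force_reseed(void)
{
    uint64_t discard;
    int i;

    for (i = 0; i < 32; i++) {
        if (!rdrand64(&discard))
            return 0;            /* underflow: no guarantee after all */
        usleep(10);              /* the magic 10us wait */
    }
    return 1;
}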

But on any kind of networked or multitasking system "idle periods" happen only at the consent of the attacker.

Within the context of virtualization, the DRNG's stateless design
and atomic instruction access means that RdRand can be used freely
by multiple VMs without the need for hypervisor intervention or
resource management.

Hmm, are we sure none of this carries across any security boundaries?

In another place they say:
After invoking the RdRand instruction, the caller must examine the
Carry Flag (CF) to determine whether a random value was available at
the time the RdRand instruction was executed.

So it's not stateless after all because it keeps a FIFO of numbers and emits all zeroes when it "runs out".

To me, this only makes sense if one of the following might be true:
* the entropy pool is so small that it's in danger of being brute-forced,
* the pool's contents can be read out somehow, or
* the extraction process (NIST SP 800-90 CTR_DRBG AES) may not be strongly one-way

But I know there are other opinions :-). More than likely though, they're doing this to follow "best practices".

At the very least this is going to disclose to an attacker on another core how many random numbers you're consuming. Random number consumption can often be driven over the network, so he sends SSL or IPsec handshake requests at varying rates while watching how fast the pool depletes. That tells him whether he's running on the same processor as your crypto thread. It may also create a covert channel for exfiltration. Of course, there are other shared resources that might already offer an easier way to do this.
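
The watching half of that would be trivial to write. Something like the following sketch, again assuming the hypothetical rdrand64() wrapper from further down, and assuming underflows are actually observable under load, which is exactly the open question:

#include <stdint.h>
#include <time.h>

extern int rdrand64(uint64_t *out);   /* hypothetical CF-checking wrapper */

/* Spin on RDRAND for one second and count how often the shared DRNG
 * says "nothing for you".  If this rate tracks the victim's handshake
 * load, we're probably sharing a DRNG with the victim. */
static unsigned long underflows_per_second(void)
{
    unsigned long fails = 0;
    uint64_t discard;
    time_t start = time(NULL);

    while (time(NULL) - start < 1)
        if (!rdrand64(&discard))
            fails++;

    return fails;
}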

Still, we can predict how this story will turn out because we have several examples of what happens in practice when RNGs decide that they have "run out" of numbers: the client code continues running with whatever garbage it gets because it's not a situation that the software developer ever encountered in his debugger, or one which a QA team ever noticed in the lab. At best, it will continue along an untested code path.

The thing _least_ likely to happen is for the operation actually needing the CSRNs to fail, because that would be a conspicuous bug which would have to be "fixed" somehow.

So the Intel "DRNG" has observable internal state and is shared among multiple cores. Even worse, *an attacker running on one core can cause the RDRAND instruction to write zeroes to the destination register*!

Note that the carry flag isn't accessible from C. The RDRAND instruction isn't either, but there will be inline assembler snippets floating around any day now.

Just to pick on Peter Gutmann:
How would you encode, for example, 'RdRand eax'?
I'd like to get the encoded form to implement it as '__asm _emit 0x0F __asm
_emit 0xC7 __asm _emit <something>' (in the case of MSVC).

Note that he's not asking about how to check the carry flag too. I'm sure he of all people wouldn't forget this, but not so for your typical developer.

It's possible to check the carry flag from inline asm:
http://stackoverflow.com/questions/3139772/check-if-carry-flag-is-set

So if you were a C programmer who didn't know x64 assembler *maybe* you could find the right advice in that thread and get the carry flag out reliably. But how would you test it? How would QA test it?
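
For concreteness, here's roughly what such a snippet might look like. This is a sketch only (GCC-style inline asm for x86-64, hand-encoding the instruction since today's assemblers don't know the mnemonic), and the names are mine:

#include <stdint.h>

/* One RDRAND invocation.  Returns 1 and fills *out if CF was set,
 * 0 if the hardware had no random data for us.  The byte sequence
 * 0x48,0x0f,0xc7,0xf0 encodes "rdrand rax"; setc copies CF
 * somewhere C can see it. */
static int rdrand64(uint64_t *out)
{
    uint64_t val;
    unsigned char ok;

    __asm__ __volatile__(
        ".byte 0x48, 0x0f, 0xc7, 0xf0\n\t"   /* rdrand rax */
        "setc %1"
        : "=a" (val), "=qm" (ok)
        :
        : "cc");

    *out = val;
    return ok;       /* caller MUST check this, every single time */
}

And that only covers a single invocation; every caller still needs a retry policy and a sane failure path for the case where ok stays 0.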

It's certainly possible to get everything right, but:
* requires significant skill and effort on the developer's part
* zero observable benefit in the normal case
* proper error handling reduces perceived reliability
* silent vulnerable failure mode
* difficult to repro on the developer's box (masked by debugger)
* difficult to repro on the QA box
* may change with new processor revisions
* relatively easy for the attacker under the right conditions

This is a recipe for fail.

For comparison, look at the RDTSC instruction. Here's a nice helpful page (I've referred to it myself) with code to read the processor timestamp counter:
http://www.mcs.anl.gov/~kazutomo/rdtsc.html
Microsoft's compiler even provides an intrinsic for it:
http://msdn.microsoft.com/en-us/library/twchhe95.aspx

Note that neither of these two sources provides any code to check whether the processor supports this instruction or whether the values it returns are valid! In fact, the numbers it returns can be really wacky on multi-CPU architectures. Intel and AMD behave differently, even across chip models. You should probably set CPU affinity on any thread that's going to use it, and be prepared to handle a certain number of wacky measurements anyway.
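
For what it's worth, the affinity dance looks something like this (a Linux/GCC sketch; on Windows you'd reach for SetThreadAffinityMask and the __rdtsc intrinsic instead):

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>

/* Read the timestamp counter (EDX:EAX). */
static uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Pin the calling thread to one CPU so successive readings come
 * from the same (hopefully monotonic) counter. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = this thread */
}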

Now, put on your tinfoil beanie and suppose the hw accelerator is a
Mallory. Suppose there is some kind of a built-in weakness/backdoor,
for instance as a persistent memory inside the chip, which stores
the last N keys.

Years ago, somebody ran "strings" on a Windows system DLL and found the string "NSAKEY". http://en.wikipedia.org/wiki/NSAKEY There was a lot of speculation and outright accusations that this was a backdoor to Windows.

Ah, those were innocent times. :-)

Since then we've learned more about exploiting ordinary software bugs and there have been *thousands* of remote holes discovered in Microsoft products:
http://web.nvd.nist.gov/view/vuln/search-results?adv_search=true&cves=on&cve_id=&query=microsoft&cwe_id=&pub_date_start_month=-1&pub_date_start_year=-1&pub_date_end_month=-1&pub_date_end_year=-1&mod_date_start_month=-1&mod_date_start_year=-1&mod_date_end_month=-1&mod_date_end_year=-1&cvss_sev_base=&cvss_av=NETWORK&cvss_ac=&cvss_au=&cvss_c=&cvss_i=&cvss_a=

The idea that the NSA (or any skilled attacker) would have needed Microsoft's help to execute code remotely on Windows NT is now simply laughable. The same goes for other commonly-used OSes.

Microsoft's code quality is vastly improved and now far ahead of most of the rest of the industry. But we know there are still hundreds of "trusted" root CAs, many from governments, that will silently install themselves into Windows at the request of any website. Some of these even have code signing capabilities.

I guess the point here is that unless you're talking about hardware constructed specifically for a high-security environment, sneaking special key detection logic into the microarchitecture seems like the world's most overcomplicated way of pwning a commodity box.

Having physical access to the machine would yield
the keys (thus subverting e.g. any disk encryption). And even more
paranoidly, a proper instruction sequence could blurt the key cache
out for convenient remote access by malware crafted by the People
Who Know The Secrets.

It wouldn't need to be anything that obvious. Just a circuit
configuration that leaks enough secret stuff via a side channel (timing,
power, RF, etc.) that it could be captured.

Things like data caches and branch prediction buffers have been shown to
do this as a natural consequence of how they operate. I think when an
incomprehensibly complex black-box system has things like subtle info
leaks happening by accident it's a good sign that intentional behaviors
could easily be hidden.

My questions: 1. How can one ensure this blackbox device really
isn't a Mallory?

Mallory lives on the communications link, which is neither Alice's nor
Bob's CPU by definition.

But it could certainly be malicious and there's probably no way to prove
that it isn't, particularly on a chip as dense as a modern x64 CPU.

Being in your chips is the ultimate in physical access to the hardware,
so you have no choice but to use only hardware that you "trust". But
there may be ways to split up the data such that the attacker would need to have previously compromised multiple, disparate systems.

2. Are there techniques, such as encrypting a lot of useless junk
before/after the real deal to flush out the real key, as a way to
reduce the impact of untrusted hardware, while still being able to
use the hw-accelerated capabilities? And if you know of any good
papers around this subject, feel free to mention them :)

Encrypting a lot of stuff with the same key would probably just make the
side channel easier to read. Switching keys on every block might help, but only if the attacker didn't expect and account for that when he designed the backdoor.

- Marsh
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography
