Hi everyone.
This is very possibly a newb question (or series of questions), and if
so I apologize in advance. I scoured everywhere I could think of for the
last couple of days trying to find information on this and came up
empty, but maybe I just didn't know the right terms to search for.
--- Background ---
I was reverse-engineering a system recently and came across an issue
that I know from experience and training is pretty widespread when
developers without a strong cryptography background use cryptography
without thinking things through: although strong cryptography (AES) was
used, and the key was stored securely, the system itself unintentionally
provided a means for an attacker to decrypt arbitrary data without ever
knowing the key. I have a set of recommendations in mind for how to
avoid this type of vulnerability, but I'd like to sanity-check them with
people who actually do have a cryptography background.
What I'm hoping to avoid is being the security guy who makes a
recommendation for improving something, but unintentionally introduces a
different vulnerability as a result. I have gotten pretty good at
exploiting commonly-made mistakes in software that uses cryptography,
but I am not a cryptography expert, or even cryptography adept.
The system in question uses a fairly common mechanism where the state of
certain non-sensitive variables is maintained on the client by means of
encrypted data which the client doesn't have the key to decrypt. The
only reason for the encryption is to prevent the client from tampering
with the data. This allows multiple different load-balanced nodes on the
back-end to respond to requests from the same client without having to
sync their state. Think of ASP.NET's ViewState, except that here the
variables are broken out into individual components instead of there
being one giant encrypted blob that contains all of the data.
Like many systems that use this model, there is a flaw that would
(probably?) be trivial in the absence of other factors: some of those
non-sensitive values are displayed back to the user after being
decrypted. In other words, as I mentioned above, the system includes the
unintentional ability for users to decrypt arbitrary data, as long as
that data was encrypted using the same key as the data it actually expects.
Unfortunately, there is other - sensitive - data in the system which is
also encrypted using the same key. It's data that must be stored in a
reversibly-encrypted format, but which end users should not be able to
retrieve. For the sake of argument, let's say it's the password for a
service account that the system uses to execute batch jobs, or a stored
credit card number used to make purchases by a customer. In both cases,
the system needs the ability to obtain the original value, but end users
do not - they just refer to the value abstractly, such as "use this
service account to execute this task", or "I want to make a purchase
using the card whose number ends in 1234". I am using examples from
other systems that I've looked at in the past here as opposed to the
current one, so please don't get stuck on those two specific cases.
Assume that there is a requirement that the system be able to decrypt
the data, but that it should not be accessible to end users after it's
originally entered.
The combination of those two aspects of the system means that if a user
can obtain the encrypted version of the second type of data, they can
feed it into their cookie, and the system will happily display to them
the decrypted value, because it doesn't know any better and because the
same key is used for both types of data.
Now, normally users can't actually obtain this sensitive data, even in
encrypted format - there are OS- and database-level permissions that are
supposed to prevent that - but over time, people have a tendency to
forget why certain things were configured the way they were, someone
makes a configuration change, and people who shouldn't be able to get to
the encrypted data are suddenly able to.
--- Proposal/Question ---
Of course, one of my main recommendations is going to be "don't use the
same key for multiple types of data!!", but because my background is in
systems engineering, one of my interests is building redundant safety
features into a system design so that any one failure or human error
won't completely compromise the system.
Part 1 of my proposal is that encrypted values should be wrapped in some
kind of metadata that identifies their type and delimits where the
plaintext value starts and ends (to help prevent block-shuffling attacks
that change the length of the desired plaintext, e.g. if someone makes a
mistake and uses ECB mode instead of CBC). Some really basic examples of
the plaintext might be:
<password>12345? That's the same combination as on my luggage!</password>
versus
<customThemeName>Autumn</customThemeName>
...or...
[value&&type::password&&length::52]12345? That's the same combination as
on my luggage![/value]
versus
[value&&type::customThemeName&&length::6]Autumn[/value]
This is obviously going to involve an increase in storage size. For
example, using the "Autumn" example and XML-style wrapper, with a block
size of 128 bits, the ciphertext balloons from (size of IV + 16 bytes)
to (size of IV + 48 bytes). The benefit I see is that it allows the
application to check that the data it has just decrypted is actually of
the type it expects, to prevent other types of data from being returned
to the user, and possibly to generate an alert if it was expecting e.g.
the name of a custom webpage theme but found a service account password
instead. There is a whole side-topic here related to making sure that
mechanism isn't itself exploitable, but I will set that aside because
then the email would be even longer.
As I said, the application I'm asking about uses strong encryption for
which there are no known known-plaintext attacks. However, as soon as I
thought of the above concept, I realized that if a practical
known-plaintext attack were ever discovered for AES, that scheme would
be setting up the system for compromise, because all values of a certain
type would have at least their first block of plaintext be
highly-predictable.
So part 2 of my proposal is that the plaintext include a throwaway
section *before* the actual data of concern, which has a length of one
block, and is filled with random (or at least pseudo-random) data that
is uniquely generated for each encrypted value. As long as CBC mode is
used, it seems to me that it would act like a second IV (a
"reinitialization vector", I guess? :)), except that it would never be
stored outside of the ciphertext, would be immediately discarded upon
decryption, and would never intentionally be reused. In other words, while I see
it as serving a purpose somewhat related to an IV, I also see them as
being complementary to each other instead of redundant - the IV helps
ensure that identical plaintext encrypts to different ciphertext, and
the "RIV" helps guard against future known-plaintext attacks when used
with CBC encryption mode.
This is probably stating the obvious, but in the case of one of the
examples above, if the encryption used were AES or another algorithm
with a block size of 128 bits, the plaintext modified according to both
parts 1 and 2 of my proposal would look like this:
XXXXXXXXXXXXXXXX[value&&type::password&&length::52]12345? That's the
same combination as on my luggage![/value]
...where XXXXXXXXXXXXXXXX represents 16 bytes of random/pseudorandom
values from 0-255. This whole long set of plaintext would then be
encrypted, appended to the IV, and finally stored.
To hammer home the storage downside: what was originally going to be an
80-byte value (16-byte IV + 64-byte ciphertext) has swollen to 128 bytes
(16-byte IV + 112-byte ciphertext, since the 111 bytes of prefixed and
wrapped plaintext pad out to 112). Because the password in question is
unusually long, let's say the scheme generally doubles or triples the
size of the stored data, and of course it increases CPU time for
encryption and decryption.
However, at least superficially, I think it greatly reduces the
likelihood of sensitive data being obtained by people who shouldn't have
it, because it provides a means of allowing the application to perform
"output validation" before displaying values to the user, and (unless
I'm mistaken) it guards against future known-plaintext attacks on the
encryption algorithm. In combination with using different encryption
keys for different types of data (and of course using unique IVs for
each encrypted value), it seems to me that it makes it much less likely
for any one mistake to compromise the system.
--- Wrapping up ---
I can definitely see an argument that this is a bunch of
over-engineering, but the type of flaw I'm describing is ridiculously
widespread in commercial software. I'd already run into it myself, and
then when I went to a SANS advanced web pen-testing course there was an
entire day dedicated to it and related defects.
I feel like I need to be able to make some recommendations to developers
who aren't cryptography experts that will let them design and build
systems that have a degree of built-in redundancy so that the failure of
any one design element related to the encrypted data won't result in a
complete compromise of that system. I need to be able to come up with a
simple recipe for that, and it can't be any one silver bullet (like "use
different keys for different types of data"), because single mechanisms
will always fail at some point. It also can't be something unrealistic
like "become an awesome cryptographer before you design any system that
uses cryptographic algorithms", because I know that's not going to
happen and I have to account for the reality of the situation. I feel
like it needs to be 3-5 overlapping design philosophies/patterns that
are easy to remember, in addition to the ones that are well-known like
"use existing, well-vetted cryptographic algorithms instead of writing
your own".
From a cryptography perspective, is this a stupid idea? Are there
better ways to achieve my goal? Am I introducing any new weaknesses into
the system? Has any element of this topic been done to death and I just
didn't know what to search for?
In any case, if anyone got to the end of this rambling email, thank you.
- Ben Lincoln
_______________________________________________
cryptography mailing list
cryptography@randombit.net
http://lists.randombit.net/mailman/listinfo/cryptography