Greetings,
I've been looking at the crypto that underlies ansible-vault, and I'm
worried. Specifically, it seems to me that the vault as implemented is not
safe for credential storage.
If you haven't read the implementation, this is essentially what happens
when you encrypt a file with ansible-vault:
- The plaintext is prepended with a SHA256 hash of itself,
- A random salt is generated,
- The vault password and salt are used to derive an AES key and IV,
using a python implementation of openssl's EVP_BytesToKey(),
- The plaintext-with-hash is encrypted with AES in CBC mode, using the
above key and IV,
- This ciphertext is hexlified and stored.
Looking at the comments in the code, this is a python reproduction of what
`openssl aes-256-cbc -salt` does, except for the SHA256 hash bit which is
an "aftermarket" integrity measure.
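To make the key derivation step concrete, here is a minimal sketch of an EVP_BytesToKey()-style derivation. It is not the vault's actual code: the function name and the example password are made up for illustration, and it assumes MD5 as the digest with a single iteration, plus a 32-byte key and 16-byte IV for AES-256-CBC.

```python
import hashlib
import os


def evp_bytes_to_key(password: bytes, salt: bytes,
                     key_len: int = 32, iv_len: int = 16):
    """Derive (key, iv) in the style of EVP_BytesToKey with MD5, count=1.

    Hypothetical helper for illustration; not copied from ansible-vault.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each round hashes the previous digest plus password and salt.
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


salt = os.urandom(8)  # the openssl CLI uses an 8-byte salt
key, iv = evp_bytes_to_key(b"hunter2", salt)  # example password only
print(len(key), len(iv))
```

Note that the loop body is a single unsalted-cost MD5 call per 16 bytes of output, with no work factor to tune.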
The biggest concern I have about this code is that these blobs are *much*
easier to bruteforce than the "AES-256" would lead you to believe. This is because
openssl's EVP_BytesToKey() is a poor KDF, one which does not make the key
derivation expensive in CPU and/or memory. As a result, it is very cheap to
test a candidate password: three MD5 operations on small inputs, a few
AES-256 block operations, and a SHA-256 hash. All these operations are easy
to hardware-accelerate, either with modern CPUs (AES-NI) or GPUs.
This dramatically reduces the search space, from the 32 bytes of the AES
key (48 bytes counting the 16-byte IV, if the IV isn't included in the
ciphertext, which I haven't checked) to the number of bytes in the
password. Those bytes are also very likely to be drawn from a small set of
values (letters+numbers, maybe symbols if you're lucky), further reducing
the search space for bruteforcing.
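A rough back-of-the-envelope calculation shows the size of the gap. The password policies below are hypothetical examples, and the estimate assumes passwords chosen uniformly at random, which is generous to the defender.

```python
import math

KEY_BITS = 256  # AES-256 key space, if the key is attacked directly

# Hypothetical password policies: (description, alphabet size, length)
POLICIES = [
    ("8 lowercase letters", 26, 8),
    ("8 letters and digits", 62, 8),
    ("12 printable ASCII characters", 95, 12),
]


def password_bits(alphabet_size: int, length: int) -> float:
    """Bits of search space for a uniformly random password."""
    return length * math.log2(alphabet_size)


for name, alphabet_size, length in POLICIES:
    bits = password_bits(alphabet_size, length)
    print(f"{name}: about {bits:.1f} bits, versus {KEY_BITS} for the key")
```

Even the strongest of these example policies is far below the 256 bits the cipher name suggests, and human-chosen passwords do worse than the uniform estimate.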
A good KDF would close this avenue of attack by making the key derivation
so expensive that it's cheaper to bruteforce the AES key. Unfortunately,
EVP_BytesToKey() is not such a KDF. Its documentation in OpenSSL even
recommends using better KDFs, such as PBKDF2 or scrypt, for designs which
don't specifically require BytesToKey.
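For comparison, here is a minimal sketch of a PBKDF2 derivation using only Python's standard library. The iteration count, salt length, and function name are my assumptions for illustration, not values taken from any existing vault code; the count should be tuned to the hardware in use.

```python
import hashlib
import os

# Assumption: an illustrative work factor, to be tuned per deployment.
ITERATIONS = 200_000


def derive_key(password: bytes, salt: bytes,
               iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 (hypothetical helper)."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


salt = os.urandom(16)  # stored alongside the ciphertext, not secret
key = derive_key(b"hunter2", salt)  # example password only
print(len(key))
```

Every password guess now costs the attacker the full iteration count, instead of a handful of MD5 calls.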
So, it seems to me that ansible-vault blobs are not safe to expose to
untrusted people, because brute-forcing them in an offline attack is much
easier than it would seem. This is a problem, because if only trusted
people have access to the blobs, you might as well just have the sensitive
data in cleartext.
Further concerns about the implementation:
- SHA-256 is used as an authentication code, but isn't one: it is an
unkeyed hash, so computing it requires no secret.
- The encoding is constructed as "mac-then-encrypt", whereas
encrypt-then-mac is the safer default, because it minimizes your code's
exposure to hostile inputs. This is relatively minor compared to the
effective lack of MAC.
- The hash check on decryption is not constant-time, which opens up a
timing side-channel.
- The core of the implementation seems lifted verbatim from a pair of
Stack Overflow answers. This is concerning in two ways:
- The question being answered was "how do I reproduce this one
specific behavior of the openssl CLI in python?", not "What's a
good way to securely store sensitive data at rest, where attackers
can perform offline attacks at will?"
- The unit tests only verify that the implementation is internally
consistent (M == decrypt(encrypt(M)) essentially), not that it
matches the openssl behavior it's copying. While the primitives are
delegated to pycrypto, there could be bugs lurking in the glue around
the primitives. Since this is a non-standard combination of primitives,
there are no canonical test inputs you can check against.
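To illustrate the first and third concerns above, here is a minimal sketch of a keyed authentication code with a constant-time check, using only the standard library. The function names and the placeholder key are hypothetical; in a real design the MAC key would come out of the KDF.

```python
import hashlib
import hmac


def make_tag(mac_key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag; producing it requires mac_key."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag with a comparison whose timing does not depend on
    where the first mismatching byte is."""
    expected = make_tag(mac_key, message)
    return hmac.compare_digest(expected, tag)


mac_key = b"k" * 32  # placeholder key for the example only
tag = make_tag(mac_key, b"ciphertext bytes")
print(verify_tag(mac_key, b"ciphertext bytes", tag))
print(verify_tag(mac_key, b"tampered bytes", tag))
```

Unlike a bare SHA-256 digest, the tag cannot be recomputed by someone who modifies the message without knowing the key.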
I should say that I'm not a crypto expert, merely an enthusiastic amateur.
However, my spidey sense is tingling pretty hard in light of all the above.
I'm kinda hoping that I've overlooked something obvious that makes this all
safe, but that's a lot of distinct concerns to address :/.
Not wishing to be just a downer, I have suggestions for safer vault
implementations:
- Derive keys with PBKDF2, use NaCl's secretbox() for encryption and
decryption. Secretbox implements correct and fast authenticated encryption,
and PBKDF2 will severely slow down trivial bruteforcing attacks. PyNaCl
provides Python bindings for NaCl, and pycrypto provides PBKDF2.
- If a dependency on pynacl is not desired, use AES-GCM to perform
authenticated encryption. AES-GCM will be available in the upcoming
pycrypto release.
- For something that uses only current pycrypto, use AES-CTR combined
with an HMAC-SHA256 authentication code. However, this is starting to drift
back into the territory of manually gluing primitives together in new and
exciting ways (although AES-CTR+HMAC-SHA256 is not exactly off the beaten
path), which increases the risk.
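As a rough illustration of the encrypt-then-MAC shape that the third option implies, here is a sketch of the framing only. Everything in it is an assumption of mine: the function names, the salt|ciphertext|tag layout, the iteration count, and the use of PBKDF2-HMAC-SHA512 to get two independent 32-byte keys. The cipher itself is deliberately left as a pair of caller-supplied callables (hypothetical `encrypt(key, data)` and `decrypt(key, data)`), standing in for AES-CTR from a crypto library; it has not been reviewed by an expert.

```python
import hashlib
import hmac
import os

SALT_LEN = 16
TAG_LEN = 32  # HMAC-SHA256 output size


def derive_keys(password: bytes, salt: bytes, iterations: int = 200_000):
    """Derive separate encryption and MAC keys from one password."""
    material = hashlib.pbkdf2_hmac("sha512", password, salt,
                                   iterations, dklen=64)
    return material[:32], material[32:]


def seal(password: bytes, plaintext: bytes, encrypt,
         iterations: int = 200_000) -> bytes:
    """Encrypt first, then MAC the salt and ciphertext."""
    salt = os.urandom(SALT_LEN)
    enc_key, mac_key = derive_keys(password, salt, iterations)
    ciphertext = encrypt(enc_key, plaintext)
    tag = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    return salt + ciphertext + tag


def open_sealed(password: bytes, blob: bytes, decrypt,
                iterations: int = 200_000) -> bytes:
    """Verify the tag before touching the ciphertext, then decrypt."""
    if len(blob) < SALT_LEN + TAG_LEN:
        raise ValueError("blob too short")
    salt = blob[:SALT_LEN]
    ciphertext = blob[SALT_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    enc_key, mac_key = derive_keys(password, salt, iterations)
    expected = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return decrypt(enc_key, ciphertext)
```

The point of the ordering is that a tampered blob, or a wrong password, is rejected by the MAC check before any decryption code runs on attacker-controlled input.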
I'd be more than happy to provide a vault implementation for the first
option, and the commandline plumbing to enable selection of vault
implementations, if it would be helpful. I wouldn't trust myself to
implement the other two without oversight from an expert, unfortunately.
- Dave