On 05/29/2012 06:01 PM, Maarten Billemont wrote:
Dear readers,
I've written an iOS / Mac application whose goal it is to produce
passwords for any purpose. I was really hoping for the opportunity
to receive some critical feedback or review of the algorithm
used[1].
[1] http://masterpassword.lyndir.com/algorithm.html
Master Password Master Password So how does it work?
The theory behind Master Password is simple. The user remembers a
single, secure password. The user only ever uses that password to log
into the Master Password application. This master password is then
used as a seed to generate a different password based on the name of
How about just "used to generate"? Usually the term "seed" is used when
it completely determines a pseudorandom sequence rather than being
combined with other information.
[edit: OIC, the master password generates a seed based on the name of
the site. This was unclear.]
the site to generate a password for.
The result is that each master password generates its own unique
sequence of passwords for any site name. Since the only input data is
the master password and the site name (along with a password counter,
see below), there is no need for any kind of storage to recreate a
site's password. All that's needed is the correct master password and
the correct algorithm implementation.
How do you know how far along in the password sequence you are?
What that does for you is make
it almost impossible to lose your passwords.
To "lose a password" has multiple meanings.
How about "much easier to create and remember strong passwords"?
It also makes it nearly
impossible for hackers to steal your online identity.
I don't think this claim is well substantiated. :-)
The Algorithm
In short, the algorithm is comprised of the following steps:
Determining the master key Determining the cipher seed Encoding a
user-friendly password
The Master Password
The user chooses a single master password, preferably sufficiently
long to harden against brute-force attacks. Master Password
recommends absurd two or three-word sentences as they're easily
remembered and generally sufficiently high in entropy.
I believe this is not good advice because two and three word sentences
will not provide enough entropy.
Wikipedia cites a Harvard/Google study indicating English has 1,022,000
words.
https://en.wikipedia.org/wiki/Number_of_words_in_English#Number_of_words_in_English
Users have great difficulty picking original passwords, even security
pros tend to be overconfident in the uniqueness of password schemes.
For example:
http://www.imperva.com/docs/WP_Consumer_Password_Worst_Practices.pdf
"the 5000 most popular passwords [] are used by a share of 20% of the
users".
So even if users chose two completely independent "passwords" from the
RockYou statistical distribution to form their sentence it would mean
that more than 4% of users will use one of the top 25 million two
password combinations.
But that's when users are asked to pick unguessable passwords, asking
for very short proper sentences is certain to make this worse. Many
proper two word sentences in English are of the form noun-verb. But
English has far more nouns than verbs. For example,
https://en.wiktionary.org/ has 174871 entries for nouns and only 24931
under verbs. Adverb-verb sentences will be popular too, but there are
only 12982 adverbs listed.
In fact, a recent study tested two word minimum passphrases against an
Amazon system which returned a different error depending on whether or
not the passphrase was in use. They estimate that "a dictionary of
20,656 phrases covers the choices of about 1.13% of users" and concluded
that "by our metrics, even 5-word phrases would be highly insecure
against offline attacks, with fewer than 30 bits of work compromising
over half of users."
http://www.lightbluetouchpaper.org/2012/03/07/some-evidence-on-multi-word-passphrases/
http://www.cl.cam.ac.uk/~jcb82/doc/BS12-USEC-passphrase_linguistics.pdf
The application then creates a scrypt key derivative from the user's
password. This process takes quite a bit of processing time and
memory.
Specifics would be very interesting here. How much time on a modern CPU?
This step exists to make brute-force attempts at guessing the
master password from a given output password far more difficult, to
practically infeasible, even for otherwise vulnerable password
strings.
key = scrypt( P, S, N, r, p, dkLen ) where P = master password
(UTF-8) S = <empty> N = 16384 r = 8 p = 1 dkLen = 64
The result is a 64-byte key derived from the user's master password.
64 bytes is overkill. Later you use SHA-1 in such a way that there's no
point in having more than 160 bits here. For human chosen passphrases,
even the standard 128 bits is very generous.
This key will be fed into the rest of the algorithm to produce output
passwords that are as private to the user as his master password is.
>
Combining The Inputs
The theory behind Master Password requires that all inputs are given
by the user. The two main inputs are the master password that we used
to determine the key and the site's name. There is a third input
value, the password counter, which is a 32-bit unsigned integer value
that is used to salt the input. Initially, the password counter
should be zero, but a user may specify a non-zero counter value in
case he wants to force the algorithm to produce a new output password
for the site.
It may be the case that "all inputs are given by the user", but the more
important question is how many inputs are unknown to the attacker?
master password: pass phrase unknown to attacker
site name: well known
counter: a very small number, probably 0
These input values are combined in a byte array, separated by a
single NUL byte.
+1 for proper usage of NUL :-)
In order, the input values are the site name (UTF-8
decoded), the master key, and a salt (this is the password counter, a
32-bit unsigned integer in network byte order).
The purpose of a salt is to thwart precomputed dictionary attacks. The
salt isn't necessarily a hard secret, but the attacker can't begin
hashing dictionary words until after he learns it and then his work
won't be useful against any other account.
A slowly incrementing counter isn't a salt because it's almost always
going to be 0. In the rare case of a site that forces password changes,
it's not going to increase by more than 4 per year.
The byte array is
hashed using the SHA-1 algorithm to yield the seed as a result.
salt = htonl( password counter )
seed = sha1( site name . "\0" . key . "\0" . salt )
It appears you have mixed data formats here. Site name is some type of
text encoding (you should specify which), but key and salt are big
endian. All are NUL delimited, even though the key and salt may have
embedded NULs.
I don't think there's anything vulnerable with it, but it's messy.
Consider instead:
HMAC<SHA-1>( key,
"lyndir derive seed from master" . salt . sitename )
HMAC is a good fit here because it provides a dedicated key input which
fixes an odd property of SHA-1.
The fixed string helps ensure this function won't accidentally get
re-invented and re-used for some other purpose in a way that compromises
security.
You don't need a trailing NUL on the fixed string or the sitename. Since
the sitename is a variable length field you might consider prefixing it
with a two byte length.
Generating The Output
We now have a seed which is a sufficiently long seemingly-arbitrary
string of bytes that is unique to the site and the user.
You have 20 pseudorandom bytes, you don't know that they are unique to
the user.
This string
of bytes, however, is not very useful for a user to use as a
password. We have two additional problems that need to be solved: The
output password must be easy for a user to read and copy, but it
should also be compatible with most password policies.
It needs to be compatible with the specific password policy in use by
each site.
Password policies are strict rules imposed by applications on their
users, designed to limit the types of passwords these users are
allowed to use with the application. Usually, these policies exist to
force users into thinking about passwords with a healthy entropy.
Password policies are a lot of things to a lot of people. They also
encompass such things as periodic change requirements.
Often, they exist purely as a side-effect of bad password handling
such as storing the clear-text passwords in a database.
I don't think you've substantiated that last sentence.
Since the idea is that the output password can be used directly as a
password to protect the user's account on the site, it needs to be
able to pass the site's password policy. Master Password addresses
this problem by introducing password types. Each password type
describes what an output password must look like and maps to a set of
ciphers. Ciphers describe the resulting output password using a
series of characters that map to character groups of candidate output
characters.
This is not what most people would refer to as a 'cipher'. Consider
calling it something else like a 'password template'.
A cipher has the same length as the output password it
yields. Each character in the cipher maps to a specific character
group. At each position of the output password, a character is chosen
from the character group identified by the character in the cipher at
the same position.
The following ciphers are defined:
Type: Long Password CvcvCvcvnoCvcv CvcvnoCvcvCvcv CvcvCvcvCvcvno
Type: Medium Password CvcnoCvc CvcCvcno
Type: Short Password Cvcn
Type: Basic Password aaanaaan aannaaan aaannaaa Type: PIN nnnn
By default, Master Password uses the Long Password type for any new
passwords. The user is able to choose a different password type,
which is normally only done if the site's password policy is
incompatible with the output password produced by this type.
Isn't this another input to the generation algorithm?
Is the user expected to remember this policy for each site?
To create the create the output password, the bytes in the seed are
encoded according to the cipher. The first seed byte is used to
determine which of the type's ciphers to use for encoding an output
password.
We take the byte value of the first seed byte modulo the
amount of ciphers set for the chosen password type and use the result
as a zero-based index in the cipher list for the password type.
This results in a slightly uneven distribution when 256 is not a
multiple of the number of 'ciphers'.
Selecting random numbers in a range without bias is harder that it
looks; you can't just use the integer modulo operator. A web search for
"random number in range" turns up a lot of bad examples. The first
example I found that mentioned the bias from a naive modulo was
http://etutorials.org/Programming/secure+programming/Chapter+11.+Random+Numbers/11.11+Getting+a+Random+Integer+in+a+Range/
(This looks like it's from an OReilly book, I'm not sure how legit it is
to be on that site.)
ciphers = [ "CvcvCvcvnoCvcv", "CvcvnoCvcvCvcv", "CvcvCvcvCvcvno" ]
cipher = ciphers[ seed[0] % count( ciphers ) ]
Another way to do it without the complexity of these 'ciphers' might be
to pick one character from each of the required groups, then pick the
the remaining characters from an even distribution of all allowable
characters. Then randomly shuffle their order.
Now that we know what cipher to use for building our output password,
all that's left is to iterate the cipher, and produce a character of
password output for each step. When we iterate the cipher (index i),
we look in the character group identified by the character (string
passChars) in the cipher at index i.
The seed is 20 bytes, you use one to pick the 'cipher', what happens if
you need to generate more than a 19 character password? You don't have
to allow such a thing but you should probably document what happens.
The following character groups (passChars) are defined:
Cipher character: V AEIOU Cipher character: C BCDFGHJKLMNPQRSTVWXYZ
Cipher character: v aeiou Cipher character: c bcdfghjklmnpqrstvwxyz
Cipher character: A (= V . C) AEIOUBCDFGHJKLMNPQRSTVWXYZ
Cipher character: a (= V . v . C . c)
AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz Cipher
character: n 0123456789
Cipher character: o !@#$%^&*()
I encountered a site the other day that required a special character but
did not accept a '%'.
Cipher character: X (= a . n . o)
AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz0123456789!@#$%^&*()
So let's consider the "Medium" type password: CvcnoCvc CvcCvcno
Option 1:
C 21 possibilities
v 5
c 21
n 10
o 10
C 21
v 5
c 21
= 486202500 possibilities.
Option 2 is the same, so we have a maximum of 972405000 eight character
passwords which can ever be produced by this algorithm. This is probably
better than what most humans would pick, but it is also much less than
an even distribution over all allowed characters 72^8 = 722204136308736.
It is also very efficient to iterate all these possibilities. According
to http://hashcat.net/oclhashcat-plus/ a PC with a single AMD hd6990
video card can try 3263.4 M SHA-1s per second. So if an attacker with
such a rig were to obtain a properly-salted SHA-1 hash of a "medium"
type password it would take him, on average, 15 milliseconds to brute
force it.
The "long" format is defined as:
CvcvCvcvnoCvcv, CvcvnoCvcvCvcv, CvcvCvcvCvcvno
21^6 * 5^6 * 10 * 10 * 3 = 402028692187500 ~ 2^49 possibilities
The same attacker would need 17.1 hours on average (34.2 hours max) to
crack any "long" format password from a salted SHA-1 hash.
We use the seed's byte value at index i + 1 modulo the amount of
characters in the character class
Again, slightly biased.
to determine which character
(passChar) in the class to use for the output password at index i.
passChar = passChars[ seed[i + 1] % count( passChars ) ]
passWord[i] = passChar
That's my feedback on the details of the design and specification.
However, there's a serious weakness in the scheme itself.
Consider what happens if this method becomes very popular. If there is a
website with many users it will have many who are using this algorithm.
All counters start at 0 and the site name and password policy is shared
across all users. Eventually two users will generate a password for the
site using the same master pass phrase. Given the entropy estimates from
the Bonneau paper we could expect this to happen after just a few
thousand users. Even if 5-word phrases (having 30 bits entropy) were
required we would expect it to start happening around 32 k users and
become increasingly frequent.
When an attacker pwns a website and learns that two or more users have
selected the same Lyndir-formatted password, they are in a very good
position to attack the users' shared master passphrase. On a relatively
large site (the rockyou.com breach exposed 32 M passwords and I'd never
even heard of that site) they will observe the same password repeated
many times. The attacker learns that a) these users are probably
generating their passwords with this tool, b) these users have selected
a common passphrase. Because there is no meaningful salt the computation
done to discover this passphrase can be applied multiple users accounts.
The attacker's effort amounts to that of the least secure website. If
any site isn't using a hash function with a huge work factor it is the
attacker's option to go after those password hashes first. If he
succeeds in guessing *any* user's master passphrase, the attacker has
learned *all* the passwords for all websites for all users who chose
that master passphrase. Even though users do tend to reuse them across
sites, this is not a property shared by plain passwords.
If this system were adopted by large numbers of users it would make
things worse. The root cause of this is people's inability to remember
large amounts of entropy. This is why password security applications
must store entropy on behalf of the user, there's no way around it.
- Marsh
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography