On 05/29/2012 06:01 PM, Maarten Billemont wrote:
Dear readers,

I've written an iOS / Mac application whose goal it is to produce
passwords for any purpose.  I was really hoping for the opportunity
to receive some critical feedback or review of the algorithm
used[1].


[1] http://masterpassword.lyndir.com/algorithm.html

Master Password Master Password So how does it work?

The theory behind Master Password is simple. The user remembers a
single, secure password. The user only ever uses that password to log
into the Master Password application. This master password is then
used as a seed to generate a different password based on the name of

How about just "used to generate"? Usually the term "seed" is used when it completely determines a pseudorandom sequence rather than being combined with other information.

[edit: OIC, the master password generates a seed based on the name of the site. This was unclear.]

the site to generate a password for.

The result is that each master password generates its own unique
sequence of passwords for any site name. Since the only input data is
the master password and the site name (along with a password counter,
see below), there is no need for any kind of storage to recreate a
site's password. All that's needed is the correct master password and
the correct algorithm implementation.

How do you know how far along in the password sequence you are?

What that does for you is make
it almost impossible to lose your passwords.

To "lose a password" has multiple meanings.
How about "much easier to create and remember strong passwords"?

It also makes it nearly
impossible for hackers to steal your online identity.

I don't think this claim is well substantiated. :-)

The Algorithm

In short, the algorithm is comprised of the following steps:

Determining the master key Determining the cipher seed Encoding a
user-friendly password

The Master Password

The user chooses a single master password, preferably sufficiently
long to harden against brute-force attacks. Master Password
recommends absurd two or three-word sentences as they're easily
remembered and generally sufficiently high in entropy.

I believe this is not good advice because two and three word sentences will not provide enough entropy.

Wikipedia cites a Harvard/Google study indicating English has 1,022,000 words. https://en.wikipedia.org/wiki/Number_of_words_in_English#Number_of_words_in_English

Users have great difficulty picking original passwords, even security pros tend to be overconfident in the uniqueness of password schemes.
For example:
http://www.imperva.com/docs/WP_Consumer_Password_Worst_Practices.pdf
"the 5000 most popular passwords [] are used by a share of 20% of the users".

So even if users chose two completely independent "passwords" from the RockYou statistical distribution to form their sentence it would mean that more than 4% of users will use one of the top 25 million two password combinations.

But that's when users are asked to pick unguessable passwords, asking for very short proper sentences is certain to make this worse. Many proper two word sentences in English are of the form noun-verb. But English has far more nouns than verbs. For example, https://en.wiktionary.org/ has 174871 entries for nouns and only 24931 under verbs. Adverb-verb sentences will be popular too, but there are only 12982 adverbs listed.

In fact, a recent study tested two word minimum passphrases against an Amazon system which returned a different error depending on whether or not the passphrase was in use. They estimate that "a dictionary of 20,656 phrases covers the choices of about 1.13% of users" and concluded that "by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users."
http://www.lightbluetouchpaper.org/2012/03/07/some-evidence-on-multi-word-passphrases/
http://www.cl.cam.ac.uk/~jcb82/doc/BS12-USEC-passphrase_linguistics.pdf

The application then creates a scrypt key derivative from the user's
password. This process takes quite a bit of processing time and
memory.

Specifics would be very interesting here. How much time on a modern CPU?

This step exists to make brute-force attempts at guessing the
master password from a given output password far more difficult, to
practically infeasible, even for otherwise vulnerable password
strings.

key   = scrypt( P, S, N, r, p, dkLen ) where P     = master password
(UTF-8) S     = <empty> N     = 16384 r     = 8 p     = 1 dkLen = 64

The result is a 64-byte key derived from the user's master password.

64 bytes is overkill. Later you use SHA-1 in such a way that there's no point in having more than 160 bits here. For human chosen passphrases, even the standard 128 bits is very generous.

This key will be fed into the rest of the algorithm to produce output
passwords that are as private to the user as his master password is.
>
Combining The Inputs

The theory behind Master Password requires that all inputs are given
by the user. The two main inputs are the master password that we used
to determine the key and the site's name. There is a third input
value, the password counter, which is a 32-bit unsigned integer value
that is used to salt the input. Initially, the password counter
should be zero, but a user may specify a non-zero counter value in
case he wants to force the algorithm to produce a new output password
for the site.

It may be the case that "all inputs are given by the user", but the more important question is how many inputs are unknown to the attacker?

master password: pass phrase unknown to attacker
site name: well known
counter: a very small number, probably 0

These input values are combined in a byte array, separated by a
single NUL byte.

+1 for proper usage of NUL :-)

In order, the input values are the site name (UTF-8
decoded), the master key, and a salt (this is the password counter, a
32-bit unsigned integer in network byte order).

The purpose of a salt is to thwart precomputed dictionary attacks. The salt isn't necessarily a hard secret, but the attacker can't begin hashing dictionary words until after he learns it and then his work won't be useful against any other account.

A slowly incrementing counter isn't a salt because it's almost always going to be 0. In the rare case of a site that forces password changes, it's not going to increase by more than 4 per year.

The byte array is
hashed using the SHA-1 algorithm to yield the seed as a result.

salt = htonl( password counter )
seed = sha1( site name . "\0" . key . "\0" . salt )

It appears you have mixed data formats here. Site name is some type of text encoding (you should specify which), but key and salt are big endian. All are NUL delimited, even though the key and salt may have embedded NULs.

I don't think there's anything vulnerable with it, but it's messy.

Consider instead:

HMAC<SHA-1>( key,
             "lyndir derive seed from master" . salt . sitename )

HMAC is a good fit here because it provides a dedicated key input which fixes an odd property of SHA-1.

The fixed string helps ensure this function won't accidentally get re-invented and re-used for some other purpose in a way that compromises security.

You don't need a trailing NUL on the fixed string or the sitename. Since the sitename is a variable length field you might consider prefixing it with a two byte length.

Generating The Output

We now have a seed which is a sufficiently long seemingly-arbitrary
string of bytes that is unique to the site and the user.

You have 20 pseudorandom bytes, you don't know that they are unique to the user.

This string
of bytes, however, is not very useful for a user to use as a
password. We have two additional problems that need to be solved: The
output password must be easy for a user to read and copy, but it
should also be compatible with most password policies.

It needs to be compatible with the specific password policy in use by each site.

Password policies are strict rules imposed by applications on their
users, designed to limit the types of passwords these users are
allowed to use with the application. Usually, these policies exist to
force users into thinking about passwords with a healthy entropy.

Password policies are a lot of things to a lot of people. They also encompass such things as periodic change requirements.

Often, they exist purely as a side-effect of bad password handling
such as storing the clear-text passwords in a database.

I don't think you've substantiated that last sentence.

Since the idea is that the output password can be used directly as a
password to protect the user's account on the site, it needs to be
able to pass the site's password policy. Master Password addresses
this problem by introducing password types. Each password type
describes what an output password must look like and maps to a set of
ciphers. Ciphers describe the resulting output password using a
series of characters that map to character groups of candidate output
characters.

This is not what most people would refer to as a 'cipher'. Consider calling it something else like a 'password template'.

A cipher has the same length as the output password it
yields. Each character in the cipher maps to a specific character
group. At each position of the output password, a character is chosen
from the character group identified by the character in the cipher at
the same position.

The following ciphers are defined:

Type: Long Password CvcvCvcvnoCvcv CvcvnoCvcvCvcv CvcvCvcvCvcvno
Type: Medium Password CvcnoCvc CvcCvcno
Type: Short Password Cvcn
Type: Basic Password aaanaaan aannaaan aaannaaa Type: PIN nnnn

By default, Master Password uses the Long Password type for any new
passwords. The user is able to choose a different password type,
which is normally only done if the site's password policy is
incompatible with the output password produced by this type.

Isn't this another input to the generation algorithm?
Is the user expected to remember this policy for each site?

To create the create the output password, the bytes in the seed are
encoded according to the cipher. The first seed byte is used to
determine which of the type's ciphers to use for encoding an output
password.

We take the byte value of the first seed byte  modulo the
amount of ciphers set for the chosen password type and use the result
as a zero-based index in the cipher list for the password type.

This results in a slightly uneven distribution when 256 is not a multiple of the number of 'ciphers'.

Selecting random numbers in a range without bias is harder that it looks; you can't just use the integer modulo operator. A web search for "random number in range" turns up a lot of bad examples. The first example I found that mentioned the bias from a naive modulo was
http://etutorials.org/Programming/secure+programming/Chapter+11.+Random+Numbers/11.11+Getting+a+Random+Integer+in+a+Range/

(This looks like it's from an OReilly book, I'm not sure how legit it is to be on that site.)

ciphers = [ "CvcvCvcvnoCvcv", "CvcvnoCvcvCvcv", "CvcvCvcvCvcvno" ]
cipher  = ciphers[ seed[0] % count( ciphers ) ]

Another way to do it without the complexity of these 'ciphers' might be to pick one character from each of the required groups, then pick the the remaining characters from an even distribution of all allowable characters. Then randomly shuffle their order.

Now that we know what cipher to use for building our output password,
all that's left is to iterate the cipher, and produce a character of
password output for each step. When we iterate the cipher (index i),
we look in the character group identified by the character (string
passChars) in the cipher at index i.

The seed is 20 bytes, you use one to pick the 'cipher', what happens if you need to generate more than a 19 character password? You don't have to allow such a thing but you should probably document what happens.

The following character groups (passChars) are defined:

Cipher character: V AEIOU Cipher character: C BCDFGHJKLMNPQRSTVWXYZ
Cipher character: v aeiou Cipher character: c bcdfghjklmnpqrstvwxyz
Cipher character: A (= V . C) AEIOUBCDFGHJKLMNPQRSTVWXYZ
Cipher character: a (= V . v . C . c)
AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz Cipher
character: n 0123456789
Cipher character: o !@#$%^&*()

I encountered a site the other day that required a special character but did not accept a '%'.

Cipher character: X (= a . n . o)
AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz0123456789!@#$%^&*()

So let's consider the "Medium" type password: CvcnoCvc CvcCvcno

Option 1:
C   21 possibilities
v    5
c   21
n   10
o   10
C   21
v    5
c   21
= 486202500 possibilities.

Option 2 is the same, so we have a maximum of 972405000 eight character passwords which can ever be produced by this algorithm. This is probably better than what most humans would pick, but it is also much less than an even distribution over all allowed characters 72^8 = 722204136308736.

It is also very efficient to iterate all these possibilities. According to http://hashcat.net/oclhashcat-plus/ a PC with a single AMD hd6990 video card can try 3263.4 M SHA-1s per second. So if an attacker with such a rig were to obtain a properly-salted SHA-1 hash of a "medium" type password it would take him, on average, 15 milliseconds to brute force it.

The "long" format is defined as:
     CvcvCvcvnoCvcv, CvcvnoCvcvCvcv, CvcvCvcvCvcvno

21^6 * 5^6 * 10 * 10 * 3 = 402028692187500 ~ 2^49 possibilities

The same attacker would need 17.1 hours on average (34.2 hours max) to crack any "long" format password from a salted SHA-1 hash.

 We use the seed's byte value at index i + 1 modulo the amount of
characters in the character class

Again, slightly biased.

to determine which character
(passChar) in the class to use for the output password at index i.

passChar    = passChars[ seed[i + 1] % count( passChars ) ]
passWord[i] = passChar

That's my feedback on the details of the design and specification.
However, there's a serious weakness in the scheme itself.

Consider what happens if this method becomes very popular. If there is a website with many users it will have many who are using this algorithm. All counters start at 0 and the site name and password policy is shared across all users. Eventually two users will generate a password for the site using the same master pass phrase. Given the entropy estimates from the Bonneau paper we could expect this to happen after just a few thousand users. Even if 5-word phrases (having 30 bits entropy) were required we would expect it to start happening around 32 k users and become increasingly frequent.

When an attacker pwns a website and learns that two or more users have selected the same Lyndir-formatted password, they are in a very good position to attack the users' shared master passphrase. On a relatively large site (the rockyou.com breach exposed 32 M passwords and I'd never even heard of that site) they will observe the same password repeated many times. The attacker learns that a) these users are probably generating their passwords with this tool, b) these users have selected a common passphrase. Because there is no meaningful salt the computation done to discover this passphrase can be applied multiple users accounts.

The attacker's effort amounts to that of the least secure website. If any site isn't using a hash function with a huge work factor it is the attacker's option to go after those password hashes first. If he succeeds in guessing *any* user's master passphrase, the attacker has learned *all* the passwords for all websites for all users who chose that master passphrase. Even though users do tend to reuse them across sites, this is not a property shared by plain passwords.

If this system were adopted by large numbers of users it would make things worse. The root cause of this is people's inability to remember large amounts of entropy. This is why password security applications must store entropy on behalf of the user, there's no way around it.

- Marsh
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography

Reply via email to