Re: [cryptography] Master Password

Marsh Ray Tue, 29 May 2012 22:21:16 -0700

On 05/29/2012 06:01 PM, Maarten Billemont wrote:

Dear readers,


I've written an iOS / Mac application whose goal it is to produce
passwords for any purpose.  I was really hoping for the opportunity
to receive some critical feedback or review of the algorithm
used[1].

[1] http://masterpassword.lyndir.com/algorithm.html

Master Password Master Password So how does it work?

The theory behind Master Password is simple. The user remembers a
single, secure password. The user only ever uses that password to log
into the Master Password application. This master password is then
used as a seed to generate a different password based on the name of

How about just "used to generate"? Usually the term "seed" is used whenit completely determines a pseudorandom sequence rather than beingcombined with other information.

[edit: OIC, the master password generates a seed based on the name ofthe site. This was unclear.]

the site to generate a password for.

The result is that each master password generates its own unique
sequence of passwords for any site name. Since the only input data is
the master password and the site name (along with a password counter,
see below), there is no need for any kind of storage to recreate a
site's password. All that's needed is the correct master password and
the correct algorithm implementation.


How do you know how far along in the password sequence you are?

What that does for you is make
it almost impossible to lose your passwords.


To "lose a password" has multiple meanings.
How about "much easier to create and remember strong passwords"?

It also makes it nearly
impossible for hackers to steal your online identity.


I don't think this claim is well substantiated. :-)

The Algorithm

In short, the algorithm is comprised of the following steps:

Determining the master key Determining the cipher seed Encoding a
user-friendly password

The Master Password

The user chooses a single master password, preferably sufficiently
long to harden against brute-force attacks. Master Password
recommends absurd two or three-word sentences as they're easily
remembered and generally sufficiently high in entropy.

I believe this is not good advice because two and three word sentenceswill not provide enough entropy.

Wikipedia cites a Harvard/Google study indicating English has 1,022,000words.https://en.wikipedia.org/wiki/Number_of_words_in_English#Number_of_words_in_English

Users have great difficulty picking original passwords, even securitypros tend to be overconfident in the uniqueness of password schemes.

For example:
http://www.imperva.com/docs/WP_Consumer_Password_Worst_Practices.pdf

"the 5000 most popular passwords [] are used by a share of 20% of theusers".

So even if users chose two completely independent "passwords" from theRockYou statistical distribution to form their sentence it would meanthat more than 4% of users will use one of the top 25 million twopassword combinations.

But that's when users are asked to pick unguessable passwords, askingfor very short proper sentences is certain to make this worse. Manyproper two word sentences in English are of the form noun-verb. ButEnglish has far more nouns than verbs. For example,https://en.wiktionary.org/ has 174871 entries for nouns and only 24931under verbs. Adverb-verb sentences will be popular too, but there areonly 12982 adverbs listed.

In fact, a recent study tested two word minimum passphrases against anAmazon system which returned a different error depending on whether ornot the passphrase was in use. They estimate that "a dictionary of20,656 phrases covers the choices of about 1.13% of users" and concludedthat "by our metrics, even 5-word phrases would be highly insecureagainst offline attacks, with fewer than 30 bits of work compromisingover half of users."

http://www.lightbluetouchpaper.org/2012/03/07/some-evidence-on-multi-word-passphrases/
http://www.cl.cam.ac.uk/~jcb82/doc/BS12-USEC-passphrase_linguistics.pdf

The application then creates a scrypt key derivative from the user's
password. This process takes quite a bit of processing time and
memory.


Specifics would be very interesting here. How much time on a modern CPU?

This step exists to make brute-force attempts at guessing the
master password from a given output password far more difficult, to
practically infeasible, even for otherwise vulnerable password
strings.

key   = scrypt( P, S, N, r, p, dkLen ) where P     = master password
(UTF-8) S     = <empty> N     = 16384 r     = 8 p     = 1 dkLen = 64

The result is a 64-byte key derived from the user's master password.

64 bytes is overkill. Later you use SHA-1 in such a way that there's nopoint in having more than 160 bits here. For human chosen passphrases,even the standard 128 bits is very generous.

This key will be fed into the rest of the algorithm to produce output
passwords that are as private to the user as his master password is.

Combining The Inputs

The theory behind Master Password requires that all inputs are given
by the user. The two main inputs are the master password that we used
to determine the key and the site's name. There is a third input
value, the password counter, which is a 32-bit unsigned integer value
that is used to salt the input. Initially, the password counter
should be zero, but a user may specify a non-zero counter value in
case he wants to force the algorithm to produce a new output password
for the site.

It may be the case that "all inputs are given by the user", but the moreimportant question is how many inputs are unknown to the attacker?


master password: pass phrase unknown to attacker
site name: well known
counter: a very small number, probably 0

These input values are combined in a byte array, separated by a
single NUL byte.


+1 for proper usage of NUL :-)

In order, the input values are the site name (UTF-8
decoded), the master key, and a salt (this is the password counter, a
32-bit unsigned integer in network byte order).

The purpose of a salt is to thwart precomputed dictionary attacks. Thesalt isn't necessarily a hard secret, but the attacker can't beginhashing dictionary words until after he learns it and then his workwon't be useful against any other account.

A slowly incrementing counter isn't a salt because it's almost alwaysgoing to be 0. In the rare case of a site that forces password changes,it's not going to increase by more than 4 per year.

The byte array is
hashed using the SHA-1 algorithm to yield the seed as a result.

salt = htonl( password counter )
seed = sha1( site name . "\0" . key . "\0" . salt )

It appears you have mixed data formats here. Site name is some type oftext encoding (you should specify which), but key and salt are bigendian. All are NUL delimited, even though the key and salt may haveembedded NULs.


I don't think there's anything vulnerable with it, but it's messy.

Consider instead:

HMAC<SHA-1>( key,
             "lyndir derive seed from master" . salt . sitename )

HMAC is a good fit here because it provides a dedicated key input whichfixes an odd property of SHA-1.

The fixed string helps ensure this function won't accidentally getre-invented and re-used for some other purpose in a way that compromisessecurity.

You don't need a trailing NUL on the fixed string or the sitename. Sincethe sitename is a variable length field you might consider prefixing itwith a two byte length.

Generating The Output

We now have a seed which is a sufficiently long seemingly-arbitrary
string of bytes that is unique to the site and the user.

You have 20 pseudorandom bytes, you don't know that they are unique tothe user.

This string
of bytes, however, is not very useful for a user to use as a
password. We have two additional problems that need to be solved: The
output password must be easy for a user to read and copy, but it
should also be compatible with most password policies.

It needs to be compatible with the specific password policy in use byeach site.

Password policies are strict rules imposed by applications on their
users, designed to limit the types of passwords these users are
allowed to use with the application. Usually, these policies exist to
force users into thinking about passwords with a healthy entropy.

Password policies are a lot of things to a lot of people. They alsoencompass such things as periodic change requirements.

Often, they exist purely as a side-effect of bad password handling
such as storing the clear-text passwords in a database.


I don't think you've substantiated that last sentence.

Since the idea is that the output password can be used directly as a
password to protect the user's account on the site, it needs to be
able to pass the site's password policy. Master Password addresses
this problem by introducing password types. Each password type
describes what an output password must look like and maps to a set of
ciphers. Ciphers describe the resulting output password using a
series of characters that map to character groups of candidate output
characters.

This is not what most people would refer to as a 'cipher'. Considercalling it something else like a 'password template'.

A cipher has the same length as the output password it
yields. Each character in the cipher maps to a specific character
group. At each position of the output password, a character is chosen
from the character group identified by the character in the cipher at
the same position.

The following ciphers are defined:

Type: Long Password CvcvCvcvnoCvcv CvcvnoCvcvCvcv CvcvCvcvCvcvno
Type: Medium Password CvcnoCvc CvcCvcno
Type: Short Password Cvcn
Type: Basic Password aaanaaan aannaaan aaannaaa Type: PIN nnnn

By default, Master Password uses the Long Password type for any new
passwords. The user is able to choose a different password type,
which is normally only done if the site's password policy is
incompatible with the output password produced by this type.


Isn't this another input to the generation algorithm?
Is the user expected to remember this policy for each site?

To create the create the output password, the bytes in the seed are
encoded according to the cipher. The first seed byte is used to
determine which of the type's ciphers to use for encoding an output
password.

We take the byte value of the first seed byte  modulo the
amount of ciphers set for the chosen password type and use the result
as a zero-based index in the cipher list for the password type.

This results in a slightly uneven distribution when 256 is not amultiple of the number of 'ciphers'.

Selecting random numbers in a range without bias is harder that itlooks; you can't just use the integer modulo operator. A web search for"random number in range" turns up a lot of bad examples. The firstexample I found that mentioned the bias from a naive modulo was

http://etutorials.org/Programming/secure+programming/Chapter+11.+Random+Numbers/11.11+Getting+a+Random+Integer+in+a+Range/

(This looks like it's from an OReilly book, I'm not sure how legit it isto be on that site.)

ciphers = [ "CvcvCvcvnoCvcv", "CvcvnoCvcvCvcv", "CvcvCvcvCvcvno" ]
cipher  = ciphers[ seed[0] % count( ciphers ) ]

Another way to do it without the complexity of these 'ciphers' might beto pick one character from each of the required groups, then pick thethe remaining characters from an even distribution of all allowablecharacters. Then randomly shuffle their order.

Now that we know what cipher to use for building our output password,
all that's left is to iterate the cipher, and produce a character of
password output for each step. When we iterate the cipher (index i),
we look in the character group identified by the character (string
passChars) in the cipher at index i.

The seed is 20 bytes, you use one to pick the 'cipher', what happens ifyou need to generate more than a 19 character password? You don't haveto allow such a thing but you should probably document what happens.

The following character groups (passChars) are defined:

Cipher character: V AEIOU Cipher character: C BCDFGHJKLMNPQRSTVWXYZ
Cipher character: v aeiou Cipher character: c bcdfghjklmnpqrstvwxyz
Cipher character: A (= V . C) AEIOUBCDFGHJKLMNPQRSTVWXYZ
Cipher character: a (= V . v . C . c)
AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz Cipher
character: n 0123456789
Cipher character: o !@#$%^&*()

I encountered a site the other day that required a special character butdid not accept a '%'.

Cipher character: X (= a . n . o)
AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz0123456789!@#$%^&*()


So let's consider the "Medium" type password: CvcnoCvc CvcCvcno

Option 1:
C   21 possibilities
v    5
c   21
n   10
o   10
C   21
v    5
c   21
= 486202500 possibilities.

Option 2 is the same, so we have a maximum of 972405000 eight characterpasswords which can ever be produced by this algorithm. This is probablybetter than what most humans would pick, but it is also much less thanan even distribution over all allowed characters 72^8 = 722204136308736.

It is also very efficient to iterate all these possibilities. Accordingto http://hashcat.net/oclhashcat-plus/ a PC with a single AMD hd6990video card can try 3263.4 M SHA-1s per second. So if an attacker withsuch a rig were to obtain a properly-salted SHA-1 hash of a "medium"type password it would take him, on average, 15 milliseconds to bruteforce it.


The "long" format is defined as:
     CvcvCvcvnoCvcv, CvcvnoCvcvCvcv, CvcvCvcvCvcvno

21^6 * 5^6 * 10 * 10 * 3 = 402028692187500 ~ 2^49 possibilities

The same attacker would need 17.1 hours on average (34.2 hours max) tocrack any "long" format password from a salted SHA-1 hash.

 We use the seed's byte value at index i + 1 modulo the amount of
characters in the character class


Again, slightly biased.

to determine which character
(passChar) in the class to use for the output password at index i.

passChar    = passChars[ seed[i + 1] % count( passChars ) ]
passWord[i] = passChar


That's my feedback on the details of the design and specification.
However, there's a serious weakness in the scheme itself.

Consider what happens if this method becomes very popular. If there is awebsite with many users it will have many who are using this algorithm.All counters start at 0 and the site name and password policy is sharedacross all users. Eventually two users will generate a password for thesite using the same master pass phrase. Given the entropy estimates fromthe Bonneau paper we could expect this to happen after just a fewthousand users. Even if 5-word phrases (having 30 bits entropy) wererequired we would expect it to start happening around 32 k users andbecome increasingly frequent.

When an attacker pwns a website and learns that two or more users haveselected the same Lyndir-formatted password, they are in a very goodposition to attack the users' shared master passphrase. On a relativelylarge site (the rockyou.com breach exposed 32 M passwords and I'd nevereven heard of that site) they will observe the same password repeatedmany times. The attacker learns that a) these users are probablygenerating their passwords with this tool, b) these users have selecteda common passphrase. Because there is no meaningful salt the computationdone to discover this passphrase can be applied multiple users accounts.

The attacker's effort amounts to that of the least secure website. Ifany site isn't using a hash function with a huge work factor it is theattacker's option to go after those password hashes first. If hesucceeds in guessing *any* user's master passphrase, the attacker haslearned *all* the passwords for all websites for all users who chosethat master passphrase. Even though users do tend to reuse them acrosssites, this is not a property shared by plain passwords.

If this system were adopted by large numbers of users it would makethings worse. The root cause of this is people's inability to rememberlarge amounts of entropy. This is why password security applicationsmust store entropy on behalf of the user, there's no way around it.


- Marsh
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography

Re: [cryptography] Master Password

Reply via email to