First of all, thanks for your time and very valuable feedback.

On 30 May 2012, at 07:20, Marsh Ray wrote:
> On 05/29/2012 06:01 PM, Maarten Billemont wrote:
>> Dear readers,
>> 
>> I've written an iOS / Mac application whose goal it is to produce
>> passwords for any purpose.  I was really hoping for the opportunity
>> to receive some critical feedback or review of the algorithm
>> used[1].
>> 
> 
>> [1] http://masterpassword.lyndir.com/algorithm.html
>> 
>> Master Password Master Password So how does it work?
>> 
>> The theory behind Master Password is simple. The user remembers a
>> single, secure password. The user only ever uses that password to log
>> into the Master Password application. This master password is then
>> used as a seed to generate a different password based on the name of
> 
> How about just "used to generate"? Usually the term "seed" is used when it 
> completely determines a pseudorandom sequence rather than being combined with 
> other information.
> 
> [edit: OIC, the master password generates a seed based on the name of the 
> site. This was unclear.]
> 
>> the site to generate a password for.
>> 
>> The result is that each master password generates its own unique
>> sequence of passwords for any site name. Since the only input data is
>> the master password and the site name (along with a password counter,
>> see below), there is no need for any kind of storage to recreate a
>> site's password. All that's needed is the correct master password and
>> the correct algorithm implementation.
> 
> How do you know how far along in the password sequence you are?

The summary descriptions of the algorithm are simplified (perhaps, too much so) 
by assuming default values for those inputs that have ones.  Technically, there 
are these inputs:
 - The master password,
 - The site name,
 - The password counter (default 0),
 - The password type (default "Long Password").

> 
>> What that does for you is make
>> it almost impossible to lose your passwords.
> 
> To "lose a password" has multiple meanings.
> How about "much easier to create and remember strong passwords"?

The intended meaning was, get in a situation where you can no longer obtain to 
the data necessary for accessing your passwords.  Additionally, you can no 
longer get in a situation where this data can be "lost" to a third party, such 
as when someone steals your notebook full of passwords.

> 
>> It also makes it nearly
>> impossible for hackers to steal your online identity.
> 
> I don't think this claim is well substantiated. :-)

The basis of this claim is the theory that attacking your password for a single 
site remains as vulnerable as password authentication has ever been, but this 
does not risk your global identity as reusing or rehashing the same password on 
multiple sites would.  It's phrased awkwardly for an audience of professionals, 
but was targeted at users.  This page may not be the right location for such 
language.

> 
>> The Algorithm
>> 
>> In short, the algorithm is comprised of the following steps:
>> 
>> Determining the master key Determining the cipher seed Encoding a
>> user-friendly password
>> 
>> The Master Password
>> 
>> The user chooses a single master password, preferably sufficiently
>> long to harden against brute-force attacks. Master Password
>> recommends absurd two or three-word sentences as they're easily
>> remembered and generally sufficiently high in entropy.
> 
> I believe this is not good advice because two and three word sentences will 
> not provide enough entropy.
> 
> Wikipedia cites a Harvard/Google study indicating English has 1,022,000 
> words. 
> https://en.wikipedia.org/wiki/Number_of_words_in_English#Number_of_words_in_English
> 
> Users have great difficulty picking original passwords, even security pros 
> tend to be overconfident in the uniqueness of password schemes.
> For example:
> http://www.imperva.com/docs/WP_Consumer_Password_Worst_Practices.pdf
> "the 5000 most popular passwords [] are used by a share of 20% of the users".
> 
> So even if users chose two completely independent "passwords" from the 
> RockYou statistical distribution to form their sentence it would mean that 
> more than 4% of users will use one of the top 25 million two password 
> combinations.
> 
> But that's when users are asked to pick unguessable passwords, asking for 
> very short proper sentences is certain to make this worse. Many proper two 
> word sentences in English are of the form noun-verb. But English has far more 
> nouns than verbs. For example, https://en.wiktionary.org/ has 174871 entries 
> for nouns and only 24931 under verbs. Adverb-verb sentences will be popular 
> too, but there are only 12982 adverbs listed.
> 
> In fact, a recent study tested two word minimum passphrases against an Amazon 
> system which returned a different error depending on whether or not the 
> passphrase was in use. They estimate that "a dictionary of 20,656 phrases 
> covers the choices of about 1.13% of users" and concluded that "by our 
> metrics, even 5-word phrases would be highly insecure against offline 
> attacks, with fewer than 30 bits of work compromising over half of users."
> http://www.lightbluetouchpaper.org/2012/03/07/some-evidence-on-multi-word-passphrases/
> http://www.cl.cam.ac.uk/~jcb82/doc/BS12-USEC-passphrase_linguistics.pdf

Very useful information!

Initially, my recommendation for a master password was to use a 
sufficiently-random 12-character string.  I added the proposal for "paranoid" 
users to incorporate non-latin glyphs into the password.  The big issue here is 
usability: Ideally, you want to lock the application down and require the 
master password for every usage of it.  That means entering the master password 
needs to be sufficiently painless.  At the same time, it's vital that users do 
not forget their master password, or the whole scheme falls apart; there is no 
master password recovery scenario.  These two key elements combined reading 
about things like passfault's attempts at determining password strengths, lead 
me to think absurd word-based sentences are a best fit for sufficiently-high 
entropy memorable and easy-to-enter pass phrases.  I may need to re-evaluate 
this opinion based on your references.  Perhaps you have proposals that might 
better address these requirements?

https://passfault.appspot.com/password_strength.html

> 
>> The application then creates a scrypt key derivative from the user's
>> password. This process takes quite a bit of processing time and
>> memory.
> 
> Specifics would be very interesting here. How much time on a modern CPU?

The specifics are included in the form of arguments to the scrypt algorithm.  
In this case, we're running on a mobile device which potentially has a very 
limited amount of memory and processing power available.  At the same time, we 
don't want to keep the user waiting for longer than a few seconds after 
entering his master password, since he wants to get on with it and log into his 
website.  Usability is an issue as well.  The values that were chosen leads to 
about 4 seconds of computational delay on the iPhone 3GS, but takes my Mac only 
0.1s.  It's still much better than the 1570733 SHA-1's it can compute per 
second, though.  I'm certain that a decent rig could lower that number 
significantly, but the main thing 'scrypt' does here is reduce the speed of the 
brute-forcing operation by a significant factor.

> 
>> This step exists to make brute-force attempts at guessing the
>> master password from a given output password far more difficult, to
>> practically infeasible, even for otherwise vulnerable password
>> strings.
>> 
>> key   = scrypt( P, S, N, r, p, dkLen ) where P     = master password
>> (UTF-8) S     = <empty> N     = 16384 r     = 8 p     = 1 dkLen = 64
>> 
>> The result is a 64-byte key derived from the user's master password.
> 
> 64 bytes is overkill. Later you use SHA-1 in such a way that there's no point 
> in having more than 160 bits here. For human chosen passphrases, even the 
> standard 128 bits is very generous.

Agreed, that parameter has not been optimally chosen.

> 
>> This key will be fed into the rest of the algorithm to produce output
>> passwords that are as private to the user as his master password is.
> >
>> Combining The Inputs
>> 
>> The theory behind Master Password requires that all inputs are given
>> by the user. The two main inputs are the master password that we used
>> to determine the key and the site's name. There is a third input
>> value, the password counter, which is a 32-bit unsigned integer value
>> that is used to salt the input. Initially, the password counter
>> should be zero, but a user may specify a non-zero counter value in
>> case he wants to force the algorithm to produce a new output password
>> for the site.
> 
> It may be the case that "all inputs are given by the user", but the more 
> important question is how many inputs are unknown to the attacker?
> 
> master password: pass phrase unknown to attacker
> site name: well known
> counter: a very small number, probably 0

There is only one input that is by-design unknown to the attacker.  The way I 
saw it was that one unknown factor is good enough when the algorithm is sound.

> 
>> These input values are combined in a byte array, separated by a
>> single NUL byte.
> 
> +1 for proper usage of NUL :-)
> 
>> In order, the input values are the site name (UTF-8
>> decoded), the master key, and a salt (this is the password counter, a
>> 32-bit unsigned integer in network byte order).
> 
> The purpose of a salt is to thwart precomputed dictionary attacks.  The salt 
> isn't necessarily a hard secret, but the attacker can't begin hashing 
> dictionary words until after he learns it and then his work won't be useful 
> against any other account.
> 
> A slowly incrementing counter isn't a salt because it's almost always going 
> to be 0. In the rare case of a site that forces password changes, it's not 
> going to increase by more than 4 per year.

The purpose of this "salt" is indeed not to thwart dictionary attacks, it 
exists solely to shake up the input data to the hash and produce an entirely 
new hash on the user's demand.  I was unfamiliar with your definition of salt, 
and I'm certainly willing to change my terminology.  Have you a proposition of 
what this "password counter" might better be called?

> 
>> The byte array is
>> hashed using the SHA-1 algorithm to yield the seed as a result.
>> 
>> salt = htonl( password counter )
>> seed = sha1( site name . "\0" . key . "\0" . salt )
> 
> It appears you have mixed data formats here. Site name is some type of text 
> encoding (you should specify which), but key and salt are big endian. All are 
> NUL delimited, even though the key and salt may have embedded NULs.
> 
> I don't think there's anything vulnerable with it, but it's messy.
> 

The site name is a UTF-8 decoded byte string, as mentioned in the text above, 
but I will repeat it inline as well, to avoid possible ambiguity.  The key and 
salt can indeed contain NULs, but then again, they can contain anything.  Are 
you hinting that they might well be simply concatenated without delimiter?

> 
> Consider instead:
> 
> HMAC<SHA-1>( key,
>            "lyndir derive seed from master" . salt . sitename )
> 
> HMAC is a good fit here because it provides a dedicated key input which fixes 
> an odd property of SHA-1.

If you're referring to the possibility of length extension to produce a 
password for a different site from a given password, ignoring that the 
resulting seed from this hash operation is never actually wholly known by a 
site, I believe I've dodged that by putting the site name in front of the other 
elements and additionally delimiting it with a NUL, though I suppose that with 
some luck, the "salt" may yet be extended?  Note, however, that it is a 
fixed-length numeric value.  I'm unsure that this can be meaningfully exploited 
by length extension.

I certainly can't deny that HMAC does seem like better a fit.  It's just that 
it also seems rather unnecessary.

> 
> The fixed string helps ensure this function won't accidentally get 
> re-invented and re-used for some other purpose in a way that compromises 
> security.

This is an interesting suggestion.

> You don't need a trailing NUL on the fixed string or the sitename. Since the 
> sitename is a variable length field you might consider prefixing it with a 
> two byte length.
> 
>> Generating The Output
>> 
>> We now have a seed which is a sufficiently long seemingly-arbitrary
>> string of bytes that is unique to the site and the user.
> 
> You have 20 pseudorandom bytes, you don't know that they are unique to the 
> user.
> 
>> This string
>> of bytes, however, is not very useful for a user to use as a
>> password. We have two additional problems that need to be solved: The
>> output password must be easy for a user to read and copy, but it
>> should also be compatible with most password policies.
> 
> It needs to be compatible with the specific password policy in use by each 
> site.
> 
>> Password policies are strict rules imposed by applications on their
>> users, designed to limit the types of passwords these users are
>> allowed to use with the application. Usually, these policies exist to
>> force users into thinking about passwords with a healthy entropy.
> 
> Password policies are a lot of things to a lot of people. They also encompass 
> such things as periodic change requirements.

Correct.  That's one of the reasons why I introduced the password counter.

> 
>> Often, they exist purely as a side-effect of bad password handling
>> such as storing the clear-text passwords in a database.
> 
> I don't think you've substantiated that last sentence.

I'm unsure what you're looking for here.  I could add that databases often 
impose length or data type limitations on their content.  I could add that code 
which improperly merges strings of different contexts can cause passwords that 
contain certain characters to result in runtime syntax errors.  But I believe 
it's perhaps a little off-scope.  Is this the kind of substantiation you would 
like to see added or were you referring to something else?

> 
>> Since the idea is that the output password can be used directly as a
>> password to protect the user's account on the site, it needs to be
>> able to pass the site's password policy. Master Password addresses
>> this problem by introducing password types. Each password type
>> describes what an output password must look like and maps to a set of
>> ciphers. Ciphers describe the resulting output password using a
>> series of characters that map to character groups of candidate output
>> characters.
> 
> This is not what most people would refer to as a 'cipher'. Consider calling 
> it something else like a 'password template'.

I was indeed struggling with finding a good term for this one.

When looking at Wiktionary, I found the definition of cipher to be:
(cryptography) A cryptographic system using an algorithm that converts letters 
or sequences of bits into ciphertext.

While ciphertext is described as:
Encoded text, text that is unreadable.

This lead me to believe that the conversion of hash bytes into a different form 
that makes the hash unreadable was a suitable match for this term.  Your 
"template" proposal isn't a bad one either, and I certainly want to avoid using 
terms that can potentially be misunderstood.

> 
>> A cipher has the same length as the output password it
>> yields. Each character in the cipher maps to a specific character
>> group. At each position of the output password, a character is chosen
>> from the character group identified by the character in the cipher at
>> the same position.
>> 
>> The following ciphers are defined:
>> 
>> Type: Long Password CvcvCvcvnoCvcv CvcvnoCvcvCvcv CvcvCvcvCvcvno
>> Type: Medium Password CvcnoCvc CvcCvcno
>> Type: Short Password Cvcn
>> Type: Basic Password aaanaaan aannaaan aaannaaa Type: PIN nnnn
>> 
>> By default, Master Password uses the Long Password type for any new
>> passwords. The user is able to choose a different password type,
>> which is normally only done if the site's password policy is
>> incompatible with the output password produced by this type.
> 
> Isn't this another input to the generation algorithm?
> Is the user expected to remember this policy for each site?
> 
>> To create the create the output password, the bytes in the seed are
>> encoded according to the cipher. The first seed byte is used to
>> determine which of the type's ciphers to use for encoding an output
>> password.
> 
>> We take the byte value of the first seed byte  modulo the
>> amount of ciphers set for the chosen password type and use the result
>> as a zero-based index in the cipher list for the password type.
> 
> This results in a slightly uneven distribution when 256 is not a multiple of 
> the number of 'ciphers'.
> 
> Selecting random numbers in a range without bias is harder that it looks; you 
> can't just use the integer modulo operator. A web search for "random number 
> in range" turns up a lot of bad examples. The first example I found that 
> mentioned the bias from a naive modulo was
> http://etutorials.org/Programming/secure+programming/Chapter+11.+Random+Numbers/11.11+Getting+a+Random+Integer+in+a+Range/
> 
> (This looks like it's from an OReilly book, I'm not sure how legit it is to 
> be on that site.)

This was pointed out to me before.  As I understand it, the result is that my 
entropy is slightly reduced.  I will evaluate the issue and consider a better 
way of choosing a number from a range, but I'm not yet convinced the loss in 
entropy is in fact, significant.  Either way, if the solution is trivial, 
there's no harm in doing the right thing.

> 
>> ciphers = [ "CvcvCvcvnoCvcv", "CvcvnoCvcvCvcv", "CvcvCvcvCvcvno" ]
>> cipher  = ciphers[ seed[0] % count( ciphers ) ]
> 
> Another way to do it without the complexity of these 'ciphers' might be to 
> pick one character from each of the required groups, then pick the the 
> remaining characters from an even distribution of all allowable characters. 
> Then randomly shuffle their order.

This was initially the plan.  The reason why I added the whole cipher-based 
encoding, is that I wanted my output passwords to be of a certain format that 
made it easier to copy.

Many password inputs are made to disallow automated input such as copy/pasting. 
 Additionally, when the password is generated solely on a user's mobile device, 
it needs to be transferred manually to the user's desktop or other device 
whenever the user wishes to authenticate himself there.

It just happens to be terribly difficult for a user to manually copy a password 
that is comprised of an evenly distributed set of characters, but much easier 
when those characters can form "groups" in the user's mind.  The cipher part is 
therefore purely a usability feature.

> 
>> Now that we know what cipher to use for building our output password,
>> all that's left is to iterate the cipher, and produce a character of
>> password output for each step. When we iterate the cipher (index i),
>> we look in the character group identified by the character (string
>> passChars) in the cipher at index i.
> 
> The seed is 20 bytes, you use one to pick the 'cipher', what happens if you 
> need to generate more than a 19 character password? You don't have to allow 
> such a thing but you should probably document what happens.

The algorithm currently defines the password types that it supports, which 
obviously doesn't contain a cipher longer than the seed bytes available.  But 
it might yet be interesting to add such a paragraph in view of future or custom 
password type additions.

> 
>> The following character groups (passChars) are defined:
>> 
>> Cipher character: V AEIOU Cipher character: C BCDFGHJKLMNPQRSTVWXYZ
>> Cipher character: v aeiou Cipher character: c bcdfghjklmnpqrstvwxyz
>> Cipher character: A (= V . C) AEIOUBCDFGHJKLMNPQRSTVWXYZ
>> Cipher character: a (= V . v . C . c)
>> AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz Cipher
>> character: n 0123456789
>> Cipher character: o !@#$%^&*()
> 
> I encountered a site the other day that required a special character but did 
> not accept a '%'.

Unfortunately the amount of idiocy that certain sites attain in their password 
handling implementations can't reasonably be wholly contained by an algorithm.  
That's why Master Password has the option built-in to AES-128-CBC encrypt 
user-specified passwords using the master password as well, as a fallback 
mechanism that doesn't enjoy some of the advantages the purely stateless 
approach has.  This fallback is also necessary for those cases where users are 
simply given a password to use, by their administrators.

> 
>> Cipher character: X (= a . n . o)
>> AEIOUaeiouBCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz0123456789!@#$%^&*()
> 
> So let's consider the "Medium" type password: CvcnoCvc CvcCvcno
> 
> Option 1:
> C   21 possibilities
> v    5
> c   21
> n   10
> o   10
> C   21
> v    5
> c   21
> = 486202500 possibilities.
> 
> Option 2 is the same, so we have a maximum of 972405000 eight character 
> passwords which can ever be produced by this algorithm. This is probably 
> better than what most humans would pick, but it is also much less than an 
> even distribution over all allowed characters 72^8 = 722204136308736.
> 
> It is also very efficient to iterate all these possibilities. According to 
> http://hashcat.net/oclhashcat-plus/ a PC with a single AMD hd6990 video card 
> can try 3263.4 M SHA-1s per second. So if an attacker with such a rig were to 
> obtain a properly-salted SHA-1 hash of a "medium" type password it would take 
> him, on average, 15 milliseconds to brute force it.
> 
> The "long" format is defined as:
>     CvcvCvcvnoCvcv, CvcvnoCvcvCvcv, CvcvCvcvCvcvno
> 
> 21^6 * 5^6 * 10 * 10 * 3 = 402028692187500 ~ 2^49 possibilities
> 
> The same attacker would need 17.1 hours on average (34.2 hours max) to crack 
> any "long" format password from a salted SHA-1 hash.

This is an inherent flaw of the password authentication method.  Master 
Password isn't here to replace or fix passwords.  Better alternatives exist 
such as public key authentication, hardware OTP tokens or smartcards.  The 
problem is the unfortunate fact that password authentication is universal 
standard.  Master Password merely aspires to produce reasonably decent 
passwords for a user's sites in a way that doesn't compromise their global 
identity when one or more of the user's passwords is compromised.

> 
>> We use the seed's byte value at index i + 1 modulo the amount of
>> characters in the character class
> 
> Again, slightly biased.
> 
>> to determine which character
>> (passChar) in the class to use for the output password at index i.
>> 
>> passChar    = passChars[ seed[i + 1] % count( passChars ) ]
>> passWord[i] = passChar
> 
> That's my feedback on the details of the design and specification.
> However, there's a serious weakness in the scheme itself.
> 
> Consider what happens if this method becomes very popular. If there is a 
> website with many users it will have many who are using this algorithm. All 
> counters start at 0 and the site name and password policy is shared across 
> all users. Eventually two users will generate a password for the site using 
> the same master pass phrase. Given the entropy estimates from the Bonneau 
> paper we could expect this to happen after just a few thousand users. Even if 
> 5-word phrases (having 30 bits entropy) were required we would expect it to 
> start happening around 32 k users and become increasingly frequent.
> 
> When an attacker pwns a website and learns that two or more users have 
> selected the same Lyndir-formatted password, they are in a very good position 
> to attack the users' shared master passphrase. On a relatively large site 
> (the rockyou.com breach exposed 32 M passwords and I'd never even heard of 
> that site) they will observe the same password repeated many times. The 
> attacker learns that a) these users are probably generating their passwords 
> with this tool, b) these users have selected a common passphrase. Because 
> there is no meaningful salt the computation done to discover this passphrase 
> can be applied multiple users accounts.
> 
> The attacker's effort amounts to that of the least secure website. If any 
> site isn't using a hash function with a huge work factor it is the attacker's 
> option to go after those password hashes first. If he succeeds in guessing 
> *any* user's master passphrase, the attacker has learned *all* the passwords 
> for all websites for all users who chose that master passphrase. Even though 
> users do tend to reuse them across sites, this is not a property shared by 
> plain passwords.
> 
> If this system were adopted by large numbers of users it would make things 
> worse. The root cause of this is people's inability to remember large amounts 
> of entropy. This is why password security applications must store entropy on 
> behalf of the user, there's no way around it.

I'm not quite convinced that things would be worse.  Master Password's 
algorithm was always designed with the knowledge that individual passwords can 
easily be compromised, and in fact was conceived as a way of addressing this 
problem specifically.

If we compare the Master Password solution being used by a large user-base 
against that same user-base using their own individual password management 
solutions, I believe we'll find that since users easily give up on security and 
simply reuse or rehash a small subset of passwords for all accounts, an 
attacker that obtains the user's credentials for one site will have a much 
easier time at figuring out the user's credentials for their other sites than 
they would if they first had to brute-force that user's master password.

Note that an attacker gains very little from discovering a user's password for 
a site, and very little more by discovering that multiple users are using the 
same password.  As far as I understand, there's no way to reverse the site's 
password and determine the master password from it, and knowing that multiple 
users that have used the same master password does not aid in this attempt 
either.

Knowing that 2, 5 or 10 users are using the same master password might make it 
more profitable a target for brute-forcing, but I'm currently still convinced 
that an attacker will opt for a user-chosen password before he'll have a go at 
brute-forcing Master Password.

But please re-iterate your point if I've completely missed it.

> 
> - Marsh


Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography

Reply via email to