Hello all,

I flood you again with a looooong email. Apologies to all that don't
want to read so much, but it is a problem of rather high importance that
has not really been fixed, and the first discussions happened in 2003 as
far as I can tell. Time to FIX IT!!!

The problem, in short, is how to handle the checksumming and signing of
gentoo-provided files so that manipulation by external entities becomes
difficult. I expect many disagreements on the "best" strategy to
implement, but I hope that a sensible compromise will be reached so that
this can finally be implemented.

All the lazy people may stop reading here ;-)

Short overview:
The Problem
The Attacker
Defending
Policies and open problems



The Problem:
============

A malicious person could modify the files provided by Gentoo to 
manipulate and take over the computers of Gentoo users. To avoid such 
problems all files provided and used by Gentoo need to be identifiable 
as "correct" - we need integrity checks.

An attacker should not be able to easily circumvent these checks. There
are some attacks that can't be prevented, so we also have to see the
practical limits of any scheme we define. For example, an attacker could
be a Gentoo dev with full access to all resources; stopping that person
will be more difficult (if not impossible) than stopping a random script
kiddie that hax0rs a distfile mirror with a 0-day exploit.

The files
=========

There are two groups of files at the moment that need to be secured:
- distfiles: The large archives of source code and binary blobs from 
which we install a package
- "the tree": metadata, ebuilds and patches containing all the 
information to manage the local software installation.

The default distribution methods are rsync for the tree and http/ftp for
distfiles. As there are too many users for a single server, the servers
are provided by external contributors and are not directly controlled by
Gentoo. In almost all cases a fallback to the original download location
of a file is provided.

The Attacker
============

Any security policy has to take into account how strong an attacker is. 
For example securing against your grandmother with checksums signed by 
multiple independent persons is most likely overkill. A simple checksum 
would most likely be enough there.
On the other end of the spectrum we have aliens that can crack any
encryption scheme in roughly two minutes; obviously we can't do anything
to really stop them.

What attackers are then reasonable?
- the script kiddie that takes over one single mirror
- a large multinational monopolist that tries to sabotage any potential 
competitors
- a mirror operator that has a bad day and manipulates files for fun
- a really strong hax0r that takes over the Gentoo CVS server
- a social hacker that takes a dev hostage and forces that dev to insert
evil bad data

This is by far not a complete list, it should only help with figuring 
out what can go wrong.

Now let's classify the attackers:
* local attacker ("your roommate") - nothing we can defend against, your
responsibility.
* single compromised mirror - only with checksums can this be found. If 
the checksums are distributed on a different path than the distfiles 
a single compromised mirror has a very low impact as checksums don't 
match.
* compromised rsync mirror - now the checksums can be forged. The 
attacker will have to change the SRC_URI too so that only the 
compromised distfiles are transferred. Also changes in the ebuilds must 
be considered - a "rm -rf" in the right place in an ebuild will have a 
large impact and can't be caught with checksums (since those could be 
forged by the same attacker). We need signed checksums here.
* compromised developer - this is hard to detect, but once detected all
files involved can be checked and corrected. The impact is very high and
it is very difficult to avoid. (So we just assume that no dev will go
berserk and look for low-impact methods that allow us to clean up if
that ever happens.)

Note: a possible defense against rogue devs would be multi-signing, i.e.
having all commits checked by at least one other person. This does not
help much as there can be collusion between devs and the impact on all
devs is very high. It would effectively deadlock Gentoo and prevent any
useful progress.

Defense methods
===============

1) Checksums
A checksum is a one-way function that returns a constant-length
identifier. The checksum is designed so that changing one bit in the
input totally changes the output (quite simplified, but that's all that
matters). Thus any change to a file leads to a bad checksum, and finding
a collision (two files with the same checksum) is hard.
Some checksum algorithms have known weaknesses, so relying on a single
algorithm is not advised. For example MD5 suffers from collision attacks
where one can generate two files with equal checksums (but no practical
way is known to find a matching second file to a given file).
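
To illustrate why more than one algorithm helps, here is a minimal
sketch in Python that checks a file against several digests at once and
only accepts it if all of them match. The function names and the choice
of sha1 plus sha256 are my assumptions for the example, not what portage
actually does.

```python
import hashlib

# Assumed algorithm set for this sketch. sha1 and sha256 are always
# available in hashlib; ripemd160 depends on the local OpenSSL build.
ALGORITHMS = ("sha1", "sha256")


def file_digests(path, algorithms=ALGORITHMS, chunk_size=65536):
    """Read the file once and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify_file(path, expected):
    """Return True only if every expected digest matches the file.

    `expected` maps lowercase hashlib names to hex digests. An empty
    mapping is rejected, since it would verify nothing.
    """
    if not expected:
        return False
    actual = file_digests(path, algorithms=tuple(expected))
    return all(actual[name] == value.lower()
               for name, value in expected.items())
```

An attacker then has to find one file that collides under every listed
algorithm at the same time, which is much harder than breaking one.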

2) Signatures
Using GPG it is possible to sign a file. The signature is similar to a 
checksum, but it can only be created with a private key that is kept 
secret. The public key allows anyone to verify this signature. Deducing
the private key from the public key is hard to do. (very simplified)
The public key is provided online, in a keyring (collection of keys) or
included in the downloadable media. If the public key is trusted it can
be used to verify that all files have a correct signature, effectively
saying that the files are exactly the same as the ones committed by a
dev.

Some readers may point out that it doesn't prevent a dev from injecting
"bad" files and signing them, but it does prevent tampering by external
parties.
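
As a rough sketch of what verification on the user side could look like,
the following Python wraps `gpg --verify` for a detached signature and
restricts it to one dedicated keyring. The function names and the idea
of a separate Gentoo keyring file are assumptions for illustration.

```python
import subprocess


def build_verify_command(signature_path, data_path, keyring_path):
    """Build the gpg command line for checking a detached signature.

    Only keys in `keyring_path` are considered. Pass an absolute path:
    gpg looks up a keyring name without a slash in its home directory.
    """
    return [
        "gpg",
        "--no-default-keyring",
        "--keyring", keyring_path,
        "--verify", signature_path, data_path,
    ]


def signature_is_valid(signature_path, data_path, keyring_path):
    """Return True if gpg reports a good signature (exit status 0)."""
    command = build_verify_command(signature_path, data_path, keyring_path)
    result = subprocess.run(command, capture_output=True)
    return result.returncode == 0
```

Note that a zero exit status only says the signature is good and was
made by a key in that keyring. Whether that key deserves trust is
exactly the key distribution question discussed in this mail.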

3) Manifest / Manifest2

This is an implementation of a checksum / signature scheme. It is
described in GLEP 44:

http://www.gentoo.org/proj/en/glep/glep-0044.html

Right now SHA1, SHA256 and RMD160 are the default checksum algorithms.

While Manifest2 should take care of all executable bits in the tree, it
does not yet cover eclasses and profiles. As long as this is not taken
care of, any attacker can just override an eclass on the rsync mirror or
modify the profiles. This severely reduces the effectiveness of signing.

Any "good" solution should sign all data files in the tree, so I ask for
an extension of the Manifest2 protocol to include _every_ data file with
no exception.
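
To make the coverage gap concrete, here is a small Python sketch that
parses Manifest2 entries and lists files that no entry covers. It
assumes the line layout as I read it in GLEP 44 (file type, file name,
size, then pairs of hash type and hash value) and file names without
spaces. The function names and the example file names are made up.

```python
def parse_manifest2_line(line):
    """Parse one Manifest2 entry into a dict.

    Assumed layout: <type> <name> <size> <HASHTYPE> <hash> [...]
    """
    fields = line.split()
    # Three fixed fields plus at least one (hash type, hash) pair, so
    # the field count must be odd and at least five.
    if len(fields) < 5 or len(fields) % 2 == 0:
        raise ValueError("malformed Manifest2 line: %r" % line)
    hashes = dict(zip(fields[3::2], fields[4::2]))
    return {
        "type": fields[0],
        "name": fields[1],
        "size": int(fields[2]),
        "hashes": hashes,
    }


def uncovered_files(tree_files, manifest_lines):
    """Return the sorted names in tree_files with no Manifest2 entry."""
    covered = {
        parse_manifest2_line(line)["name"]
        for line in manifest_lines
        if line.strip()
    }
    return sorted(set(tree_files) - covered)
```

With one entry for a patch and a tree that also contains an eclass, the
eclass shows up as uncovered, which is the kind of file an attacker on
an rsync mirror could replace without breaking any checksum.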

Key policies
============

To make signing relevant and verifiable all devs should use the same
parameters - key length, key type, validity.
Once that is agreed upon a key distribution strategy is needed so that
users can get the key(s) on a verifiable path.

Signing strategies
==================

Once there is an agreement on what files to sign with what kind of keys
there remains the question how to sign it. There are at least three
strategies:

Method "simple":
----------------

Use one central key that is kept on a secure box. Signing is done
automatically after a commit. The key distribution is simple since there
is only one key that has to be pushed.
Problems are security (single point of failure, single target for
compromising).

Method "complex":
-----------------

Let every dev sign the files he adds or modifies. A keyring is 
maintained on gentoo infrastructure and is distributed over multiple 
paths.
Problems: Need support for multi-signing. If one file is added, the
Manifest as a whole should not be signed only by the last editor; only
the change should be signed. At the same time it needs to be kept simple
and fast: signing each file on its own or keeping infinite history must
be avoided. Keyring management needs to be defined. Key revocation etc.
needs to be defined.

Method "hybrid":
----------------

Let every dev sign, add automatic server-side signing with a master key.
Gives you bits of both. Normal users can trust the master key.
Paranoid users can trust the dev keys.
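
One possible reading of the hybrid policy, sketched in Python: a normal
user accepts anything with a valid master key signature, while a
paranoid user insists on a valid signature from a key in the dev
keyring. The function name, the key IDs and the exact acceptance rule
are assumptions for discussion, not a decided policy.

```python
def accept_manifest(valid_signers, master_key, dev_keys, paranoid=False):
    """Decide whether a Manifest is acceptable under the hybrid scheme.

    valid_signers: key IDs whose signatures already verified correctly.
    master_key:    key ID of the server-side master key.
    dev_keys:      key IDs in the developer keyring.
    """
    signers = set(valid_signers)
    if paranoid:
        # Paranoid users do not rely on the master key alone.
        return bool(signers & set(dev_keys))
    return master_key in signers
```

For example, a Manifest signed only by the master key passes for a
normal user but is rejected in paranoid mode.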


Earlier Discussions:

http://article.gmane.org/gmane.linux.gentoo.devel/16876
2004.1 discussion

http://www.gentoo.org/proj/en/devrel/manager-meetings/logs/2004/20040531.txt
manager meeting

Some selected problems from there:

* Access Control Lists could be used so that only toolchain people can
commit to glibc. Do we want that level of micromanagement? Does it offer
any security benefits?

* key revocation may be impractical - what methods for handling retired
devs and rogue devs are there?

* how to verify from an install CD?

* in tree or out of band? Storing the keys in the tree is easy, but a
potential security problem

With this I hope to get the discussion started. There are many areas
where I am unsure what is the best strategy - every decision has obvious
disadvantages, either security, code complexity or developer workload.
Any solution should try to keep the workload low while offering the
highest level of security that does not halt all progress.

I hope that discussion can stay focussed on the implementation aspects.
When you suggest something (for example multiple signatures) please
explain what it gains us (protection against single rogue devs) and at
what price (having to sign everything by at least two persons). That
should make it easier to see if the workload impact of that idea is
worth it.

Take care,

Patrick

-- 
Stand still, and let the rest of the universe move
