Hello all,

I am flooding you again with a looooong email. Apologies to everyone who doesn't want to read so much, but this is a problem of rather high importance that has never really been fixed, and the first discussions happened in 2003 as far as I can tell. Time to FIX IT!!!

The problem, in short, is how to handle the checksumming and signing of Gentoo-provided files so that manipulation by external entities becomes difficult. I expect many disagreements on the "best" strategy to implement, but I hope that a sensible compromise will be reached so that this can finally be implemented.

All the lazy people may stop reading here ;-)

Short overview:
  The Problem
  The Attacker
  Defending
  Policies and open problems

The Problem:
============
A malicious person could modify the files provided by Gentoo to manipulate and take over the computers of Gentoo users. To avoid such problems, all files provided and used by Gentoo need to be identifiable as "correct" - we need integrity checks. An attacker should not be able to easily circumvent these checks.

There are some attacks that can't be prevented, so we also have to see the practical limits of any scheme we define. For example, an attacker could be a Gentoo dev with full access to all resources; stopping that person will be more difficult (if not impossible) than stopping a random script kiddie that hax0rs a distfile mirror with a 0-day exploit.

The files
=========
There are two groups of files at the moment that need to be secured:
- distfiles: The large archives of source code and binary blobs from which we install a package
- "the tree": metadata, ebuilds and patches containing all the information to manage the local software installation.

The default distribution methods are rsync for the tree and http/ftp for distfiles. As there are too many users for a single server, the servers are provided by external contributors and are not directly controlled by Gentoo. In almost all cases a fallback to the original download location of a file is provided.

The Attacker
============
Any security policy has to take into account how strong an attacker is. For example, securing against your grandmother with checksums signed by multiple independent persons is most likely overkill.
A simple checksum would most likely be enough there. On the other end of the spectrum we have aliens that can crack any encryption scheme in roughly two minutes; obviously we can't do anything to really stop them.

What attackers are then reasonable?
- the script kiddie that takes over one single mirror
- a large multinational monopolist that tries to sabotage any potential competitors
- a mirror operator that has a bad day and manipulates files for fun
- a really strong hax0r that takes over the Gentoo CVS server
- a social hacker that takes a dev hostage and forces that dev to insert evil bad data

This is by far not a complete list; it should only help with figuring out what can go wrong.

Now let's classify the attackers:
* local attacker ("your roommate") - nothing we can defend against, your responsibility.
* single compromised mirror - this can only be detected with checksums. If the checksums are distributed on a different path than the distfiles, a single compromised mirror has a very low impact as the checksums don't match.
* compromised rsync mirror - now the checksums can be forged. The attacker will have to change the SRC_URI too so that only the compromised distfiles are transferred. Changes in the ebuilds must also be considered - a "rm -rf" in the right place in an ebuild will have a large impact and can't be caught with checksums (since those could be forged by the same attacker). We need signed checksums here.
* compromised developer - this is hard to detect, but once detected all files involved can be checked and corrected. The impact of this is very high, and it is very difficult to avoid. (So we just assume that no dev will go berserk and look for low-impact methods that allow us to clean up if that ever happens.)

Note: a possible defense against rogue devs would be multi-signing, i.e. having all commits checked by at least one other person. This does not help much as there can be collusion between devs and the impact on all devs is very high.
It would effectively deadlock Gentoo and prevent any useful progress.

Defense methods
===============
1) Checksums
A checksum is a one-way function that returns a constant-length identifier. The checksum is designed so that changing one bit in the input totally changes the output (quite simplified, but that's all that matters). Thus any change to a file leads to a bad checksum, and finding a collision (two files with the same checksum) is hard.
Some checksum algorithms have known weaknesses, so relying on a single algorithm is not advised. For example, MD5 suffers from collision attacks, where one can generate two files with equal checksums (finding a matching second file for a given file is not known to be practical).

2) Signatures
Using GPG it is possible to sign a file. The signature is similar to a checksum, but it can only be created with a private key that is kept secret. The public key allows anyone to verify this signature. Deducing the private key from the public key is hard to do. (very simplified)
The public key is provided online, in a keyring (collection of keys) or included in the downloadable media. If the public key is trusted, it can be used to verify that all files have a correct signature, effectively saying that the files are exactly the same as the ones committed by a dev.
Some readers may point out that this doesn't prevent a dev from injecting "bad" files and signing them, but it does prevent tampering by external parties.

3) Manifest / Manifest2
This is an implementation of a checksum / signature scheme. It is described in GLEP 44:
http://www.gentoo.org/proj/en/glep/glep-0044.html
Right now SHA1, SHA256 and RMD160 are the default checksum algorithms.
While Manifest2 should take care of all executable bits in the tree, it does not yet cover eclasses and profiles. As long as this is not taken care of, any attacker can just override an eclass on the rsync mirror or modify the profiles. This has a severe negative impact on signing effectiveness.
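
As a rough illustration of the checksum part (this is not code from portage or any existing tool; the names file_digests, verify and uncovered_files are made up for this sketch), checking a file against several recorded checksums could look roughly like this in Python. The last helper lists files in a tree for which no checksum is recorded at all, which is the eclass / profile gap described above. RMD160 is left out only because Python's hashlib offers it depending on the local OpenSSL build.

```python
import hashlib
import os

# Assumption: SHA1 and SHA256 stand in for the default algorithms named above.
# RMD160 is omitted because hashlib only provides it on some OpenSSL builds.
ALGORITHMS = ("sha1", "sha256")


def file_digests(path, algorithms=ALGORITHMS):
    """Return {algorithm: hex digest} for the file at path."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so large distfiles do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify(path, expected):
    """True only if every recorded digest matches the file on disk."""
    if not expected:
        # No recorded checksum means nothing was verified; treat as failure.
        return False
    actual = file_digests(path, tuple(expected))
    return all(actual[name] == digest for name, digest in expected.items())


def uncovered_files(root, covered):
    """List files under root (as relative paths) with no recorded checksum."""
    missing = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            relative = os.path.relpath(os.path.join(directory, name), root)
            if relative not in covered:
                missing.append(relative)
    return sorted(missing)
```

Requiring all recorded algorithms to match means an attacker would need a simultaneous collision in each of them. Note that none of this helps against a compromised rsync mirror unless the recorded checksums themselves are signed.
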
Any "good" solution should sign all data files in the tree, so I ask for an extension of the Manifest2 protocol to include _every_ data file with no exception.

Key policies
============
To make signing relevant and verifiable, all devs should use the same parameters - key length, key type, validity. Once that is agreed upon, a key distribution strategy is needed so that users can get the key(s) on a verifiable path.

Signing strategies
==================
Once there is an agreement on what files to sign with what kind of keys, there remains the question of how to sign them. There are at least three strategies:

Method "simple":
----------------
Use one central key that is kept on a secure box. Signing is done automatically after a commit. The key distribution is simple since there is only one key that has to be pushed.
Problems are security (single point of failure, single target for compromising).

Method "complex":
-----------------
Let every dev sign the files he adds or modifies. A keyring is maintained on Gentoo infrastructure and is distributed over multiple paths.
Problems: Need support for multi-signing. If one file is added, the manifest should not be signed only by the last editor; only the change should be signed. At the same time it needs to be kept simple and fast, so signing each file on its own or keeping infinite history must be avoided. Keyring management needs to be defined. Key revocation etc. needs to be defined.

Method "hybrid":
----------------
Let every dev sign, add automatic server-side signing with a master key. Gives you bits of both. Normal users can trust the master key. Paranoid users can trust the dev keys.

Earlier Discussions:
http://article.gmane.org/gmane.linux.gentoo.devel/16876 2004.1 discussion
http://www.gentoo.org/proj/en/devrel/manager-meetings/logs/2004/20040531.txt manager meeting

Some selected problems from there:
* Access Control Lists could be used so that only toolchain people can commit to glibc.
Do we want that level of micromanagement? Does it offer any security benefits?
* key revocation may be impractical - what methods for handling retired devs and rogue devs are there?
* how to verify from an install CD?
* in tree or out of band? Storing the keys in the tree is easy, but a potential security problem.

With this I hope to get the discussion started. There are many areas where I am unsure what the best strategy is - every decision has obvious disadvantages in security, code complexity or developer workload. Any solution should try to keep the workload low while offering the highest level of security that does not halt all progress.

I hope that the discussion can stay focused on the implementation aspects. When you suggest something (for example multiple signatures), please explain what it gains us (protection against single rogue devs) and at what price (having to sign everything by at least two persons). That should make it easier to see if the workload impact of that idea is worth it.

Take care,

Patrick
--
Stand still, and let the rest of the universe move