On Sun, May 14, 2006 at 05:19:29PM +0100, Scott James Remnant wrote:
>
> The PDF is the first time I've ever sat down and wrote, in one document,
> what I've been thinking about for the last couple of years. While I
> think it's pretty neat, I'm hoping others will be able to find holes or
> problems with it -- or improvements they can make.

This is very exciting. From your introduction in dpkg2.pdf, it sounds like you have come to the same conclusions we have about the first-generation package managers (dpkg and RPM): they served us well, but it's time to move to something new. Or maybe it was too many years of hearing how "APT (sic) is better than RPM."

Erik Troan (who together with Marc Ewing created RPM), Michael K. Johnson, and I set out to redesign a next-generation software manager in February 2004. We took the lessons learned from our experience at Red Hat developing many distributions using RPM, new philosophies from groups like Gentoo on build flexibility, and fresh ideas on building a distributed system.

Many of the concepts you have in the design paper are things we have implemented. This is a good sign: two projects with different backgrounds and approaches seem to be converging on good ideas.

Our work turned into Conary, a distributed software configuration management system. There's some information at http://wiki.conary.com/. Some (somewhat dense) reading material can be found at http://wiki.conary.com/DocumentsAndPresentations and http://wiki.conary.com/ConaryPresentations. The OLS papers are probably the most helpful.

Enough background about Conary; let me share some thoughts from reading dpkg2.pdf.

__ On Source and Binary Formats: Definitely decouple the build mechanism from the package manager itself. You should be able to use files in debian/, .spec files, .ebuild files, etc. to drive the process of taking sources and turning them into binaries. Conary takes this approach as well, though no backends other than our own .recipe format have been implemented.

While splitting source building from package management is important, we think it is just as important to manage sources and binaries in a unified system.
With Conary, you always have the sources available, they're versioned using the same version tree as the binaries, and you can always reproduce any particular binary from the sources since they're managed the same way.

We did not use an existing tar or cpio-like archive for our on-disk format. Conary manages the system by applying changesets to it. Since these changesets are relative to what's already installed on the system, existing archive formats don't make sense. We do use SHA1 to check the integrity of the file contents, and OpenPGP signatures on the metadata.

__ On Atomic Operation: I'm interested to know how you're planning to do one atomic operation on the filesystem to move a package from being staged to being installed. As you stage new file contents to disk, possibly writing them alongside existing files, you have to rename() each one individually. There's no way to do this in one atomic operation (that I know of) without help from the OS.

In Conary, we take any changeset that is applied to the system and reverse it. We store the reversed changeset as a "rollback". Reverting an operation simply applies the rollback changeset.

__ On Focus on Installed Packages: I see no need to keep a record of available packages in the package manager. I do see a need to keep at least a log of removed packages on the system.

__ On Unpacking: The approach you've outlined is very similar to how things work in Conary. We don't currently support registering non-packaged files' metadata.

__ On Filters: This is a neat idea. We don't have anything exactly like it. I think that the usefulness will depend on what metadata is made available to the filter.

__ On Classes: Though the mechanism you've described is somewhat different from what we've implemented, Conary implements this. We call them "tags". Files are tagged to be of a certain class, or type. For example, we have an "initscript" tag.
The system has a "tag handler" that knows what to do with a file that is an "initscript". All the actions that are needed to register the initscript with the system are stored in one place, NOT in every package.

__ On Removing: Sound concepts here. Our backups are stored in the rollback files, fwiw.

__ On Hooks: We talked about having something like this, but we have not found a need. Tag handlers have been sufficient thus far.

__ On Fundamentals: It sounds like the "variant" in your design is the "flavor" in Conary. Every installable object in the Conary system is identified by its name, version, and flavor. The flavor says how the object was built. For example, since a package like lynx can be built with or without support for SSL, you might have one "lynx" version "2.8.5" flavor "with ssl" and one "lynx" version "2.8.5" flavor "without ssl".

I recommend against making the variant optional, since (name, version, variant) is essentially a primary key in the system. If there is no relevant variant information, let a blank variant explicitly express that.

__ On Architecture: I think that relying solely on the dependency mechanism for architecture handling could be a mistake. When tools are trying to filter packages down to ones that are suited for a particular target, you don't want to have to sift through dependencies to determine the fitness.

In Conary we include the architecture information as part of the flavor. So, extending the example above, you have "lynx" version "2.8.5" flavor "is: x86 ssl" (where "is" stands for "instruction set"). We also have (now using a more Conary-specific notation) "lynx=2.8.5[is: x86_64 ssl]". This is critical for narrowing the forest of available packages to ones that will be suited for the target.

Optimizations are represented by instruction set "flags". For example, if you have an mplayer binary that utilizes sse2 instructions, the flavor on the binary is "is: x86(sse2)".
This says that the binary requires sse2 support to operate properly. If mplayer can _optionally_ use sse2 if it detects it, the flavor on the binary is "is: x86(~sse2)". This allows the score for a package that dynamically supports sse2 to be higher on an sse2-capable system.

We also use dependencies to ensure that a package will run correctly on the system. All ELF binaries have an ABI in them. We record the ABI as a dependency for the file. For example, on x86 the dependency is "abi: ELF32(SysV x86)". On x86_64 it's "abi: ELF64(SysV x86_64)". Virtualization technology that provides the capability of running "abi: ELF32(SysV x86)" binaries simply provides "abi: ELF32(SysV x86)".

__ On Dependencies: Please, no more dependency types. It makes calculating solutions for dependency closure extremely hard. I think that the focus should be on getting dependencies right. Additional information (Enhances, Suggests) should be part of metadata, so that frontends with more complex solution algorithms can use them.

In Conary, provides and requires that are architecture-specific are explicitly so. That is, if something provides "libc.so.6" on a 32-bit system, the Provide is:

    "soname: ELF32/libc.so.6(GCC_3.0 GLIBC_2.0 GLIBC_2.1 GLIBC_2.1.1 GLIBC_2.1.2 GLIBC_2.1.3 GLIBC_2.2 GLIBC_2.2.1 GLIBC_2.2.2 GLIBC_2.2.3 GLIBC_2.2.4 GLIBC_2.2.6 GLIBC_2.3 GLIBC_2.3.2 GLIBC_2.3.3 GLIBC_2.3.4 GLIBC_PRIVATE SysV x86)"

(note the support for ABI versioning). If something requires 32-bit libc.so.6, it may be something like:

    "soname: ELF32/libc.so.6(GLIBC_2.0 GLIBC_2.1 GLIBC_2.2 GLIBC_2.3 SysV x86)"

which says that we need the ELF32, SysV x86 ABI, libc.so.6 with ABI versions GLIBC_2.0 GLIBC_2.1 GLIBC_2.2 GLIBC_2.3.

If a dependency is not architecture-specific, a 32-bit package that provides a dependency can solve the requirement in a 64-bit package.

__ On Features: We call this "capabilities".
If you require a thread-safe version of the sqlite library, you might have the threadsafe version of sqlite provide "sqlite(threadsafe)". The package that needs it would then require "sqlite(threadsafe)".

__ On Configuration File Merging: Conary does this by saving the pristine file in the Conary database. Changesets that are changing config files contain a diff to apply. A three-way merge is used to preserve changes.

In fact, all the aspects of a file are preserved when doing an update. For example, if a security-paranoid sysadmin wanted to turn off suid root in /bin/ping, all (s)he needs to do is "chmod u-s /bin/ping". This change is also considered a local modification that is merged in with changes contained in a changeset.

__ On Multi-Arch: You note that your solution requires packages to be modified so that packages do not contain common files. We've solved this problem through policy that runs at package creation time.

One policy we have breaks packages down into "components". For example, the glibc package is made up of the "glibc:runtime", "glibc:lib", "glibc:devel", "glibc:devellib", "glibc:doc", and "glibc:locale" components. These components are created automatically by the policy. When you're running on a 64-bit system and you want to be able to run a 32-bit program, you only need glibc:lib.

Policy makes sure that non-architecture-specific files in paths like /usr/lib (which would conflict if you had them in both a 32-bit and a 64-bit version of glibc:lib) are kept out of the :lib component. When building on a target that does not use /lib, /usr/lib, etc., policy automatically moves files installed in the wrong path to the right one (from /lib to /lib64, for example).

Installing x86 binaries on a system like PPC64 (which already has 32-bit PPC libraries in /lib and 32-bit PPC binaries in /usr/bin) is something that Conary does not handle automatically (yet).

__ Overall, I'm very impressed with what you've come up with. However, I'd like to see a few more things tackled.
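
Before getting to those, here is a rough sketch of the merge rule from the configuration file section, since it is easier to show than to describe. This is not Conary's actual code or data model: FileState, MergeConflict, merge_attr, and three_way_merge are names made up for illustration, and the sketch merges whole attribute values (mode, owner, content digest) rather than applying a line-by-line diff to file contents. It only assumes what is said above: the pristine state is remembered, and anything the admin changed locally counts as a modification to preserve.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FileState:
    """Hypothetical, simplified record of one file's tracked attributes."""
    mode: int    # permission bits, e.g. 0o4755 for setuid root
    owner: str
    sha1: str    # digest of the file contents


class MergeConflict(Exception):
    """Raised when local and incoming changes to one attribute disagree."""


def merge_attr(name, pristine, local, incoming):
    """Three-way merge of a single attribute value."""
    if local == pristine:
        return incoming  # no local change: take what the changeset says
    if incoming == pristine or incoming == local:
        return local     # changeset left it alone (or agrees): keep local
    raise MergeConflict(f"{name}: local={local!r} incoming={incoming!r}")


def three_way_merge(pristine, local, incoming):
    """Merge every attribute of a file independently of the others."""
    merged = {
        f.name: merge_attr(
            f.name,
            getattr(pristine, f.name),
            getattr(local, f.name),
            getattr(incoming, f.name),
        )
        for f in fields(FileState)
    }
    return FileState(**merged)


# /bin/ping as originally shipped, as found on disk after "chmod u-s",
# and as shipped by a newer changeset (new contents, still setuid).
pristine = FileState(mode=0o4755, owner="root", sha1="aaa")
local = FileState(mode=0o0755, owner="root", sha1="aaa")
incoming = FileState(mode=0o4755, owner="root", sha1="bbb")

result = three_way_merge(pristine, local, incoming)
print(oct(result.mode), result.sha1)  # prints "0o755 bbb"
```

The updated contents arrive, and the admin's decision to drop the setuid bit survives the update. If the changeset and the admin had both changed the mode to different values, the sketch raises MergeConflict instead of silently picking one.
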
One problem that plagued RPM was dealing with distribution upgrades. As we removed one package in favor of another (xinetd replacing inetd, for example), we relied on Obsoletes to do the right thing. But sometimes there wasn't anything that was replacing a particular package. It just was no longer needed in the repository. This is one of many problems that factored into including group support in Conary. Updating from one major release of a Conary-managed distribution to another is a simple matter of updating the group that defines what's in version 1 of the distribution to the group that defines what's in version 2. We've even successfully migrated a running system from one Conary-managed distribution (rPath Linux) to another (Foresight Linux) with only minor issues (things I think we've since fixed).

Second, I'm very interested in utilizing the package manager to help address Debian derivatives. If all the packages (both binaries and sources) in Debian were managed in a repository system, then Ubuntu could very simply add any additional patches they want on a distributed branch of Debian in their own repository. Re-basing Ubuntu on a new version of Debian would be a branch merge operation in the Ubuntu repository.

This is working extremely well in Conary. We maintain rPath Linux in our conary.rpath.com repository. Foresight Linux (which concentrates on bleeding edge GNOME and desktop technology integration) automatically inherits all the work we do on packages that they don't modify. GNOME packages are on a branch in their own repository.

There's an opportunity to radically change the way that distributions are put together, much in the same way that BitKeeper and Git revolutionized kernel development. As soon as people were able to create a remote distributed repository for doing kernel work and easily merge the changes from one repository to another, kernel development accelerated enormously.
It's time to apply the same methods to the entire distribution so that people can work on building complete, integrated systems and share the common work between projects.

If anyone has questions about the technical details of Conary, please feel free to email me. I'm looking forward to what comes out of the dpkg 2.0 discussion.

Cheers,

Matt
--
Matt Wilson
Founding Engineer
rPath, Inc.
[EMAIL PROTECTED]

