Hi, the following is an email written by Wookey and myself.
0. Introduction
===============

The Debian bootstrap build ordering tool Google Summer of Code project
[1] was continued after the summer ended and recently reached a new
milestone: it can now create a final build order from a dependency
graph [2] for Debian Sid.

By now, all important tools and algorithms have been written [5] to
solve the following problems:

 - find source packages to which build profiles (reduced build
   dependencies) should be added
 - given enough source packages annotated with build profiles, generate
   a final build order which produces a full Debian archive from zero,
   starting from cross compiling a minimal system and natively
   compiling the rest, breaking build dependencies as necessary (and as
   possible)

Since Debian source packages do not (yet) contain enough meta data to
decide whether or not a build dependency can be dropped, USE flags of
Gentoo source packages were harvested [3] [6]. On top of that,
suggestions from Thorsten Glaser, Patrick McDermott and Daniel Schepler
were used. This way, our current results are hopefully not too far away
from reality.

While the theoretical results do look consistent, a full bootstrap has
so far not been completed in practice due to the following open issues:

 1. missing multiarch annotations prevent the multiarch cross build
    dependencies of some source packages from being resolved correctly
 2. not all source packages of the minimal build system are cross
    compilable in practice yet
 3. no decision has been made on the syntax of the new control fields
    (build profiles) which are required for automated bootstrapping
 4. not enough source packages implement build profiles (this depends
    on 3 being solved)

More details on this scheme are given at the DebianBootstrap wiki page
[8]. Work has been going on for a couple of years on this, evolving as
practical experience was gained and input was taken from more people.
We therefore make the following proposals (field names not set in
stone) in descending order of importance for us:

1. Build-Profiles
=================

The build profile format was proposed by Guillem Jover together with
other solutions he presented in this document [7] as part of
bug#661538. Build profiles extend the Build-Depends format with a
syntax similar to architecture restrictions but using < and > instead.

    Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !stage1>, tiny

The build dependency "huge" would not be required by the source package
if it is built in the "embedded" or "stage1" profile.

This mechanism neatly allows for removed build-deps, replaced
build-deps and added build-deps, and an arbitrary number of possible
'profiles'. Besides bootstrapping, these build profiles could also be
used for embedded builds, and to allow for changed build-deps when
cross-building. One could also imagine that DEB_BUILD_OPTIONS=nodocs
could be replaced by a profile called "nodocs".

Patches for dpkg (bug#661538) and dose3 implementing this syntax
already exist.

This scheme supersedes an earlier version (referred to as 'staged'
builds), which used repeated Build-depends-StageN: lines. See the dpkg
bug#661538 for the evolution of this.

The profile labels are arbitrary but agreement on label usage is
necessary. For bootstrap automation we have been using 'stage1',
'stage2', etc., which fits with existing custom in packages which
already have such internal mechanisms using DEB_STAGE (currently gcc,
eglibc, libselinux, gcj, gnat, gdc, linux [9]). These seem like
sensible names so we propose to stick with them. Other useful profiles
can be defined in the future.

The drawback of this syntax is that Build-Dep parsing tools need to be
updated to read/accept it, so uploads of source containing these
annotations cannot be done until the dpkg in buildds at least parses
it.

2. Build-Profiles (extension 1)
===============================

When a source package is built with fewer build dependencies (cross,
embedded, stage1, nodocs...), it often happens that it does not build
one or more of its binary packages at all (e.g. foo-gtk, foo-java,
foo-doc). While this is a minor nuisance during a half automated
bootstrap, a fully automated bootstrapping process needs to know which
binary packages a source package does not build if it is compiled in
one of its profiles.

We therefore propose a new field for binary packages in their control
file which indicates for which profiles the package builds.

    Builds-With-Profile: !stage1 !embedded

Different profile names are separated by spaces, similar to the
Architecture field. A binary package with the above field would not be
built during the profile builds "stage1" or "embedded". Binary packages
which do not have this field would default to being built by every
profile.

This field would mean a minor change to dpkg-gencontrol.

3. Build-profiles (extension 2)
===============================

A build profile is set either using a DEB_ environment variable or a
command-line option. DEB_STAGE has been used historically in a few
packages with staged build support, but that is specific to the
staged-builds purpose. For the more generic build profiles,
DEB_BUILD_PROFILE=<label> is proposed instead (only 7 existing packages
would need to be changed; patches exist for some already).

Setting the build profile causes dpkg-checkbuilddeps to use the
modified deps, and dpkg-gencontrol to mark the built package with a new
field:

    Built-With-Profile: stage1 cross

This new field is optional and just meant to mark binary packages such
that they cannot accidentally make it to the archive. Another idea is
to encode this information in the package version by adding a ~stage1
suffix. Using the field is more powerful, as source packages can also
be built with multiple profiles activated at once and the field can
store a list of profile names.
In the Built-With-Profile example above, the binary package was built
with the cross profile activated for cross compilation and the stage1
profile activated to break a build dependency cycle.

While this field is meant to make sure that no profile built binary
package is uploaded to the archive, it can also be (ab)used to allow
only "some" build profiles to be uploaded. For example, Ubuntu might
generally forbid profile built binary packages from being uploaded,
except for packages built with the "ubuntu" profile only. Or Emdebian
might allow binary packages using the "embedded" profile. This would
allow unified binary packages which are able to build for different
targets. As only one unified binary package can satisfy the needs of
different purposes, this can improve the quality of the package because
only one codebase has to be maintained. We already have this (using
dpkg-vendor) where changes only affect the rules file, but as soon as
build-dependency changes are needed that mechanism is insufficient.

This usage of build profiles is not part of this proposal but one of
the possibilities they offer besides allowing automated bootstrapping.

4. Unified field for extensions 1 & 2
=====================================

The Architecture field contains different information depending on its
context [10]. The syntax of profiles behaves similarly to that of
architecture specifiers. An alternative to the field names of the last
two items would therefore be a unified "Profile" field whose meaning
depends on its context:

    Profile: !stage1 !embedded
    Profile: stage1 cross

The first one would appear in the binary package stanzas of
debian/control and indicate which binary packages do or do not build
with a specific profile. The second one would appear in DEBIAN/control
(the built binary package) and indicate with which profiles the binary
package was built.
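As a rough illustration of how a tool might evaluate the profile syntax
from sections 1 to 4, here is a minimal Python sketch. The function
names restriction_matches and required_build_deps are hypothetical, and
the matching rule (a negated list applies unless one of its profiles is
active, a plain list applies only if one of them is active) is an
assumption made by analogy with architecture restrictions; it is not
taken from the dpkg or dose3 patches. Architecture restrictions and
alternatives (|) are deliberately ignored.

```python
import re

# Hypothetical helpers for illustration only; not the dpkg/dose3 code.
_VERSION = re.compile(r"\([^()]*\)")   # e.g. "(>= 1.0)"
_PROFILES = re.compile(r"<([^<>]*)>")  # e.g. "<!embedded !stage1>"


def restriction_matches(terms, active_profiles):
    """Decide whether a profile list such as ['!stage1', '!embedded']
    selects a build with the given set of active profiles.

    Assumed semantics, modelled on architecture restrictions:
    an empty list always matches, an all-negated list matches unless
    one of its profiles is active, and a plain list matches only if
    at least one of its profiles is active.
    """
    if not terms:
        return True
    negated = [term.startswith("!") for term in terms]
    if all(negated):
        return all(term[1:] not in active_profiles for term in terms)
    if any(negated):
        raise ValueError("mixed negated and plain profiles not handled")
    return any(term in active_profiles for term in terms)


def required_build_deps(build_depends, active_profiles):
    """Return the names of the build dependencies that still apply."""
    names = []
    for entry in build_depends.split(","):
        # Drop version constraints first so their < and > are not
        # mistaken for a profile restriction.
        entry = _VERSION.sub("", entry).strip()
        if not entry:
            continue
        match = _PROFILES.search(entry)
        terms = match.group(1).split() if match else []
        if restriction_matches(terms, active_profiles):
            names.append(entry.split()[0])
    return names


example = "huge (>= 1.0) [i386 arm] <!embedded !stage1>, tiny"
print(required_build_deps(example, set()))
print(required_build_deps(example, {"stage1", "cross"}))
```

With no profile active, both "huge" and "tiny" are reported; with
stage1 and cross active, only "tiny" remains. The same
restriction_matches check could be applied to the profile list of a
binary package to decide whether that package is built in a given
profile build.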
The unified "Profile" field is our favoured option, as
Builds-With-Profile and Built-With-Profile will only be confused with
each other anyway, and if a context-dependent meaning works for
'Architecture' there seems to be no reason why it would not be sensible
for 'Profile'.

5. Cross-Builds field
=====================

For even further automation and also for quality assurance, we propose
another new field for source packages which indicates whether or not
this source package is supposed to be cross compilable.

If Debian wants to incorporate the ability to be bootstrapped in its
policy, then there *must* be some packages which are cross compiled for
a minimal build system. Adding this header to those source packages
would make it a bug if they do not actually cross compile. Without this
header, cross compilation would be wishlist as usual.

Furthermore, cross compilation is one of the methods a porter can use
to break build dependency cycles. If some packages carry this new field
then not only could a porter decide more quickly whether or not a
source package is cross compilable, an algorithm could also incorporate
this information to automatically break build dependency cycles for
cross compilation.

Some naming ideas:

    Cross-Builds: Yes
    Does-Cross-Build: Yes
    Allows-Cross-Build: Yes

6. Conclusion
=============

If more automated bootstrapping of Debian is desired, then at least
build profiles (1.) should be introduced. For a fully automated
bootstrapping of Debian, the second item (extension 1) is needed as
well. The third item (extension 2) prevents accidental upload of binary
packages that have not been built fully. The last item (5.) adds
further convenience to the process but is not strictly needed.
We will now argue how Debian would benefit from allowing a fully
automated bootstrapping process:

 - obvious: it's the simplest possible way to bootstrap Debian for new
   architectures
 - no need for other distributions in the bootstrapping process (make
   Debian genuinely "universal")
 - better update lagging architectures
 - build packages for architectures that cannot build themselves
 - allow easy sub-arch builds, optimized for a specific CPU
 - continuously check the archive for bootstrappability as a QA measure

This mechanism also covers cross-compiler bootstrapping. The eglibc,
gcc, and kernel packages already have the necessary staged-build info,
but the build profiles (1.) part is also needed to specify the reduced
build-deps. The cross-toolchain bootstrap ceases to be a special case
if treated this way and just becomes packages to be built in stages
using the profiles mechanism, like many others in the base system (but
built for the build arch targeting the host arch, rather than built for
the host arch). See the wiki article at [11].

The question remains of how many source packages would have to have
the proposed new fields added to them to make a full bootstrap from
zero possible. If the Gentoo USE flags were not too far off, and
assuming our tools do the correct thing so far, then:

 - the number of source packages that have to be modified with the new
   fields is at most 83 (there might be a smaller set).

Another argument why a fully automated bootstrap of Debian might be
necessary is the growing problem size over the years [4]. If this trend
continues it will only become harder to implement the necessary meta
data in the future. If enough meta data is introduced now to make a
fully automated bootstrap possible, then any subsequent work will only
have to be incremental to that.

The main questions to this list are:

 - should Debian be bootstrappable in a fully automated fashion?
   We created the algorithms that can allow this to happen; we just
   need more meta data and a way to encode it.
 - do the proposals for the needed fields sound convincing? Can they be
   improved? Do they have fundamental flaws?

cheers, josch and Wookey

Thanks to Thorsten Glaser and Patrick McDermott for feedback, and to
numerous others along the way for help developing this scheme.

[1] http://wiki.debian.org/SummerOfCode2012/StudentApplications/JohannesSchauer
[2] http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-November/000425.html
[3] http://blog.mister-muffin.de/2012/10/10/using-gentoo-to-find-reduced-build-dependencies-for-debian-source-packages/
[4] http://blog.mister-muffin.de/2012/10/13/does-it-become-harder-to-bootstrap-debian-/
[5] https://gitorious.org/debian-bootstrap/bootstrap
[6] https://gitorious.org/debian-bootstrap/gen2deb
[7] http://www.hadrons.org/~guillem/debian/docs/embedded.proposal
[8] http://wiki.debian.org/DebianBootstrap
[9] http://codesearch.debian.net/search?q=DEB_STAGE
[10] http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Architecture
[11] http://wiki.debian.org/MultiarchCrossToolchains

Archive: http://lists.debian.org/20130115181840.GA16417@hoothoot