On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> Problems with the current binary package format
> -----------------------------------------------
>
> The following problems were identified with the package format currently
> in use:
>
> 1. **The packages rely on custom binary archive format to store
>    metadata.**  It is entirely Gentoo invented, and requires dedicated
>    tooling to work with it.  In fact, the reference implementation
>    in Portage does not even include a CLI tool to work with tbz2
>    packages; an unofficial implementation is provided as part
>    of portage-utils toolkit [#PORTAGE-UTILS]_.

I think you should rewrite this section around the argument that the
metadata is hard to edit, and that there is only one tool to do so
(except a Python interface from Portage?).
On a separate note, I don't think portage-utils can be considered
"unofficial"; it is an official Gentoo project as far as I am aware.

> 2. **The format relies on obscure compressor feature of ignoring
>    trailing garbage**.  While this behavior is traditionally implemented
>    by many compressors, the original reasons for it have become long
>    irrelevant and it is not surprising that new compressors do not
>    support it.  In particular, Portage already hit this problem twice:
>    once when users replaced bzip2 with parallel-capable pbzip2
>    implementation [#PBZIP2]_, and the second time when support for zstd
>    compressor was added [#ZSTD]_.

I think this is actually the result of a rather opportunistic
implementation.  The fault is that we chose to use an extension that
suggests the file is a regular compressed tarball.
When one detects that a file is xpak padded, it is trivial to feed the
decompressor just the relevant part of the datastream.  The format
itself isn't bad, and doesn't rely on obscure behaviour.

> 3. **Placing metadata at the end of file makes partial fetches
>    complex.**  While it is technically possible to obtain package
>    metadata remotely without fetching the whole package, it usually
>    requires e.g. 2-3 HTTP requests with rather complex driver.  For
>    comparison, if metadata was placed at the beginning of the file,
>    early-terminated pipeline with a single fetch request would suffice.

I think this point needs some quantification of why it is so important.
I may be wrong, but the average binpkg is small, <1MiB, and bigger
packages are <50MiB.  So what is there to gain here?  A "few" MiBs for
what operation exactly?  I say "few" because I know that for some users
this is actually not just a blip before it's downloaded.
So if this is possible to achieve, in what scenarios is this going to be
used (and is this often)?

> 4. **Extending the format with OpenPGP signatures is non-trivial.**
>    Depending on the implementation details, it either requires fetching
>    additional detached signature, breaking backwards compatibility or
>    introducing more custom logic to reassemble OpenPGP packets.

I think one could add an extra key to the xpak that holds a gpg sig or
something.  Perhaps this point is better phrased as: current binpkgs
don't have any validation options defined.

> 5. **Metadata is not compressed.**  This is not a significant problem,
>    it is just listed for completeness.
>
>
> Goals for a new container format
> --------------------------------
>
> The following goals have been set for a replacement format:
>
> 1. **The packages must remain contained in a single file.**  As a matter
>    of user convenience, it should be possible to transfer binary
>    packages without having to use multiple files, and to install them
>    from any location.
>
> 2. **The file format must be entirely based on common file formats,
>    respecting best practices, with as little customization as necessary
>    to satisfy the requirements.**  In particular, it is unacceptable
>    to create new binary formats.

I take this as your personal opinion.  I don't quite get why it is
unacceptable to create a new binary format though.  In particular when
you're looking for efficiency, such a format could serve your purposes.
As long as it's clearly defined, I don't see the problem with a binary
format either.  Could you add why you think binary formats are
unacceptable here?

> 3. **The file format should provide for partial fetching of binary
>    packages.**  It should be possible to easily fetch and read
>    the package metadata without having to download the whole package.

Like above, what is the use-case here?  Why would you want this?  I
think I'm missing something here.

> 4. **The file format must provide support for OpenPGP signatures.**
>    Preferably, it should use standard OpenPGP message formats.
>
> 5. **The file format must allow for efficient metadata updates.**
>    In particular, it should be possible to update the metadata without
>    having to recompress package files.
>
> 6. **The file format should account for easy recognition both through
>    filename and through contents.**  Preferably, it should have distinct
>    features making it possible to detect it via file(1).
>
> 7. **The file format should allow for metadata compression.**
>
> 8. **The file format should make future extensions easily possible
>    without breaking backwards compatibility.**

-- 
Fabian Groffen
Gentoo on a different level
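
As an illustration of the remark on problem 2 (feeding the decompressor
only the relevant part of the datastream), here is a minimal sketch.  It
assumes the tbz2 layout as described in the xpak(5) man page: compressed
tarball, then the xpak segment ("XPAKPACK" ... "XPAKSTOP"), then a 4-byte
big-endian xpak length, then "STOP".  The function name split_tbz2 is
hypothetical and this is not how Portage itself is implemented.

```python
import bz2
import struct


def split_tbz2(data: bytes):
    """Split a tbz2 blob into (compressed tarball, xpak segment).

    Assumed layout (from xpak(5)):
        <tarball> <xpak> <4-byte big-endian xpak length> "STOP"
    """
    # The last 8 bytes hold the xpak length and the "STOP" marker.
    if len(data) < 8 or data[-4:] != b"STOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", data[-8:-4])

    start = len(data) - 8 - xpak_len
    if start < 0:
        raise ValueError("xpak length exceeds file size")

    xpak = data[start:len(data) - 8]
    if not (xpak.startswith(b"XPAKPACK") and xpak.endswith(b"XPAKSTOP")):
        raise ValueError("malformed xpak segment")

    # Everything before the xpak segment is the plain compressed stream,
    # so the decompressor never sees any trailing data.
    return data[:start], xpak


if __name__ == "__main__":
    # Synthetic example: a bzip2 stream followed by an empty xpak segment.
    payload = bz2.compress(b"pretend this is a tarball")
    xpak = b"XPAKPACK" + struct.pack(">II", 0, 0) + b"XPAKSTOP"
    blob = payload + xpak + struct.pack(">I", len(xpak)) + b"STOP"

    tarball, meta = split_tbz2(blob)
    print(bz2.decompress(tarball))
```

With the split done up front, any compressor can be used, whether or not
it tolerates trailing garbage.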