On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> Problems with the current binary package format
> -----------------------------------------------
> 
> The following problems were identified with the package format currently
> in use:
> 
> 1. **The packages rely on a custom binary archive format to store
>    metadata.**  It is entirely Gentoo-invented, and requires dedicated
>    tooling to work with it.  In fact, the reference implementation
>    in Portage does not even include a CLI tool to work with tbz2
>    packages; an unofficial implementation is provided as part
>    of the portage-utils toolkit [#PORTAGE-UTILS]_.

I think you should rewrite this section around the argument that the
metadata is hard to edit, and that there is only one tool to do so
(apart from the Python interface in Portage?).
On a separate note, I don't think portage-utils can be considered
"unofficial"; it is an official Gentoo project as far as I am aware.

> 2. **The format relies on an obscure compressor feature of ignoring
>    trailing garbage**.  While this behavior is traditionally implemented
>    by many compressors, the original reasons for it have long become
>    irrelevant and it is not surprising that new compressors do not
>    support it.  In particular, Portage already hit this problem twice:
>    once when users replaced bzip2 with the parallel-capable pbzip2
>    implementation [#PBZIP2]_, and the second time when support for the
>    zstd compressor was added [#ZSTD]_.

I think this is actually the result of a rather opportunistic
implementation.  The fault is that we chose to use an extension that
suggests the file is a regular compressed tarball.
When one detects that a file is xpak-padded, it is trivial to feed the
decompressor just the relevant part of the data stream, since the xpak
trailer records its own size.  The format itself isn't bad, and doesn't
rely on obscure behaviour.
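
To illustrate, a minimal Python sketch of that idea, assuming the layout
written by Portage's xpak module (compressed tarball, then an xpak
segment delimited by XPAKPACK/XPAKSTOP, then a 4-byte big-endian length
of that segment and the literal STOP).  The function names are
hypothetical:

```python
import bz2
import struct


def split_tbz2(blob: bytes) -> tuple[bytes, bytes]:
    """Return (compressed tarball, xpak segment) from a tbz2 file's bytes.

    Assumed layout: <compressed tar> b"XPAKPACK" ... b"XPAKSTOP" <u32 len> b"STOP",
    where len (big-endian) covers the XPAKPACK ... XPAKSTOP segment.
    """
    # The last 16 bytes are b"XPAKSTOP", the segment length, and b"STOP".
    if len(blob) < 16 or blob[-16:-8] != b"XPAKSTOP" or blob[-4:] != b"STOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    start = len(blob) - 8 - xpak_len
    if start < 0 or blob[start:start + 8] != b"XPAKPACK":
        raise ValueError("corrupt or truncated xpak segment")
    return blob[:start], blob[start:-8]


def decompress_payload(blob: bytes) -> bytes:
    """Decompress only the tarball part, so no trailing data reaches bz2."""
    compressed, _xpak = split_tbz2(blob)
    # bz2.decompress also accepts multi-stream input such as pbzip2 output.
    return bz2.decompress(compressed)
```

The same split works regardless of the compressor, because the
decompressor never sees the xpak bytes.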

> 3. **Placing metadata at the end of the file makes partial fetches
>    complex.**  While it is technically possible to obtain package
>    metadata remotely without fetching the whole package, it usually
>    requires e.g. 2-3 HTTP requests with a rather complex driver.  For
>    comparison, if metadata were placed at the beginning of the file,
>    an early-terminated pipeline with a single fetch request would suffice.

I think this point needs some quantification of why it is so
important.
I may be wrong, but the average binpkg is small, <1MiB, and bigger
packages are <50MiB.
So what is there to be saved here?  A "few" MiB for what operation
exactly?  I say "few" because I know that for some users this is not
just a blip before it's downloaded.  So if this is possible to achieve,
in what scenarios would it be used (and how often)?
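
To help quantify this, a rough Python sketch of the two range requests
the current layout needs, assuming the server honours HTTP suffix byte
ranges ("bytes=-N") and the tbz2 trailer is b"XPAKSTOP", a 4-byte
big-endian segment length, then b"STOP".  The function names are
hypothetical:

```python
import struct

# b"XPAKSTOP" (8 bytes) + big-endian u32 segment length (4) + b"STOP" (4)
TRAILER_LEN = 16


def first_range() -> str:
    """Range header value for request 1: just the trailer (a suffix range)."""
    return f"bytes=-{TRAILER_LEN}"


def second_range(trailer: bytes) -> str:
    """Range header value for request 2, derived from the fetched trailer."""
    if len(trailer) != TRAILER_LEN or trailer[:8] != b"XPAKSTOP" or trailer[-4:] != b"STOP":
        raise ValueError("not an xpak trailer")
    (xpak_len,) = struct.unpack(">I", trailer[8:12])
    # The XPAKPACK ... XPAKSTOP segment plus the 8-byte length/STOP tail.
    return f"bytes=-{xpak_len + 8}"
```

The bytes transferred are just the 16-byte trailer and the metadata
segment itself, so the difference from a header-first layout is mostly
one extra round trip rather than bandwidth.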

> 4. **Extending the format with OpenPGP signatures is non-trivial.**
>    Depending on the implementation details, it either requires fetching
>    an additional detached signature, breaking backwards compatibility,
>    or introducing more custom logic to reassemble OpenPGP packets.

I think one could add an extra key to the xpak that holds a GPG
signature or similar.  Perhaps this point is better phrased as: current
binpkgs don't have any validation options defined.
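
A minimal Python sketch of that suggestion, assuming the xpak index
layout used by Portage's xpak module (per key: a big-endian u32 name
length, the name, then u32 offset and length into the data area).  The
SIGNATURE key and the helper names are hypothetical, and which bytes
such a signature would cover is exactly what is not defined today:

```python
import struct


def encode_xpak(meta: dict[str, bytes]) -> bytes:
    """Build an xpak segment: header, index, data, terminator."""
    index = b""
    data = b""
    for name, value in meta.items():
        raw = name.encode("utf-8")
        # Index entry: name length, name, offset into data, value length.
        index += struct.pack(">I", len(raw)) + raw
        index += struct.pack(">II", len(data), len(value))
        data += value
    header = b"XPAKPACK" + struct.pack(">II", len(index), len(data))
    return header + index + data + b"XPAKSTOP"


def decode_xpak(segment: bytes) -> dict[str, bytes]:
    """Inverse of encode_xpak."""
    if segment[:8] != b"XPAKPACK" or segment[-8:] != b"XPAKSTOP":
        raise ValueError("not an xpak segment")
    index_len, data_len = struct.unpack(">II", segment[8:16])
    index = segment[16:16 + index_len]
    data = segment[16 + index_len:16 + index_len + data_len]
    meta, pos = {}, 0
    while pos < len(index):
        (name_len,) = struct.unpack(">I", index[pos:pos + 4])
        name = index[pos + 4:pos + 4 + name_len].decode("utf-8")
        pos += 4 + name_len
        offset, length = struct.unpack(">II", index[pos:pos + 8])
        pos += 8
        meta[name] = data[offset:offset + length]
    return meta


# Hypothetical: an extra key carrying a detached OpenPGP signature.
meta = {"CATEGORY": b"app-misc\n", "PF": b"foo-1.0\n"}
meta["SIGNATURE"] = b"<detached signature bytes>"
assert decode_xpak(encode_xpak(meta)) == meta
```

Since entries are looked up by name, adding a key does not change the
segment layout itself.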

> 5. **Metadata is not compressed.**  This is not a significant problem,
>    it is just listed for completeness.
> 
> 
> Goals for a new container format
> --------------------------------
> 
> The following goals have been set for a replacement format:
> 
> 1. **The packages must remain contained in a single file.**  As a matter
>    of user convenience, it should be possible to transfer binary
>    packages without having to use multiple files, and to install them
>    from any location.
> 
> 2. **The file format must be entirely based on common file formats,
>    respecting best practices, with as little customization as necessary
>    to satisfy the requirements.**  In particular, it is unacceptable
>    to create new binary formats.

I take this as your personal opinion.  I don't quite see why it is
unacceptable to create a new binary format, though.  In particular, when
you're looking for efficiency, such a format could serve your purposes.
As long as it's clearly defined, I don't see a problem with a binary
format either.
Could you explain why you think binary formats are unacceptable here?

> 3. **The file format should provide for partial fetching of binary
>    packages.**  It should be possible to easily fetch and read
>    the package metadata without having to download the whole package.

Like above, what is the use case here?  Why would you want this?  I
think I'm missing something here.

> 4. **The file format must provide support for OpenPGP signatures.**
>    Preferably, it should use standard OpenPGP message formats.
> 
> 5. **The file format must allow for efficient metadata updates.**
>    In particular, it should be possible to update the metadata without
>    having to recompress package files.
> 
> 6. **The file format should account for easy recognition both through
>    filename and through contents.**  Preferably, it should have distinct
>    features making it possible to detect it via file(1).
> 
> 7. **The file format should allow for metadata compression.**
> 
> 8. **The file format should make future extensions easily possible
>    without breaking backwards compatibility.**

-- 
Fabian Groffen
Gentoo on a different level
