Re: [gentoo-portage-dev] Changing the VDB format

Florian Schmaus Mon, 14 Mar 2022 08:35:20 -0700

On 14/03/2022 13.22, Fabian Groffen wrote:

Hi,


I've recently been thinking about this too.

On 13-03-2022 18:06:21 -0700, Matt Turner wrote:

The VDB uses a one-file-per-variable format. This has some
inefficiencies with many file systems. For example, the 'EAPI' file,
which contains a single character, will consume a 4K block on disk.
I recommend json and think it is the best choice because:


[snip]

- json provides the smallest on-disk footprint
- json is part of Python's standard library (so is yaml, and toml will
be in Python 3.11)
- Every programming language has multiple json parsers
-- lots of effort has been spent making them extremely fast.
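The 4K-block point can be put into numbers with the simplified Python sketch below. It assumes a 4096-byte allocation unit and ignores inodes, directory entries, and file systems that store tiny files inline. The package entry and its values are hypothetical and purely illustrative, not taken from a real VDB.

```python
import json
import math

FS_BLOCK = 4096  # assumed allocation unit, as in the 4K example above


def allocated(size: int, block: int = FS_BLOCK) -> int:
    """Bytes a file of `size` bytes occupies when allocated in whole blocks."""
    return math.ceil(size / block) * block


# Hypothetical VDB entry; the values are illustrative only.
entry = {"EAPI": "8", "SLOT": "0", "USE": "amd64 ssl"}

# One file per variable: every non-empty value gets at least one block.
per_file = sum(allocated(len(v.encode())) for v in entry.values())

# All variables serialized into a single JSON file.
single_json = allocated(len(json.dumps(entry).encode()))

print(per_file, single_json)  # 12288 4096
```

Under these assumptions three one-character-ish values cost three blocks, while the single JSON file fits in one.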


I would like to suggest using "tar".

Your idea sounds very appealing, and I am by no means an expert on the tar file format, but https://www.gnu.org/software/tar/manual/html_node/Standard.html states

"""

…an archive consists of a series of file entries terminated by an end-of-archive entry, which consists of two 512 blocks of zero bytes.

"""

and the Wikipedia entry of 'tar' [1] states

"""

Each file object includes any file data, and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes.

"""

and furthermore

"""

The end of an archive is marked by at least two consecutive zero-filled records.

"""

That sounds like a lot of overhead if no compression is involved. Not sure whether this can be considered a knockout criterion for tar.
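The overhead described in the quoted tar documentation can be estimated with a short Python sketch. `tar_size_estimate` applies the quoted rules directly (one 512-byte header per file, data rounded up to 512 bytes, two zero-filled blocks at the end), and `tar_size_actual` writes the same hypothetical entry with the standard-library `tarfile` module (USTAR format, no compression) for comparison. The entry and its values are illustrative assumptions. Note that `tarfile` also pads the finished archive to a multiple of `tarfile.RECORDSIZE` (20 blocks, 10240 bytes) by default, so the real file is larger than the block arithmetic alone suggests.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are aligned to 512-byte blocks


def tar_size_estimate(entries: dict[str, bytes]) -> int:
    """Uncompressed size predicted by the rules quoted above."""
    total = 0
    for data in entries.values():
        total += BLOCK  # one 512-byte header record per file
        total += -(-len(data) // BLOCK) * BLOCK  # data rounded up to 512
    return total + 2 * BLOCK  # end-of-archive: two zero-filled blocks


def tar_size_actual(entries: dict[str, bytes]) -> int:
    """Size of an uncompressed archive written by Python's tarfile module."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# Hypothetical VDB entry; the values are illustrative only.
entry = {"EAPI": b"8", "SLOT": b"0", "USE": b"amd64 ssl"}
print(tar_size_estimate(entry))  # 4096
print(tar_size_actual(entry))    # 10240
```

For these three tiny values the block arithmetic alone gives 4096 bytes, of which only 11 bytes are actual data.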


- Flow
