On 14/03/2022 13.22, Fabian Groffen wrote:
Hi,

I've recently been thinking about this too.

On 13-03-2022 18:06:21 -0700, Matt Turner wrote:
The VDB uses a one-file-per-variable format. This has some
inefficiencies, with many file systems. For example the 'EAPI' file
that contains a single character will consume a 4K block on disk.
I recommend json and think it is the best choice because:

[snip]

- json provides the smallest on-disk footprint
- json is part of Python's standard library (so is yaml, and toml will
be in Python 3.11)
- Every programming language has multiple json parsers
-- lots of effort has been spent making them extremely fast.

I would like to suggest to use "tar".

Your idea sounds very appealing and I am by no means an expert to the tar file format but https://www.gnu.org/software/tar/manual/html_node/Standard.html states

"""
…an archive consists of a series of file entries terminated by an end-of-archive entry, which consists of two 512 blocks of zero bytes.
"""

and the Wikipedia entry of 'tar' [1] states

"""
Each file object includes any file data, and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes.
"""

and furthermore

"""
The end of an archive is marked by at least two consecutive zero-filled records.
"""

Which sounds like a lot of overhead if no compression is involved. Not sure if this can be considered a knock out criteria for tar.

- Flow



Reply via email to