On 14/03/2022 13.22, Fabian Groffen wrote:
Hi,
I've recently been thinking about this too.
On 13-03-2022 18:06:21 -0700, Matt Turner wrote:
The VDB uses a one-file-per-variable format. This has some
inefficiencies, with many file systems. For example the 'EAPI' file
that contains a single character will consume a 4K block on disk.
I recommend json and think it is the best choice because:
[snip]
- json provides the smallest on-disk footprint
- json is part of Python's standard library (so is yaml, and toml will
be in Python 3.11)
- Every programming language has multiple json parsers
-- lots of effort has been spent making them extremely fast.
I would like to suggest to use "tar".
Your idea sounds very appealing and I am by no means an expert to the
tar file format but
https://www.gnu.org/software/tar/manual/html_node/Standard.html states
"""
…an archive consists of a series of file entries terminated by an
end-of-archive entry, which consists of two 512 blocks of zero bytes.
"""
and the Wikipedia entry of 'tar' [1] states
"""
Each file object includes any file data, and is preceded by a 512-byte
header record. The file data is written unaltered except that its length
is rounded up to a multiple of 512 bytes.
"""
and furthermore
"""
The end of an archive is marked by at least two consecutive zero-filled
records.
"""
Which sounds like a lot of overhead if no compression is involved. Not
sure if this can be considered a knock out criteria for tar.
- Flow