Hey Reuti,
Thanks for your input. I was afraid I'd need to use a workaround like this.
I like the idea of explicitly packing relative symlinks and using them to
determine the real target.
What bugs me is having to extract a file just to determine whether it is
the real deal or only a symlink I need to follow first.
If I understand tar correctly, each member's header already records whether
the entry is a regular file or a symlink.
I think I'll extend your recommendation with a variant of "tar tvf
archive.tar targetfile".
I still have to find out whether that output is intended to be
machine-readable or if there's a cleaner approach to probing the files.
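A rough sketch of what I have in mind, assuming GNU tar, where the first
character of each verbose listing line is the type flag (`-` regular file,
`l` symlink, `h` hard link, `d` directory). The function name `member_type`
is made up, and I can't promise the listing format is a stable interface:

```shell
# member_type ARCHIVE MEMBER
# Print the kind of MEMBER without extracting it, based on the first
# character of the verbose listing (an assumption about GNU tar's output).
member_type() {
    line=$(tar -tvf "$1" "$2" 2>/dev/null | head -n 1)
    [ -n "$line" ] || return 1      # member not found or archive unreadable
    case $line in
        -*) echo file ;;
        l*) echo symlink ;;
        h*) echo hardlink ;;
        d*) echo directory ;;
        *)  echo other ;;
    esac
}
```

Called as `member_type archive.tar targetfile`, this only reads the archive
and never touches the disk.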
Thanks for your help
Aiyion
On 7/22/22 17:29, Reuti wrote:
Hi Aiyion,
On 22.07.2022 at 10:13, Aiyion.Prime <[email protected]> wrote:
Good morning everyone,
I thought I had known my way around tar for a few years now, but yesterday
evening I learned I was wrong about that:
I'm archiving a directory structure that contains large redundant files.
onepath/readme
onepath/binaryblob13
anotherpath/readme
anotherpath/binaryblob13
I don't know your complete workflow, hence I can give only a vague idea:
Assuming you are using symlinks in the above structure:
• instead of archiving the complete directories recursively, create a list of
files to be saved for `tar`: first all symlinks (as symlinks), then all real
files
• on extraction --occurrence=1 will stop at the first encounter
• in case it's a symlink, remove the extracted symlink file and extract the
real file it points to with the name of the symlink file
This should speed up the processing.
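A rough sketch of these steps, assuming GNU tar and GNU coreutils
(`realpath -ms`), relative symlink targets, and no newlines in file names.
The function names `pack` and `extract_real` are made up for illustration:

```shell
# pack ARCHIVE DIR...
# Store all symlinks first (as symlinks, no -h), then all real files.
pack() {
    archive=$1; shift
    { find "$@" -type l; find "$@" -type f; } |
        tar -cf "$archive" --no-recursion -T -
}

# extract_real ARCHIVE MEMBER
# Extract MEMBER; if it turns out to be a symlink, replace it with the
# content of the file it points to.
extract_real() {
    archive=$1
    member=$2
    tar -xf "$archive" --occurrence=1 "$member" || return 1
    if [ -L "$member" ]; then
        target=$(readlink "$member")
        # Resolve the relative target against the symlink's directory,
        # purely textually, to get the member name inside the archive.
        real=$(realpath -ms --relative-to=. "$(dirname "$member")/$target")
        rm "$member"
        # -O writes the member's content to stdout instead of to disk.
        tar -xOf "$archive" --occurrence=1 "$real" > "$member"
    fi
}
```

Note that extracting via `-O` keeps only the content; permissions and
timestamps of the real file are not carried over in this sketch.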
-- Reuti
I cannot change the paths, as this is to be fed to a package manager that
requires them.
What I thought I could do, to avoid an archive twice the size of
`binaryblob13`, was to use sym- or hardlinks and the `-h` flag for creation.
So archiving this:
onepath/
secondpath -> onepath/
using
tar --sort=name --owner=0 --group=0 --numeric-owner -chvf normal_sized.tar
secondpath onepath ${mtime})
That would work like a charm if said package manager extracted the whole
tarfile.
This is what it does though:
tar xf $tar_file secondpath/binaryblob13
And that works fine if I extract files from the directory referenced first in
the creation command (secondpath in the case above),
but it returns an error for the directory I archived later, as tar tries to
create a hardlink on disk pointing to what would have been the previously
extracted file. As that file does not exist, I've got a problem.
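To make the failure concrete, here is a minimal reproduction sketch, assuming
GNU tar. The archive name `demo.tar`, the `out` directory and the file
content are made up, and I left out the sorting and ownership options:

```shell
# Hypothetical reproduction: demo.tar and the file contents are placeholders.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir onepath out
echo "payload" > onepath/binaryblob13
ln -s onepath secondpath

# -h follows the symlink, so tar meets the same inode twice and stores the
# second occurrence as a hard-link entry instead of a second copy.
tar -chf demo.tar secondpath onepath

# Type flag of the later copy in the verbose listing ("h" = hard link).
later_type=$(tar -tvf demo.tar onepath/binaryblob13 | cut -c1)

# Extracting only the first-referenced copy works ...
tar -xf demo.tar -C out secondpath/binaryblob13
first_status=$?

# ... extracting only the later copy fails: its link target is not on disk.
rm -rf out && mkdir out
tar -xf demo.tar -C out onepath/binaryblob13 2>/dev/null
later_status=$?

echo "later copy: type=$later_type, exit status=$later_status"
```

The archive stays at roughly the size of one copy, but only the copy that was
archived first can be extracted on its own.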
I'd like to avoid extracting all binaryblob13 references beforehand only to
have the link I extract point to something valid.
Is there a flag to tell tar "I don't care if you have to search the archive
twice, but extract the original file instead of creating an (invalid) hardlink"?
I realize that's unusable for actual tape records, but maybe someone has a hint
for me here.
Thanks in advance and have a nice morning,
Aiyion