On Sat, Aug 9, 2025 at 2:50 AM Vincent Lefevre <[email protected]> wrote:
> On 2025-08-09 01:30:52 -0700, Michael Paoli wrote:
> > < tar.xz | xz -d | tar tf -
>
> With tar utilities that support xz (like GNU tar), not using "xz -d"
> could be more efficient as "xz -d" will uncompress the whole file
> while this may not be necessary:
>
> tar tf file.tar.xz
>
> is sufficient. This may allow one to skip xz blocks if the archive
> contains big files. That said, I don't know whether GNU tar has
> such an optimization.

I rather doubt any tar implementation has such an optimization.
I don't think there's any tar format that has an "index" or the like.
It's generally just, for each file (of any type), a header for that
file followed by any associated data, and then some type of end marker
at the end of the archive. I think that's basically it.
And if compression is used, it's the same as what the compression
program would do on its own: an additional header, the compressed data
however that format handles it, and likely some type of end marker,
at least for most compression formats, and that's it.

So, even if tar is requested to restore a single file, and has gotten
to the point in the archive where it has extracted that file, that
doesn't mean it can quit at that point. Alas, the same file/pathname
may appear again later in the archive, with the same or differing data.
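
As a rough illustration of the layout described above (no index, just
one header plus data per member, then an end marker), here is a minimal
Python sketch. It is not how any tar utility is implemented. It assumes
a plain uncompressed ustar archive, builds hypothetical demo data in
memory with the standard tarfile module, and the helper names
build_demo_archive and walk_members are made up for this example.

```python
import io
import tarfile

BLOCK = 512  # tar archives are laid out in 512-byte blocks


def build_demo_archive() -> bytes:
    """Build a small uncompressed ustar archive in memory (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in [
            ("f", b""),
            ("this_could_be_a_huge_file", b"x" * 1300),
            ("f", b"1"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def walk_members(raw: bytes):
    """Yield (offset, name, size) by reading each header in turn.

    There is no index to consult: the only way to find member N+1 is to
    read the header of member N and skip over its data.
    """
    offset = 0
    while offset + BLOCK <= len(raw):
        header = raw[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end
            break
        name = header[0:100].split(b"\0", 1)[0].decode()
        # The size field is an octal number stored as ASCII text.
        size = int(header[124:136].split(b"\0", 1)[0].strip() or b"0", 8)
        yield offset, name, size
        # Skip this header plus its data, padded to whole blocks.
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK


for offset, name, size in walk_members(build_demo_archive()):
    print(offset, name, size)
```

Note that the second f only turns up after walking past the larger
member; nothing in the first header hints that it is coming.
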
So, e.g., with the same pathname appearing twice in one archive:

$ tar -tf tar
f
this_could_be_a_huge_file
f
$ tar -tvf tar
-rw------- michael/users 0 2025-08-09 19:20 f
-rw------- michael/users 0 2025-08-09 19:20 this_could_be_a_huge_file
-rw------- michael/users 1 2025-08-09 19:24 f
$

Exact same pathname f, two different sets of contents and mtimes.
So, after tar has read past the first file, into the second, it
doesn't know if the pathname of the first repeats, with the same or
differing content. So in almost all circumstances it will continue,
in fact reading through the entire archive. But some versions of tar
may have an option to shortcut that. E.g. bsdtar has the -q option,
and GNU tar may have something similar.

Not sure if there exists any tar that can extract or list only the nth
occurrence of the same pathname, e.g. if there are 3 occurrences of
the same pathname, to request extracting only the 2nd occurrence.
But certainly such could be done, e.g. at least lower level libraries
would have access to that data, and could be requested to handle that
accordingly. So, there may well exist a tar implementation or utility
that already has a convenient option or the like to do that.

See also tar's [-]r option.

Note also that such can be highly practical. E.g. in our example,
if f is a small/tiny file and our other file is huge: we earlier
created the tar archive with the first f in it, then the huge file.
Now f has changed, and we want to update the archive, but we don't
want to have to reread the data of our huge file or rewrite all that
data to the tar file. Well, we can simply append a fresh backup of f
to the existing tar archive. And when extracting, at least by default,
each occurrence of f will be extracted, and the last extraction will
generally clobber any earlier ones.
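
The append-and-repeat behaviour above, and the "pick the nth occurrence"
idea at the library level, can be sketched with Python's standard
tarfile module. This is only an illustration under stated assumptions,
not what any tar command does internally: the archive path demo.tar and
the helpers add_bytes and read_occurrence are hypothetical, the "huge"
file is just a few KiB, and the archive is uncompressed.

```python
import io
import os
import tarfile
import tempfile


def add_bytes(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    """Add one regular-file member with the given contents."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def read_occurrence(path: str, name: str, n: int) -> bytes:
    """Return the contents of the n-th (1-based) member called `name`."""
    with tarfile.open(path, "r") as tar:
        matches = [m for m in tar.getmembers() if m.name == name]
        return tar.extractfile(matches[n - 1]).read()


workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.tar")  # hypothetical path

# Initial backup: a tiny f, then a (pretend) huge file.
with tarfile.open(archive, "w") as tar:
    add_bytes(tar, "f", b"")
    add_bytes(tar, "this_could_be_a_huge_file", b"x" * 4096)

# f has changed: append the new copy after the existing members
# instead of recreating the archive from scratch.
with tarfile.open(archive, "a") as tar:
    add_bytes(tar, "f", b"1")

with tarfile.open(archive, "r") as tar:
    names = tar.getnames()
    # getmember() resolves a repeated name to its last occurrence.
    latest = tar.extractfile(tar.getmember("f")).read()

print(names)
print(read_occurrence(archive, "f", 1), read_occurrence(archive, "f", 2), latest)
```

Here the library treats the last f as the current one, which matches
the "last extraction clobbers earlier ones" behaviour, while the earlier
copy stays reachable by position.
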

