On Oct 9, 2010, at 12:19 AM, Bob Proulx wrote:
>
> $ tar cvvf x.tar afile afile afile

This is a somewhat strange request you're making. I doubt many tar implementations have done anything to optimize this specific case (de-duping the argument list would be one strategy, although -C handling makes that a bit more complex than it sounds).

> $ tar cvvf x.tar afile afile afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile

> $ tar cvvf x.tar afile afile afile
> -rw-rw-r-- bob/bob 32 2010-10-09 01:07 afile
> hrw-rw-r-- bob/bob 0 2010-10-09 01:07 afile link to afile
> hrw-rw-r-- bob/bob 0 2010-10-09 01:07 afile link to afile

These are both "reasonable" answers to your request. Neither is really wrong, so there's not really a bug here. The latter version is smaller (a link entry generally takes less space than a full copy of the file), but the former is easier to restore.

Which one you see will depend heavily on which tar implementation you're using (GNU tar is just one of many) and how it optimizes detecting hard links.

The basic strategy used by tar implementations for archiving hard links is to keep a table that maps dev/ino values to file names and to create a hard link entry in the archive when the tar program sees something that's already in the table. The most straightforward implementation of this strategy would give you the output you listed second, regardless of the number of actual links on the file.

But such tables can get very large if you're using tar to back up a very large filesystem (a system with a billion files could easily require hundreds of gigabytes to store every filename). So tar implementations generally skip adding something to this table when the link count reported by the filesystem is 1.

In the cases above, this explains the difference you're seeing. In the first case, nothing was entered into the internal table, so the tar program saw each "afile" as a separate file to be archived.
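
To make the table strategy concrete, here is a rough sketch in Python. It is not taken from any real tar implementation: the names `plan_entries` and `skip_single_links` are made up for illustration, it only decides which kind of entry would be written rather than writing an archive, and it assumes every argument is a regular file.

```python
import os
import tempfile


def plan_entries(paths, skip_single_links=True):
    """Decide, in argument order, how each path would be archived.

    Returns a list of (kind, path, link_target) tuples, where kind is
    "file" for a full copy or "hardlink" for a link entry.
    Hypothetical helper; assumes all paths are regular files.
    """
    seen = {}      # (st_dev, st_ino) -> first name archived for that inode
    entries = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Already archived under another (or the same) name.
            entries.append(("hardlink", path, seen[key]))
            continue
        entries.append(("file", path, None))
        # Common optimization: a file whose link count is 1 is not
        # remembered, which keeps the table small on big filesystems.
        if not (skip_single_links and st.st_nlink == 1):
            seen[key] = path
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        afile = os.path.join(d, "afile")
        with open(afile, "w") as f:
            f.write("hello\n")
        args = [afile, afile, afile]
        # Link count is 1 here, so nothing is remembered in the table.
        print([kind for kind, _, _ in plan_entries(args)])
        # Give the file a second name, which raises its link count to 2.
        os.link(afile, os.path.join(d, "other"))
        print([kind for kind, _, _ in plan_entries(args)])
```

On a filesystem that supports hard links, the first call should correspond to the first listing above (three full copies) and the second call to the second listing (one copy followed by two link entries). Passing `skip_single_links=False` models the "most straightforward" version, which produces link entries regardless of the link count.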
Bumping the link count caused an entry to be added to the internal table, which resulted in links being generated. You might also have seen the second "afile" get recorded as a link and the third stored normally, if the tar program had taken the further optimization of removing the internal table entry once the expected number of references had been seen.

It's interesting to compare this with the strategies required when writing formats such as the newer cpio variants (which effectively store hard link entries first, and the "real" file data last).

As Joerg pointed out, the more interesting problem is how this gets handled on extract. The second form is tricky to restore correctly.

Cheers,

Tim
