Joerg Schilling wrote:
Phillip Susi <[EMAIL PROTECTED]> wrote:
Can anyone explain this?
~$: du -bsh Maildir/
98M Maildir/
~$: tar cf Maildir.tar Maildir/
~$: du -bsh Maildir.tar
112M Maildir.tar
~$: find Maildir | cpio -o -H newc > Maildir.cpio
204433 blocks
~$: du -bsh Maildir.cpio
100M Maildir.cpio
Why does tar have 12M more overhead than cpio? This Maildir is the lkml
since Jan 1, so it contains ~20,000 messages/files, but ~734 bytes per
file seems like a bit much for overhead.
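The per-file figure can be reproduced from the sizes shown above. This is only a rough sketch: it assumes the "M" printed by du means MiB, that the rounded sizes are close enough to the real ones, and that the file count is exactly 20,000. The function name is made up for illustration.

```python
MIB = 1024 * 1024

# Rounded sizes as printed by du in the transcript (assumed to be MiB).
du_bytes = 98 * MIB      # Maildir/ as reported by du
tar_bytes = 112 * MIB    # Maildir.tar
cpio_bytes = 100 * MIB   # Maildir.cpio
file_count = 20_000      # "~20,000 messages/files"


def overhead_per_file(archive_bytes, payload_bytes, files):
    """Average number of extra archive bytes per archived file."""
    return (archive_bytes - payload_bytes) / files


tar_overhead = overhead_per_file(tar_bytes, du_bytes, file_count)
cpio_overhead = overhead_per_file(cpio_bytes, du_bytes, file_count)
tar_vs_cpio = overhead_per_file(tar_bytes, cpio_bytes, file_count)

print(f"tar  vs du:   {tar_overhead:.0f} bytes/file")
print(f"cpio vs du:   {cpio_overhead:.0f} bytes/file")
print(f"tar  vs cpio: {tar_vs_cpio:.0f} bytes/file")
```

With these inputs the ~734 bytes per file corresponds to tar measured against the du total (14M), while the 12M gap between tar and cpio works out to roughly 629 bytes per file.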
As cpio does not offer a -H newc format, let me assume that you are talking
about the -c or -H crc format...
Yes, it does have a newc format, see the info page. It is also the
format used by the linux kernel for initramfs images.
cpio is unblocked and thus has trouble resyncing after a part of the archive
that appears to be corrupted.
du only counts the file content and a part of the metadata (not counting e.g.
the "inode" - see: /usr/include/sys/fs/ufs_inode.h)
Right, but the timestamps, owner, and mode only take up a handful of
bytes, which cpio also stores.
cpio -Hcrc writes a 110 byte header + the file path name + the file content.
tar in the historical format or POSIX.1-1988 writes 512 bytes header +
the file content rounded up to the next 512 byte boundary.
recent tar (POSIX.1-2001 aka. "pax") writes at least 1 KB per file in addition.
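The per-entry sizes described above can be modelled in a few lines. This is a minimal sketch, not code from either tool. It assumes the 4-byte alignment that the newc/crc cpio format applies to the header plus name and to the content (a detail not stated in the thread), and a path short enough to fit in the tar header. The function names and the sample path are made up for illustration, and the extra pax headers of POSIX.1-2001 archives are not modelled.

```python
def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple


def cpio_crc_entry_size(path, content_len):
    """Bytes used by one file in a cpio -H crc / newc archive.

    110-byte ASCII header plus the NUL-terminated path name, padded to a
    4-byte boundary, followed by the content padded to a 4-byte boundary
    (the padding is an assumption about the format, see the lead-in).
    """
    header_and_name = round_up(110 + len(path.encode()) + 1, 4)
    return header_and_name + round_up(content_len, 4)


def tar_entry_size(content_len):
    """Bytes used by one file in a historical or POSIX.1-1988 tar archive.

    One 512-byte header block, then the content rounded up to the next
    512-byte boundary. The path name lives inside the header block.
    """
    return 512 + round_up(content_len, 512)


# Hypothetical 3000-byte mail message.
path = "Maildir/cur/msg0001"
print(cpio_crc_entry_size(path, 3000))  # 3132
print(tar_entry_size(3000))             # 3584
```

For many small files the rounding to 512 bytes tends to matter more than the header itself: on average it wastes about half a block per file on top of the 512-byte header.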
I see. And the purpose for this is to try and recover from bad sectors
since a file will always start on a sector boundary, so only the file
contained in the bad sector will be lost?
Conclusion: if you write more metadata, you have more overhead.
But in real world use this has no relevance:
star -cPM -time f=/dev/null -C /usr .
star: 107825 blocks + 6656 bytes (total of 1104134656 bytes = 1078256.50k).
star: Total time 136.532sec (7897 kBytes/sec)
star -cPM -Hasc -time f=/dev/null -C /usr .
star: 104818 blocks + 2560 bytes (total of 1073338880 bytes = 1048182.50k).
star: Total time 134.415sec (7798 kBytes/sec)
The additional overhead that results from the tar format is typically less
than 3%. If you compress the result and use an archiver that takes care of
best compressibility (as star does), even the small "advantage" of the cpio
format will go away.
If you compress the result, the remaining difference is less than 1%.
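The uncompressed figure can be checked against the totals printed by the two star runs. This is only a quick sketch that takes the first run as the tar format and the -Hasc run as the cpio-format comparison. The thread gives no compressed sizes, so the 1% claim is not reproduced here.

```python
# Totals in bytes as printed by star in the two runs.
tar_total = 1104134656    # star -cPM
cpio_total = 1073338880   # star -cPM -Hasc

extra_bytes = tar_total - cpio_total
extra_percent = 100 * extra_bytes / cpio_total

print(f"tar format is {extra_bytes} bytes larger ({extra_percent:.2f}%)")
```

For this /usr tree the difference comes out a little under 2.9%, which matches the "less than 3%" statement.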
I'd say archiving my Maildir is a rather real world use, so this is
somewhat relevant. I did notice though, that once compressed, the
difference in size is greatly diminished.
Jörg
_______________________________________________
Bug-tar mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-tar