Has anyone considered having cpio call fallocate() when running on Linux? This is a fairly new system call that pre-allocates file space in those file systems that support it. ext2 and ext3 do not, but ext4 and xfs and some others do.
Both ext4 and xfs are extent-based filesystems, meaning that each file is stored in one or more contiguous regions called extents. (ext2 and ext3 exhaustively list every disk block in a file, whether contiguous or not.) There are significant performance advantages to storing each file in a single extent, and ext4 and xfs try very hard to do so. But this is difficult when the file system doesn't know in advance how big the file will be, especially when the system is close to full and free space is highly fragmented.

The fallocate() call was therefore added for the optional use of applications that do know in advance how big a file they will write. An archive extractor like cpio -i knows exactly how big each file it creates will be. On Linux, calling fallocate() on a file system that doesn't support it fails harmlessly with EOPNOTSUPP and has no other effect. So I recommend that cpio always call fallocate() as each file is extracted, whether or not the file system supports it. I also recommend the FALLOC_FL_KEEP_SIZE flag, which allocates the space without changing the apparent size of the file until the data is actually written.

There is a related library call, posix_fallocate(), that may be present even when the underlying file system doesn't have a native allocate call. In that case it merely writes zeroes over the specified range so that later overwrites with real data cannot fail for lack of disk space, and the file size is changed. I do not recommend posix_fallocate(), as that fallback could slow things down considerably while providing no real benefit to an extractor that is about to write the data anyway.

I'm running local test versions of cpio, tar and rsync with fallocate() calls added, and they seem to work as expected.

Comments?

Phil
