[adding coreutils]
On 08/24/2010 09:17 AM, Bernd Schubert wrote:
> Hi all,
>
> for improved stat() performance the Lustre filesystem uses entirely empty
> sparse files on its metadata target (MDT). Now with hundreds of millions of
> sparse files of huge size, creating a backup of the MDT using vanilla
> gnu-tar is basically impossible, as it needs far too much time to detect
> sparse files.

Coreutils cp(1) has recently started using code to efficiently iterate
over the locations of all holes within sparse files, with the goal of
eventually being able to target both Linux ioctls and Solaris SEEK_HOLE
directives. I think that could also be leveraged rather nicely for
tar's detection of sparse files, by stopping the iteration after the
first hole has been found; in particular, it would rapidly detect files
that are not completely sparse (whereas the description of your patch
implies that you only address the subset of quickly detecting a
completely sparse file, but offer no speedup on partially sparse files).
Thus, coreutils' sparse file management is a great candidate for
migrating into gnulib and sharing among several projects.
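
To make the "stop after the first hole" idea concrete, here is a minimal
sketch. It is written in Python only because its os module maps directly
onto lseek(2); it is not code from coreutils or tar. It assumes a platform
where os.SEEK_HOLE exists, and the name first_hole_offset is hypothetical.

```python
import os

def first_hole_offset(path):
    """Return the offset of the first hole before EOF, or None.

    Hypothetical helper: assumes os.SEEK_HOLE is available on this platform.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            # Seeking at or past EOF fails with ENXIO, so handle empty files here.
            return None
        # SEEK_HOLE moves to the first hole at or after the given offset.
        # Every file has an implicit hole at EOF, so a result equal to the
        # file size means no earlier hole was reported.
        hole = os.lseek(fd, 0, os.SEEK_HOLE)
        return hole if hole < size else None
    finally:
        os.close(fd)
```

A single lseek() call answers "does this file have any hole at all?"
without reading the data. On filesystems that do not track holes, the
kernel may simply report EOF as the only hole, in which case the sketch
returns None and a caller would need to fall back to scanning.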
Meanwhile, if you are indeed correct that there are easy ways to detect
completely sparse files, even when the ioctl or SEEK_HOLE directives are
not present, then the coreutils cp(1) hole iteration routine should
probably be taught that corner case to recognize an entirely sparse file
as a single hole.
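
The message above does not say which easy check is meant. One candidate,
shown here purely as an assumption, is to compare st_size with st_blocks
from stat(2): a file with a nonzero size and no allocated blocks has no
data on disk. The function names below are hypothetical.

```python
import os

def looks_entirely_sparse(size, blocks):
    """Heuristic (an assumption, not a confirmed tar or cp check):
    a nonzero-length file with zero allocated blocks is one big hole."""
    return size > 0 and blocks == 0

def path_looks_entirely_sparse(path):
    # st_blocks counts allocated 512-byte units on POSIX systems.
    st = os.stat(path)
    return looks_entirely_sparse(st.st_size, st.st_blocks)
```

This needs only the stat() call that an archiver already makes. One caveat
worth checking before relying on it: a filesystem that stores small files
inline in the inode might report zero blocks for a file that does contain
data, so a real implementation would likely need a size threshold or a
confirming SEEK_HOLE/SEEK_DATA probe.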

> PS: I'm used to linux-style indentation and I'm not sure if I did it the right
> way. If it is wrong, please complain and I will try to reformat it.

Thanks for taking the time to contribute a patch. However, the diffstat
says that your patch is large enough to fall outside the bounds of
trivial submissions, so I quit reading it to avoid any copyright issues.
Would you be willing to assign copyright to the FSF? If so, we can
start the paperwork process off-list.
--
Eric Blake [email protected] +1-801-349-2682
Libvirt virtualization library http://libvirt.org