Eric Blake wrote:
> On 01/31/2011 02:46 PM, Jim Meyering wrote:
>> Now that we have can read sparse files efficiently,
>> what if I want to copy a 20PiB sparse file, and yet I want to
>> be sure that it does so efficiently. Few people can afford
>> to wait around while a normal processor and storage system process
>> that much raw data. But if it's a sparse file and the src and dest
>> file systems have the right support (FIEMAP ioctl), then it'll be
>> copied in the time it takes to make a few syscalls.
>>
>> Currently, when the efficient sparse copy fails, cp falls back
>> on the regular, expensive, read-every-byte approach.
>>
>> This proposal adds an option, --efficient-sparse=required,
>> to make cp fail if the initial attempt to read the sparse file fails,
>> rather than resorting to the regular (very slow in the above case) copy
>> procedure.
>>
>> The default is --efficient-sparse=auto, and for symmetry,
>> I've provided --efficient-sparse=never, in case someone finds
>> a reason to want to skip the ioctl.
>
> Conversely, what happens if I have a file that contains large blocks of
> zeros but is NOT fully sparse (plausible, since we're still facing the
> fact that it is still not easy to punch holes into existing files when
> data in that portion of the file is no longer needed)? Does all the new
> fiemap code still have the ability for me to request that the cp code
> specifically look for large blocks of zero in the source, rather than
> trusting the fiemap, so that I can create a copy that is more sparse
> than the original? Does that also need a tunable; and if so, should we
> try to combine it into this tunable or is it orthogonal?
It's orthogonal. --sparse=always still does the hole-punching,
independently of whether we're copying normally or via the efficient
FIEMAP-based code.

E.g., if you have a sparse file in which one non-sparse chunk contains
all-zero blocks (currently 32KiB minimum), then --sparse=always will
convert those blocks to holes, with or without --efficient-sparse=never.

  --efficient-sparse=... controls efficiency while reading
  --sparse=... controls hole-punching (or preserving)

BTW, the fact that the existing hole-punching code detects no all-zero
sequence shorter than 32KiB is a bug that I will fix very soon.
I think it was introduced as an unwanted side effect of increasing
the buffer size for efficiency.

Thanks for the feedback.
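To illustrate what hole-punching on the write side means, here is a minimal Python sketch of the general idea: read the source block by block and seek over all-zero blocks in the destination instead of writing them. This is not cp's actual code. The function name sparse_copy and the 4096-byte block size are assumptions made for the example, and whether the skipped ranges really become holes depends on the destination file system.

```python
import os

def sparse_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Hypothetical helper for illustration; block_size is an arbitrary choice.
    """
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # All zeros: advance the write offset without writing,
                # which leaves a hole where the file system supports it.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
        # Seeking alone does not extend the file, so a trailing run of
        # zeros needs the final length to be set explicitly.
        dst.truncate(dst.tell())
```

Note that the reading side is untouched here: every byte of the source is still read, which is exactly the cost that the FIEMAP-based path is meant to avoid.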
