On Tue, Jun 15, 2010 at 02:28:43PM -0700, Paul Eggert wrote: > I looked at the proposed fiemap patch for cp and see another issue > with it, which needs to be thought through more carefully. > > The proposed fiemap code tries to copy the _physical_ holes in the > source file. But currently cp copies the _logical_ holes: that is, it > skips over writing (and thus attempts to create physical holes) > whenever the source file contains a block of zeros, regardless of > whether that block of zeros is physically stored in the file system. > The current cp behavior is often the desired behavior (particularly > for cp --sparse=always) and the fiemap code shouldn't alter that. > > For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag > set, cp should treat that as a hole, because the extent is all zeros. > (This will greatly help performance in some cases.) Also, if an input > extent is read and a block of it is found to be zeros, cp should skip > over that block when writing.
I pretty much agree with everything you wrote above. Fiemap is introduced into cp(1) to optimize the existing --sparse behavior. So we should be holding to that behavior. In fact, I'll go one further. While I haven't looked at the actual cp code in a little bit, I do not think that fiemap is used for the non-sparse copy. I contend that fiemap should be used for all copies if the capability exists. Even if cp is doing a non-sparse copy, there's no need for it to call read(2) on a 1GB hole in the file. It can write zeros just fine on its own. Basically, these capabilities (fiemap, SEEK_HOLE) are ways to avoid reading zeros we can determine without reading and scanning. We should always take advantage of that. > If the proposed fiemap code was intended to copy the physical holes, > then the code wasn't successful in implementing the intent, as there's > no guarantee that the extents of the destination are the same as the > extents of the source. If it is useful to have 'cp' copy the physical > holes, this could be supported via a new cp option or options, but > surely the default should continue to be to copy logical holes. The goal was not physical holes, it was logical ones as you surmise. I can't see any way to ensure the file has identical extents. I don't know any filesystem that guarantees such allocations on write(2). I do think there might be an interesting performance optimization to be had, but it is independent of fiemap. Above, you describe the usual "cp --sparse" behavior of reading allocated data and checking for zeroes. If you're optimizing for space usage -- what "cp --sparse" is defined to do -- you definitely want to skip writing blocks that are all zero. But it actually creates a crappier allocation and I/O pattern for some files. Imagine you have a data extent that is alternating blocks of zeroes and ones: hexdump -C /tmp/blk 00000000 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 |................| * 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00002000 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 |................| * 00003000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00004000 ... (continued for 4MB in this example) 00400000 Classic "cp --sparse" will spend CPU time scanning this extent, which can be megabytes in size. It will then write out every other block, which is terribly slow, and it leavs a horribly fragmented destination file. The result is even more disastrous if those holes ever get filled in. I'm, of course, exaggerating a worst-case here. But folks looking at I/O performance would love detection like "here's a linear MB, sure it has some zero blocks, but it's faster to write the entire MB than to chop it up." Joel -- Life's Little Instruction Book #456 "Send your loved one flowers. Think of a reason later." Joel Becker Principal Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127