On 06/15/2010 02:43 PM, Jim Meyering wrote: > I think that copying physical holes via FIEMAP should be the default, when > possible. One problem is that the current code on the fiemap-copy branch > does not honor --sparse=WHEN when in fiemap-copying mode. The solution > would seem to be to change the regular-file-copying loop in the fiemap_copy > function to use the same hole-preserving code that is used in copy_reg.
I assume that the solution would be used only with cp --sparse=always? (Otherwise, it would amount to copying logical holes by default.) If so, under this proposal, --sparse=always would copy logical holes, --sparse=never would never copy holes, and --sparse=auto (the default) would copy physical holes if supported, else it would copy logical holes if the file seems to have enough physical holes, and otherwise it would copy no holes. (Whew!) But this raises another issue. Under this proposal, the default behavior copies physical _holes_, but it doesn't copy physical _extents_. For example, suppose the input file has a single 128 MiB extent. The proposal could create an output file with two 64-MiB extents that are logically contiguous. This is because the code does not advise the output file about the extents that the input file had, and the operating system might assign a smaller output extent at first, discovering only too late that more space was needed. If the intent is to copy the physical extents, then the proposed code needs to be fixed so that it uses fallocate() to attempt to create the same extents in the destination that existed in the source. Obviously this could fail for any number of reasons, for example, the destination file system might be different from the source; but the goal would be to preserve extents if the underlying OS is "nice" about implementing fallocate(). Which raises another question: should cp attempt to create _better_ extents in the destination than in the source? For example, if the source contains two logically-adjacent 64-MiB extents, should the destination contain just one 128-MiB extent? I can see where that might be worthwile. This all is getting a bit complicated, I'm afraid. Perhaps it'd be better to try this stuff out with a new option, "cp --sparse=extents" say, and keep the default as-is for a while?