On Fri, Jan 24, 2014 at 01:07:21AM +0000, Pádraig Brady wrote:
> On 01/24/2014 12:47 AM, Bernhard Voelker wrote:
> > Inspired by a recent post on util-linux ML [1], talking about turning
> > a file into a sparse file in-place (i.e. not using a 2-step approach
> > like `cp --sparse=always file file2 && mv file2 file`), I thought, hey, don't
> > we have this in coreutils already?
>
> > b)
> > Then, I tried
> > $ dd if=file of=file conv=sparse,notrunc
> > to avoid truncating the output file. That didn't corrupt the data,
> > but the file still was not sparse afterward.
> > What's the reason for conv=sparse not to work in this situation?
> > BTW: generally, writing to the same file seems to work, e.g.:
> > dd if=file of=file conv=ucase,notrunc
>
> To deallocate the zeros we'd have to use fallocate(FALLOC_FL_PUNCH_HOLE).
> Also for efficiency reasons it would be nice to detect holes efficiently.
> We can do this with the current fiemap code, but really we should try
> and use the new SEEK_HOLE functionality available in the kernel.
I looked into this, but I don't think it will help. I even tried it (maybe I
did it wrong?) when implementing the tool to make a file sparse in-place, but
it didn't report the already-allocated '\0's as holes. The manpage says:

    These operations allow applications to map holes in a sparsely
    allocated file. This can be useful for applications such as file backup
    tools, which can save space when creating backups and preserve holes, if
    they have a mechanism for discovering holes.

    For the purposes of these operations, a hole is a sequence of zeros that
    (normally) has not been allocated in the underlying file storage.
    However, a filesystem is not obliged to report holes, so these
    operations are not a guaranteed mechanism for mapping the storage space
    actually allocated to a file. (Furthermore, a sequence of zeros that
    actually has been written to the underlying storage may not be reported
    as a hole.) In the simplest implementation, a filesystem can support
    the operations by making SEEK_HOLE always return the offset of the end
    of the file, and making SEEK_DATA always return offset (i.e., even if
    the location referred to by offset is a hole, it can be considered to
    consist of data that is a sequence of zeros).

So these operations seem to be aimed at letting applications handle files that
are already sparse. Let me highlight one specific part:

    Furthermore, a sequence of zeros that actually has been written to the
    underlying storage may not be reported as a hole.

That is exactly the case we are interested in. I tried it on ext4 with a 3.11
kernel, and a sequence of zeros that has been written to the underlying storage
is not detected as a hole.
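
To make the point concrete, here is a minimal C sketch of both halves: probing
with SEEK_HOLE, and the fallback of reading the data, checking for zeros in
userspace and deallocating them with fallocate(FALLOC_FL_PUNCH_HOLE). It is not
the code of the tool mentioned above; the function names (first_reported_hole,
is_zero_block, punch_zero_blocks) and the fixed block size are assumptions made
for illustration, and it assumes Linux with glibc.

```c
#define _GNU_SOURCE             /* for SEEK_HOLE and fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_* on older glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the offset of the first hole the filesystem reports at or after
   START, or -1 on error.  For START inside a file with no reported holes
   this is the file size, since the end of file counts as a hole.  */
static off_t
first_reported_hole (int fd, off_t start)
{
  return lseek (fd, start, SEEK_HOLE);
}

/* Return 1 if the LEN bytes at BUF are all zero, 0 otherwise.  */
static int
is_zero_block (const char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (buf[i] != '\0')
      return 0;
  return 1;
}

/* Read FD in BLKSIZE chunks and deallocate every chunk that contains only
   zeros.  PUNCH_HOLE must be combined with KEEP_SIZE, so the file length
   does not change.  Return the number of bytes punched, or -1 on error
   with errno set (e.g. EOPNOTSUPP if the filesystem cannot punch holes).  */
static off_t
punch_zero_blocks (int fd, size_t blksize)
{
  assert (blksize > 0);
  char *buf = malloc (blksize);
  off_t pos = 0, punched = 0;
  ssize_t n;
  int saved_errno = 0;

  if (!buf)
    return -1;
  while ((n = pread (fd, buf, blksize, pos)) > 0)
    {
      if (is_zero_block (buf, (size_t) n))
        {
          if (fallocate (fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         pos, n) != 0)
            {
              n = -1;           /* report the fallocate failure */
              break;
            }
          punched += n;
        }
      pos += n;
    }
  if (n < 0)
    saved_errno = errno;
  free (buf);
  if (n < 0)
    {
      errno = saved_errno;
      return -1;
    }
  return punched;
}
```

Calling first_reported_hole (fd, 0) on a file full of written zeros is the
experiment described above; where the filesystem does not report them, the
result is simply the file size. punch_zero_blocks (fd, 4096) on a descriptor
opened O_RDWR then does the in-place conversion. A real tool would probably
want to use st_blksize from fstat() instead of a fixed 4096 and to merge
adjacent zero chunks into one fallocate call.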
Thanks a lot,
Rodrigo