On Tue, Jan 17, 2006 at 02:36:44PM -0500, Daniel Ouellet wrote:
> [...] But having a file with, let's say, 1MB of valid data grow
> very quickly to 4 or 6GB, take time to rsync between servers, and
> in one instance fill the file system and create other problems (:>
> I wouldn't call that a feature.

As Otto noted, you have to distinguish between file size (that's
what stat(2) and friends report; it's also the number of bytes you
can read sequentially from the file) and a file's disk usage.
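
To see the difference, make a file that is almost entirely hole
(a quick sketch; dd's seek operand uses lseek(2) to skip over the
hole, and the exact block counts depend on your filesystem):

$ dd if=/dev/zero of=holey bs=1 count=1 seek=1048575
$ ls -l holey     # size: 1048576 bytes -- what stat(2) reports
$ du holey        # disk usage: a block or two, the rest is a hole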

For more explanations, see the RATIONALE section at

http://www.opengroup.org/onlinepubs/009695399/utilities/du.html

(You may have to register, but it doesn't hurt)

See also the reference to lseek(2) mentioned there.


> But at the same time, I wasn't using the -S switch in rsync,
> so my own stupidity there. However, why rsync spends lots of time
> processing empty files is something I still don't understand.

Please note that -S in rsync does not *guarantee* that source and
destination files are *identical* in terms of holes or disk usage.

For example:

$ dd if=/dev/zero of=foo bs=1m count=42   # 42MB of real zero blocks, no holes
$ rsync -S foo host:                      # -S: try to punch holes in the copy
$ du foo                                  # the full 42MB is allocated locally
$ ssh host du foo                         # far fewer blocks on the remote side

Got it? The local foo is *not* sparse (no holes), but the remote
one has been "optimized" by rsync's -S switch.

We recently had a very controversial (and flaming) discussion at
our local UG about such optimizations (or "heuristics", as in GNU
cp). IMO, if they have to be explicitly enabled (like `-S' for
rsync), that's o.k. The default behaviour (the copy is *not* sparse
unless you ask for it) is exactly what I would expect.
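
For comparison, GNU cp makes the three possibilities explicit (this
is GNU coreutils only, not the base system's cp):

$ cp --sparse=never  foo bar   # always write real blocks, even for zeroes
$ cp --sparse=auto   foo baz   # the default: a heuristic based on the source
$ cp --sparse=always foo qux   # punch a hole wherever the data is all zeroes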

Telling whether a sequence of zeroes is a hole or just a (real)
block of zeroes isn't possible in userland -- it's a filesystem
implementation detail.
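
You can convince yourself with cmp(1) (a sketch; both files read
back as one megabyte of zeroes, only the allocation differs):

$ dd if=/dev/zero of=real bs=1m count=1               # 1MB of real zero blocks
$ dd if=/dev/zero of=holey bs=1 count=1 seek=1048575  # 1MB, mostly one hole
$ cmp real holey && echo identical    # reads are byte-for-byte the same
$ du real holey                       # but the allocated blocks differ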

To copy the *exact* contents of an existing filesystem including
all holes to another disk (or system), you *have* to use
filesystem-specific tools, such as dump(8) and restore(8). Period.
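
For example (just a sketch; the device name and mount point are
placeholders, and this has to run as root on a freshly newfs'ed
target):

# local disk, new filesystem mounted on /mnt:
dump -0af - /dev/rsd0a | (cd /mnt && restore -rf -)

# or to another system over ssh:
dump -0af - /dev/rsd0a | ssh host 'cd /mnt && restore -rf -'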


> I did research on Google for sparse files and tried to get more
> information about it. In some cases, I would assume -- like
> database-type stuff where you have a fixed file that you write
> into at various places -- it would be good and useful. But a
> sparse file that keeps growing over time, uncontrolled, I may be
> wrong, but I don't call that a useful feature.

Sparse files for databases under heavy load (many insertions and
updates) are the death of performance -- you'll end up with blocks
spread all over your filesystem.

OTOH, *sparsely populated* databases such as quota files (potentially
large, but growing very slowly) are good candidates for sparse files.
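
A quota file is the textbook case: it's indexed by UID, so one user
with a high UID gives it a huge apparent size while only a few
blocks ever get written. A rough simulation (the 32-byte record
size is an assumption, not the real struct dqblk layout):

$ dd if=/dev/zero of=quota.fake bs=32 count=1 seek=60000
$ ls -l quota.fake   # apparent size: roughly 1.9MB
$ du quota.fake      # actual usage: a block or two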

Ciao,
        Kili
