On Tue, 17 Jan 2006, Daniel Ouellet wrote:

> > > You are wrong in thinking sparse files are a problem. Having sparse
> > > files is quite a nifty feature, I would say.
> > 
> > 
> > Are we talking about webazolver or OpenBSD?
> > 
> > I'd argue that relying on the OS handling sparse files this way instead
> > of handling your own log data in an efficient way *is* a problem, as
> > evidenced by Daniel's post. After all, it's reasonable to copy data to,
> > say, a different drive and expect it to take about as much space as the
> > original.
> 
> Just as feedback, the original file on OpenBSD showed a size of something
> like 150MB. Using rsync to copy it over made it almost 5GB in size; I
> wouldn't call that good. But again, before I say a definite no, there is
> always something that I may not understand, so I am willing to leave some
> room for that here. But not much! (:>
> 
> > On the other hand, I agree with you that handling sparse files
> > efficiently is rather neat in an OS.
> 
> I am not sure whether the OS handles it well or not. Again, no offense
> intended, but if it does, why copy empty data then? Obviously there is
> something I don't understand.
> 
> However, here is something I didn't include in my previous email with all the
> stats that may be very interesting to know. I didn't think it was so important
> at the time, but if we are talking about handling it properly, it might be
> relevant.
> 
> The tests were done with three servers. The file showing ~150MB in size was on
> www1. Copying it to www2 with the -S switch in rsync still got it to ~5GB.
> Then copying the same file from www2 to www3 using the same rsync -S setup
> got that file back to the size it was on www1. So why not on www2 in that
> case? Is it the OS, or is it rsync? Was it handled properly or wasn't it? I
> am not sure. If it was, then the www2 file should not have been ~5GB,
> should it?

Until you get your terminology right, and distinguish between the
file size and the disk usage of a file, you are just adding to your own
confusion.

File size remains constant, whether a file is sparse or not. The only
thing that changes is its disk usage. It doesn't matter if you like it
or not; it is handy to be able to store a file using fewer bytes than
its length. This feature is used by db(3) to great advantage.

> 
> So the picture was
> 
> www1->www2->www3
> 
> www1 cache DB shows ~150MB
> 
> rsync -e ssh -aSuqz --delete /var/www/sites/ [EMAIL PROTECTED]:/var/www/sites
> 
> www2 cache DB shows ~5GB
> 
> rsync -e ssh -aSuqz --delete /var/www/sites/ [EMAIL PROTECTED]:/var/www/sites
> 
> www3 cache DB shows ~150MB
> 
> Why not 150MB on www2???

I suspect a test interpretation error, or maybe the OS on www2 does
not support sparse files, who knows? You are not giving details. 
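
For what it's worth, here is one way a copy can end up using far more
disk than the original while its length stays the same: the copying
tool reads the holes back as zeroes and writes every one of them out.
This is only a minimal sketch, assuming a POSIX sh and a filesystem
that supports holes. The temporary file names and the 8 MB hole are
arbitrary choices of mine, and cat merely stands in for any copier
that does no hole detection; I am not claiming this is what happened
on www2.

```shell
#!/bin/sh
# Hypothetical scratch files; nothing here refers to the real cache DB.
src=$(mktemp)
dst=$(mktemp)

# Write 1 KB of data after skipping 8192 blocks of 1 KB, leaving an 8 MB hole.
dd if=/dev/zero of="$src" bs=1024 seek=8192 count=1 2>/dev/null

# cat reads the hole back as zeroes and writes them all to the copy.
cat "$src" > "$dst"

# Length in bytes: identical for both files.
src_size=$(wc -c < "$src" | tr -d ' ')
dst_size=$(wc -c < "$dst" | tr -d ' ')
echo "length: src=$src_size dst=$dst_size"

# Disk usage in KB: where holes are supported, the copy is usually much larger.
du -k "$src" "$dst"

rm -f "$src" "$dst"
```

Both files report the same length; only the du figures can differ, and
by how much depends on the filesystem.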

> One thing that I haven't tried, and regret not having done, is just copying
> that file on www1 to a different name, then copying it back to its original
> name and checking the size at the end, and also transferring that file
> without the -S switch to see whether the OS copied the empty data or not.

I think you still do not understand the concept of sparse files. Just
compare the SIZE of a file with its disk USAGE. Bytes that were never
written do not take up space. If you read back unwritten bytes, you'll
get zeroes. Just compare the numbers given by ls -ls.

$ dd if=/dev/zero of=sparse seek=1m bs=1k count=1 
1+0 records in
1+0 records out
1024 bytes transferred in 0.000 secs (4762791 bytes/sec)
$ ls -ls sparse                                   
128 -rw-r--r--  1 otto  wheel  1073742848 Jan 17 21:39 sparse
$ hexdump -vC sparse | more
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
[snip]
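
If you prefer to see the two numbers side by side from a script, the
following is a minimal sketch, assuming a POSIX sh and a filesystem
that supports holes. The temporary file and the 8 MB hole are
arbitrary choices of mine, and the disk usage printed depends on the
filesystem's block size, so I am not quoting a figure for it.

```shell
#!/bin/sh
# Hypothetical scratch file with 1 KB of data after an 8 MB hole.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024 seek=8192 count=1 2>/dev/null

# SIZE: the length of the file in bytes (8192*1024 + 1024).
size=$(wc -c < "$f" | tr -d ' ')

# USAGE: the space actually allocated on disk, in KB.
usage_kb=$(du -k "$f" | awk '{print $1}')

# Reading the unwritten start of the file returns zeroes.
first16=$(dd if="$f" bs=16 count=1 2>/dev/null | od -An -tx1 | tr -d ' \t\n')

echo "size:       $size bytes"
echo "disk usage: $usage_kb KB"
echo "first 16:   $first16"

rm -f "$f"
```

The size line always reads 8389632 bytes; the usage line is the one
that shows whether the hole was actually stored.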

> I guess the question would be, should it, or shouldn't it do it?
> 
> My own opinion right now is that the file should show the size it really
> is. So, if it is 5GB and only 100MB of it is good, shouldn't it show as
> 5GB? I don't know; better minds than mine surely have the answer to this
> one. Right now, I do not.

It's not a question of data being good or not. I think I give up.

        -Otto
