Re: identifying sparse files and getting rid of them: any trick available?

Otto Moerbeek Fri, 09 Nov 2007 02:07:59 -0800

On Fri, Nov 09, 2007 at 04:27:49AM -0500, Daniel Ouellet wrote:

> >>Any clue as to how to tackle this problem, or any trick around it?
> >
> >I really do not understand the problem here. But you might be able to
> >detect sparse files by comparing the size vs the number of blocks it uses.
> 
> Without making a big write-up out of it, let's say that the problem is,
> for now, a storage capacity problem on the destination servers, a timing
> problem in the extended transfer process, the additional bandwidth
> required at some of the destination points, and the volume of files.
> Let's just say that if it were syncing 100K files, it would be a piece
> of cake, but it's much bigger.
> 
> For example, a badly sparse source file doesn't really have its disk
> blocks allocated yet, but when copied over via scp or rsync it will
> actually use that space on the destination servers. All the servers are
> identical (or are supposed to be, anyway), but what is happening is that
> the copies run out of space at times during the copy process. While
> copying, it may easily use twice the amount of space, and sadly it fills
> up the destinations; then the sync process stops, making the load
> distribution unusable. Yes, I need to increase the capacity, except that
> it will take me time to do so.
> 
> Sparse files are a very good thing for databases, for example, but not
> for everything.
> 
> The problem is not the sparse file at the source. It can certainly stay
> as is. It's just offset pointers anyway.
> 
> The problem is in the sync process between multiple servers that use the
> Internet to sync, the bandwidth wasted, and the lack of space available
> at the destination. Plus, because the copy is different in size, the
> sync process sees them as different files and copies them again.


The size will not be different, just the disk space used.
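That difference can be seen with a small experiment. This is a minimal sketch, assuming a POSIX shell and a file system that supports holes and does not compress data (for example FFS or ext4); the 1 MiB size and the file names are arbitrary. It creates a sparse file, copies it with plain reads and writes so every zero byte is written out, and compares the two:

```sh
#!/bin/sh
# Assumes a file system with hole support and no compression.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Write one byte at offset 1048575; everything before it is a hole.
printf x | dd of="$dir/sparse" bs=1 seek=1048575 2>/dev/null

# Copy with plain reads and writes, so the hole becomes real zeros,
# much like a copy that does not handle sparse files.
dd if="$dir/sparse" of="$dir/dense" bs=65536 2>/dev/null

ls -l "$dir/sparse" "$dir/dense"   # same size: 1048576 bytes each
du -k "$dir/sparse" "$dir/dense"   # allocated KiB differ widely
```

`ls -l` reports the same size for both files; only `du -k` shows the extra blocks the dense copy occupies.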

> 
> Or it can be copied using -S with rsync; however, this will inflate the
> file at the destination, run out of space during the process, and only
> make the files smaller at the end. Plus this obviously takes a lot more
> time, and as such the sync process, which was timely for a long time
> now, is, well... let's say, not reliable. Say a sync without concern for
> sparse files is done in just a few minutes, but then uses a lot more
> space on the destination. Doing it with -S fixes the capacity issue, but
> then it takes a HUGE amount of extra time, and sadly there is a useless
> transfer of null data coming from the empty space in the sparse source.

So your problem seems to be that rsync -S is inefficient to the point
where it is not usable.  I do not use rsync a lot, so I do not know
if there's a solution to that problem. It does seem strange that a
feature meant to solve a problem actually makes the problem worse.

> I can manage: I find ways to use ls -laR or du -k, diff their output,
> find the files that are getting out of whack, replace them, and then
> continue, but this really is painful.

stat -s gives the raw info in one go. Some shell script hacking should
make it easy to detect sparse files.
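A minimal sketch of that shell script hacking, assuming a POSIX sh. On the BSDs, `stat -s` prints `st_size` and `st_blocks` (in 512-byte units) in one go. This sketch uses `wc -c` and `du -k` instead so it also runs where `stat` has no `-s`. The helper names `is_sparse` and `find_sparse` are hypothetical. The check is a heuristic: tiny files stored inline, or data on a compressing file system, can also be reported.

```sh
#!/bin/sh
# Hypothetical helpers; heuristic only (inline-stored tiny files or
# compressed data can also show fewer allocated bytes than their size).

# is_sparse FILE: succeed if FILE's allocated space (du -k, in KiB)
# is smaller than its apparent size (wc -c, in bytes).
is_sparse() {
    size=$(wc -c < "$1" | tr -d ' ')
    used_k=$(du -k "$1" | cut -f1)
    # Treat unreadable files as not sparse.
    [ -n "$size" ] && [ -n "$used_k" ] || return 1
    [ $((used_k * 1024)) -lt "$size" ]
}

# find_sparse DIR: print every regular file under DIR that looks sparse.
# File names containing newlines are not handled.
find_sparse() {
    find "$1" -type f | while IFS= read -r f; do
        if is_sparse "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# When run as a script with a directory argument, scan that directory.
if [ $# -gt 0 ]; then
    find_sparse "$1"
fi
```

Running it as `sh find_sparse.sh /some/dir` (script name and path are placeholders) prints one candidate per line, which could replace the `ls -laR`/`du -k` diffing.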

        -Otto

> Obviously, once the capacity is there, it will be a non-issue; however,
> I am sadly not at that point yet, and it will take me some time.
> 
> Not sure if that explains it any better; I hope so.
> 
> But I was looking to see if it was possible to identify these files
> more efficiently.
> 
> If not, I will just deal with it.
> 
> It's just going to be painful for some time, that's all.
> 
> The issue is really in the transfer process and at the final 
> destination. Not at the source.
> 
> I hope it makes more sense explained this way; if not, I apologize for
> not explaining it better at the moment.
> 
> Best,
> 
> Daniel
