On Fri, Nov 09, 2007 at 04:27:49AM -0500, Daniel Ouellet wrote:
> >>Any clue as to how to tackle this problem, or any trick around it?
> >
> >I really do not understand the problem here. But you might be able to
> >detect sparse files by comparing the size vs the number of blocks each uses.
>
> Without making a big write-up out of it: let's say that the problem, for
> now, is storage capacity on the destination servers, timing in the
> extended transfer process, the additional bandwidth required at some of
> the destination points, and the volume of files. Let's just say that if
> it was syncing 100K files, it would be a piece of cake, but it's much
> bigger.
>
> Just for example, a source file that is badly sparse doesn't really have
> disk blocks allocated yet, but when copied over via scp or rsync it will
> actually use that space on the destination servers. All the servers are
> identical (or supposed to be, anyway), but what is happening is that the
> copies are running out of space at times during the copy process. While
> it is copying them, it may easily use twice the amount of space and
> sadly fill up the destinations; then the sync process stops, making the
> distribution of the load unusable. I need to increase the capacity, yes,
> except that it will take me time to do so.
>
> Sparse files for a database, for example, are a very good thing, but not
> for everything.
>
> The problem is not the sparse file at the source. It sure can stay as
> is. It's just offset pointers anyway.
>
> The problem is in the sync process between multiple servers using the
> Internet to sync them, and the bandwidth wasted as well as the lack of
> space available at the destination. Plus, because the copy is different
> in size, the sync process sees it as a different file and as such
> will copy it again.
The size will not be different, just the disk space used.
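To illustrate the distinction between size and disk space used, here is a
minimal sketch in Python. It assumes a Unix-like system where st_blocks is
reported in 512-byte units (the usual convention, but not guaranteed
everywhere) and a filesystem that supports holes. The helper name sizes() and
the file name hole.bin are made up for the example.

```python
import os
import tempfile

def sizes(path):
    """Return (apparent size, bytes allocated on disk) for path.

    Assumes st_blocks counts 512-byte units, as on the BSDs and Linux.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hole.bin")
    with open(path, "wb") as f:
        # Extend the file to 1 MiB without writing any data: this leaves
        # a hole on filesystems that support sparse files.
        f.truncate(1 << 20)
    apparent, allocated = sizes(path)
    print("apparent size:", apparent, "allocated:", allocated)
```

The apparent size is what ls -l reports and stays the same after a copy; the
allocated figure is what du reports and is what grows when the holes get
filled in on the destination.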
>
> Or they can be copied using -S with rsync; however, this process will
> inflate the files at the destination, run out of space during the process,
> and make them smaller at the end. Plus this obviously takes a lot more time
> and as such, the timely sync process that was good for a long time is now,
> well... let's say, not reliable. A sync without concern for sparse files
> is done in just a few minutes, but then uses a lot more space on the
> destination. Doing it with -S fixes the capacity issue,
> but then it takes a HUGE amount of time more, and sadly there is a useless
> transfer of null data caused by the empty space in the sparse source.
So your problem seems to be that rsync -S is inefficient to the point
where it is not usable. I do not use rsync a lot, so I do not know
if there's a solution to that problem. It does seem strange that a
feature meant to solve a problem actually makes the problem worse.
> I can manage: I find ways to use ls -laR or du -k, do diffs between
> them, find the files that are getting out of whack, replace them and
> then continue, but this really is painful.
stat -s gives the raw info in one go. Some shell script hacking should
make it easy to detect sparse files.
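As a rough sketch of that idea (written in Python rather than shell, so this
is a swapped-in approach, not the stat -s script itself): walk a tree and
report regular files whose allocated blocks cover less than their size. The
names BLOCK, is_sparse() and find_sparse() are invented for the example, and
the 512-byte unit for st_blocks is an assumption that holds on the BSDs and
Linux.

```python
import os
import stat
import sys

BLOCK = 512  # assumed unit of st_blocks; check your platform

def is_sparse(st):
    """True if a regular file occupies fewer bytes on disk than its size."""
    return stat.S_ISREG(st.st_mode) and st.st_blocks * BLOCK < st.st_size

def find_sparse(root):
    """Yield (path, size, allocated_bytes) for apparently sparse files."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_sparse(st):
                yield path, st.st_size, st.st_blocks * BLOCK

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size, allocated in find_sparse(root):
        print(f"{allocated}\t{size}\t{path}")
```

This does a single stat call per file, so it should be much cheaper than
diffing ls -laR against du -k output. Note that filesystems with compression
or inline storage of small files can also report fewer blocks than the size
suggests, so treat the output as candidates, not proof.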
-Otto
> Obviously when the capacity is there, it will be a non-issue;
> however, I am sadly not at that point yet and it will take me some time.
>
> Not sure if that explains it any better; I hope so.
>
> But I was looking to see if it was possible to identify these files in a
> more efficient way.
>
> If not, I will just deal with it.
>
> It's just going to be painful for some time, that's all.
>
> The issue is really in the transfer process and at the final
> destination. Not at the source.
>
> I hope it makes more sense explained this way; if not, I apologize
> for the lack of better thinking at the moment in explaining it.
>
> Best,
>
> Daniel