Holger Parplies wrote:
> Hi,
> 
> Evren Yurtesen wrote on 29.03.2007 at 08:24:43 [Re: [BackupPC-users] very 
> slow backup speed]:
>> Les Mikesell wrote:
>>> Evren Yurtesen wrote:
>>>> Also I see the --ignore-times option in rsync args for full backups. Why
>>>> is this necessary exactly?
>>> If you don't, you are trusting that every file that has the same name, 
>>> timestamp and length on the previous backup still matches the contents 
>>> on the target.  It probably, mostly, usually, does...
> 
> actually, if you don't, you are trusting that nobody/nothing wrote to a file,
> keeping its size the same, and then reset the modification time to the
> previous value. Technically, that's simple to do and does happen. Full
> backups are meant for clearing up such a situation. We're talking about
> backups after all. From time to time we want to be *absolutely* sure that
> our data is really on backup in unmodified form. A backup that perpetually
> assumes "oh, everything will probably be fine" for sake of speed is quite
> pointless.

Well, I am sure there are hundreds of programs that set a file's modification
time back to the original, so I am not saying this check should be disabled.
But BackupPC users should be given an option, so that people who think speed
is more important can tweak it.
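
For what it's worth, resetting the timestamp really is trivial. A minimal
Python sketch (the file name is made up; this just shows that contents can
change while name, size and mtime stay identical):

    import os

    path = "data.bin"                    # hypothetical file
    st = os.stat(path)                   # remember size and mtime

    with open(path, "r+b") as f:         # overwrite in place:
        f.write(b"X" * st.st_size)       # same length, new contents

    os.utime(path, (st.st_atime, st.st_mtime))   # reset the timestamps
    # name, size and mtime now match the previous backup; contents do not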

Although I suspect this is impossible, as rsync itself forces this behaviour.

In any case, even using tar is worse than disabling checksum checks, and
people have been using tar for ages without serious consequences.

>>> [...]
>> Yes, this is causing a lot of extra processing on the server side. However,
>> most backup programs do not do this detailed file checking (I think?) and
>> nobody seems to be complaining. There should at least be an option to rely
>> only on the name/timestamp of a file when using rsync as well.
> 
> You've gone to incredible lengths to demonstrate that you don't know what
> rsync is all about, and you've spread that out over several threads. In a
> nutshell, it's "save bandwidth at the cost of processing overhead". Within
> BackupPC, this comes at an additional cost, as files need to be
> uncompressed. BackupPC is not fine-tuned to your case of small files but

I figure that when --checksum-seed is a fixed value, uncompressing is
unnecessary (for most files, outside a small collision probability), so this
information is wrong. You don't necessarily have to uncompress the files to
do checksum checking.
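
As I understand the docs, with a fixed --checksum-seed BackupPC can cache the
rsync checksums alongside the compressed pool file the first time they are
computed and reuse them afterwards. The principle, as a rough Python sketch
(the dict-based "pool" and md5-instead-of-MD4 are stand-ins of mine, not
BackupPC's actual on-disk layout):

    import hashlib, zlib

    def store(pool, name, data):
        # store compressed data plus a digest of the raw contents,
        # computed once, at write time
        pool[name] = {
            "blob":   zlib.compress(data),
            "digest": hashlib.md5(data).hexdigest(),
        }

    def unchanged(pool, name, remote_digest):
        # full-backup check without decompressing anything:
        # compare the cached digest against the client's digest
        return pool[name]["digest"] == remote_digest

    pool = {}
    store(pool, "f", b"hello world")
    print(unchanged(pool, "f", hashlib.md5(b"hello world").hexdigest()))  # True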

> rather handles huge files nicely, which it would be impossible to cache in
> uncompressed form. All of this is clearly documented, if you're only willing
> to read and understand it.

That rsync saves bandwidth at the cost of processing overhead doesn't mean
rsync exists only for that purpose; it is one way to think about it. Don't be
so single-minded.

As a matter of fact, the rsync page says 'rsync is an open source utility
that provides fast incremental file transfer', and I want speed, which
coincides with the statement on the rsync web pages.
 
> "Most backup programs" take "do a full backup" to mean "read *everything*
> over the network and write it to backup medium". BackupPC modifies this to
> mean "read *everything* over the network and store *unchanged* contents
> efficiently by using hardlinks" while taking the notion of "unchanged" very
> seriously. If previous backups have become corrupted for some reason, that
> is no good reason to feel fine with storing unchanged contents in unchangedly

I am willing to take that risk; who are you to disagree? I am not asking for
checksum checking to be removed, just that people who want to take that risk
should be able to disable it.

> corrupted form for the future. That's not what backups are about.
> *rsync* further modifies the meaning (of a full backup!) to "read only
> *changed* contents over the network, so we can do multi-gigabyte backups over
> a low bandwidth link (and store it efficiently as above)" - low bandwidth
> meaning maybe modem links. Note that checksum caching already means a
> compromise: corruption in previous backups may go unnoticed and uncorrected.

Perhaps this is true for low-bandwidth links, but I accept the risk that
files might be corrupted. Why should I let a dedicated gigabit Ethernet link
for backups go to waste while burning expensive CPU cycles?

Sure, the way it works now fits your usage, but that doesn't mean new
features shouldn't be added.

> If you're not prepared to pay the price rsync costs, then don't use rsync.
> Use tar. That's what you've got the choice for.

It is not the same: tar misses moved/renamed files that keep the same
modification times.
 
>> What I meant was, if tar is used then live comparisons cannot be done: tar
>> relies on the modification time. However, if rsync is used, then the file
>> location and modification time can be checked. If a file is renamed or
>> moved keeping the same modification time, this can be detected using rsync
>> even without checksum checks (which would give the same checksum anyway).
> 
> Nonsense. If you previously had a file 'foo' with 32 KB of contents and
> modification time X and now have a file 'bar' with the same length and date,
> you would want to blindly assume they're identical without checking the
> contents? rsync doesn't do voodoo. On the remote end, you have the file
> 'bar' and no knowledge about the file 'foo'. Even with rsync, the file 'bar'
> is transferred in this case. It's BackupPC that re-establishes that the
> contents are identical while receiving it (and it does that even for a 1 GB
> file which could turn out to have only the last byte changed without needing
> to store an intermediate copy!).
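
If I follow that, the receiving side amounts to roughly this (a toy Python
sketch of content pooling via hardlinks; the digest-named pool is my own
simplification, and unlike this toy the real code works on the stream as it
arrives rather than buffering the whole file, and handles digest collisions):

    import hashlib, os

    def receive(data, dest, pool_dir):
        # hash the incoming contents; if the pool already holds an
        # identical file, hardlink to it instead of storing a copy
        pooled = os.path.join(pool_dir, hashlib.md5(data).hexdigest())
        if not os.path.exists(pooled):
            with open(pooled, "wb") as f:
                f.write(data)
        os.link(pooled, dest)            # 'bar' becomes a link to 'foo'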

I have to accept that this is a good point. I thought the checksum checks
were done only to figure out whether a file is the same or not. However, my
incremental backups were slow even when there weren't a lot of updated files.

Anyhow, I have enabled --checksum-seed now and will see if it helps.

Thanks,
Evren

>> So checksumming saves bandwidth if a file's modification time is different
>> but the contents are the same.
> 
> That much is correct. Note that the main objective of rsync is to make sure
> that local and remote copy are *really the same*. You don't say
> 'rsync foo host:/bar' mainly to save bandwidth but to update the remote file
> (or local with reversed arguments). rsync means "try to do that with low
> bandwidth requirements, skipping as much of the transfer as you can while
> still making sure the original goal is met".
> 
>> In any case, somebody else said that when checksum caching is enabled,
>> file checksums are not calculated at backup time by uncompressing the
>> files. If this is true, then I can live with it, but CPU usage could be
>> lowered further, and better performance achieved, if this were an option
>> that could be disabled.
> 
> And that's where you completely miss the point. For taking a backup of a log
> file which has grown from 10 MB to 11 MB you either
> * need to transfer the new version completely (that's tar) or
> * figure out what has changed and transfer only that (that's rsync, and it
>   needs block checksums to be calculated, because not every file changes
>   only by data being appended to it).
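
To make sure I understand that second path, here is the block-checksum step
reduced to its skeleton (plain Python, fixed tiny block size, and the
weak/strong rolling-checksum machinery left out, so this is only a sketch of
the idea, not rsync's actual algorithm):

    import hashlib

    BLOCK = 4  # real rsync uses much larger, size-dependent blocks

    def signatures(old):
        # receiver side: checksum each block of the old file
        return {hashlib.md5(old[i:i + BLOCK]).digest(): i
                for i in range(0, len(old), BLOCK)}

    def delta(new, sigs):
        # sender side: emit block references for matches, literals otherwise
        out, i = [], 0
        while i < len(new):
            d = hashlib.md5(new[i:i + BLOCK]).digest()
            if d in sigs:
                out.append(("block", sigs[d]))        # not transferred
                i += BLOCK
            else:
                out.append(("literal", new[i:i + 1]))  # must be sent
                i += 1
        return out

    old = b"log line one\n"
    print(delta(old + b"appended\n", signatures(old)))

Essentially only the appended bytes come out as literals; the unchanged
blocks are sent as references to data the receiver already has.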
> 
> For a 100 MBit/s link, 10 MB is about a second, so spending a few seconds to
> save you that is not a good option. For an ISDN link, 10 MB is 20 minutes. So
> what if the computation takes a few seconds? Worst case is always
> "computation plus full transfer". There are no guarantees that rsync will
> actually speed things up. That depends on your specific parameters.
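
(Checking the arithmetic: ISDN at 64 kbit/s is 8 KB/s, so 10 MB takes about
1280 seconds, i.e. roughly 20 minutes; at 100 Mbit/s, about 12.5 MB/s, the
same 10 MB takes under a second. The numbers above hold up.)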
> 
> As a side note, checksum caching reportedly *does not* speed things up as much
> as you'd first expect. Or maybe, not as much as *you* expect. I expect you
> not to break with your habit of being disappointed.
> 
> 
> It has been mentioned before, but for completeness' sake: rsync has the
> additional benefit over tar that incremental runs will detect renamed files,
> new files with old timestamps and deleted files. This is not to be confused
> with the "voodoo that rsync does not do". It means that a file 'foo' with an
> "old timestamp" (meaning older than the previous successful backup) which is
> *not* in the previous backup will be transferred (!) by rsync, while tar
> will skip it based on the timestamp, assuming it was in the previous backup.
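
That difference is easy to state as two decision rules; a deliberately
oversimplified Python sketch (rsync of course also compares size/mtime for
names present on both sides, which this leaves out):

    def tar_incremental(files, last_backup_time):
        # tar: skip anything with an mtime older than the last backup,
        # even a renamed file that was never backed up under this name
        return [f for f, mtime in files.items() if mtime > last_backup_time]

    def rsync_incremental(files, previous_backup):
        # rsync: compare the full file list against the previous backup,
        # so new names with old mtimes (and deletions) are noticed
        return [f for f in files if f not in previous_backup]

    files = {"bar": 100}                      # 'foo' renamed to 'bar', old mtime
    print(tar_incremental(files, 200))        # []      -- tar misses it
    print(rsync_incremental(files, {"foo"}))  # ['bar'] -- rsync transfers it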
> 
> Either you want rsync (for slow links) and are prepared to pay for it (then
> you get this benefit for free), or you take tar (then you don't get this).
> That's the deal.
> 
> What you seem to be asking for is rsync with the BackupPC server side
> sending incorrect checksums in order for the remote end to simply send the
> whole file. The question is: how do you make sure the checksums are in fact
> incorrect (without calculating them, which would defeat the whole purpose)?
> Sounds like a bad kludge.
> 
>> Thanks for all the help and patience! :)
> 
> Oh, don't expect patience from me. I'm not convinced you really want to
> understand, but there might be others out there that do. Confusing messages
> tend to get people confused.
> 
> This would probably be the place to rant about quoting long parts of
> messages back that you are not, in fact, referring to (which is confusing and
> time consuming to work out) and redundant multiple replies with identical
> content within one thread. After all, we're talking about trying to
> eliminate redundancy in backups. Why not start where it has many other
> benefits?
> 
> My apologies to the rest of the list for my tone.
> 
> Regards,
> Holger
