On May 3, 2010, at 3:40 AM, henri wrote:

>> Depending on how this test works out, we may be able to figure out what is 
>> causing it, and then get it fixed in rsync.
> I am uncertain of where the problem is at the moment as I have not personally 
> seen these errors. Have you tried using rsync with the source / destination 
> set to different physical devices? 

Yes, I have quite a few servers, and quite a few things going on.  If I had to 
make a huge generalization, my basic setup is that each machine has its own 
internal RAID mirror: a basic two-drive setup, one drive protecting the other.

This is not backup though; this is for drives that misbehave.  I have scripts 
that watch RAID and SMART status and keep me up to date on what is going on 
with the mirror.
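For what it is worth, the kind of check those scripts do can be sketched roughly 
like this (a minimal sketch, assuming smartmontools' smartctl is available; the 
device names are placeholders, and the health-string matching is my assumption):

```shell
#!/bin/sh
# Minimal SMART-watching sketch.  Assumes smartmontools' smartctl is
# installed; the device names below are placeholders.

# Decide "ok"/"failing" from smartctl -H output.  Kept as a function
# so the matching logic can be exercised against canned output.
smart_health() {
    case "$1" in
        *PASSED*|*"SMART Health Status: OK"*) echo "ok" ;;
        *) echo "failing" ;;
    esac
}

if command -v smartctl >/dev/null 2>&1; then
    for disk in /dev/disk0 /dev/disk1; do
        out=$(smartctl -H "$disk" 2>/dev/null)
        if [ "$(smart_health "$out")" != "ok" ]; then
            # A real script would send mail here.
            echo "SMART problem on $disk" >&2
        fi
    done
fi
```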

The mirror is cloned by rsync locally, from itself, to itself, twice a day, 
sometimes with lbackup, others without.  I am still deciding if I will be 
rolling out lbackup on all machines, or continuing to use my hobbled little 
backup script that sort of does the same thing.  At midnight and noon the 
entire data set at the root of the mirror is backed up, sort of like:
    rsync / /backups/local/noon
    rsync / /backups/local/midnight
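Spelled out a bit more, the twice-daily clone looks roughly like the sketch below 
(the flag choice, the noon/midnight selection, and the cp fallback are mine for 
illustration, not the actual script; it runs against throwaway directories so it 
is safe to try):

```shell
#!/bin/sh
# Sketch of the twice-daily local clone, demonstrated on throwaway
# directories.  Flag choice is an assumption: -a is archive mode; on
# Mac OS X one would typically add -E (or, with a patched rsync 3.x,
# -X) to carry resource forks and extended attributes.

src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file.txt"

# Pick the destination slot the way a noon/midnight scheme would.
hour=$(date +%H)
if [ "$hour" -lt 12 ]; then slot=midnight; else slot=noon; fi
mkdir -p "$dst/$slot"

if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete "$src/" "$dst/$slot/"
else
    cp -R "$src/." "$dst/$slot/"   # fallback so the sketch runs anywhere
fi
```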

From there, the archival schedule varies by machine, depending on how much of 
its data changes.  For example, an HTTP server where lots of clients are 
uploading new data all the time will get a much more frequent schedule, whereas 
a machine that just sits there, serves a little DNS, and does some SNMP 
reporting need not be backed up all that often; once a day is certainly enough.

Since the local to local mirror backup is only as good as those two drives, all 
that data is sent off to my backup server, via rsync over ssh.  That machine 
itself has a large mirror, and follows most of the same routines.  The backup 
machine accepts backups in a rotational method either by my script moving 
directories around, or by lbackup on the machines I am testing that on.
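Roughly, the directory-moving rotation works like this (sketching the idea only; 
the backup.N naming and the generation count are illustrative, not my script's 
or lbackup's actual layout):

```shell
#!/bin/sh
# Sketch of a directory-moving rotation.  backup.1 is the newest
# generation, backup.N the oldest; names and count are illustrative.

rotate() {
    root=$1; keep=$2
    # Drop the oldest generation, then shift everything down by one.
    rm -rf "$root/backup.$keep"
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -d "$root/backup.$prev" ] && mv "$root/backup.$prev" "$root/backup.$i"
        i=$prev
    done
    # Fresh slot for the incoming backup.
    mkdir -p "$root/backup.1"
}

# Demonstrate on a throwaway directory.
root=$(mktemp -d)
mkdir -p "$root/backup.1"
echo "old data" > "$root/backup.1/marker"
rotate "$root" 3
```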

With all this, I can say there are a significant number of changes on the source 
and destination drives.  And I have seen, and continue to see, errors from local 
to local, local to remote, and remote to local, all using different combinations 
of drives and transports.  Also, some cases use rsync directly and others use 
lbackup. 

The only thing unique here is that these are all Mac OS X machines, and they all 
use the same two patches.

I have used those patches for almost a year now.  Why is it that those patches 
are not accepted into the main branch?  Maybe there is something wrong with the 
patches?  Where do I find out more about them: who made them, why they have not 
been incorporated upstream, and why has Apple not at least incorporated them 
into their distros?

> If it is a bug in rsync, it seems odd that the reported error is not easily 
> reproducible. As stated above I am not yet sure where the problem is located. 
> As previously mentioned, trying a known good device for reading / writing is 
> a good way to move forward if you are experiencing IO errors.

Yes, the repeatability aspect of this is most frustrating.  If I can repeat it, 
we are down to a file, which could have permissions, ACLs, data forks, resource 
forks, and other metadata special to Mac OS X.  In a repeatable case, it would 
be trivial to start picking away at the file until I find what at least causes 
the errors.  I find it somewhat hard to believe that across perhaps 20 total 
drives, and a handful of machines, all of them have something wrong with the 
hardware.

I have more or less ruled out lbackup as far as I am concerned.  The only thing 
I can think of is that it is patch-related, as that is the only difference 
between the Mac OS X version and the version the rest of the world uses, or 
that there is some kind of strange race condition that happens when multiple 
rsyncs run.  My schedules are all staggered, though that does not mean the OS 
is not changing files in the middle of a backup.  This could explain why a 
second run "fixes" things.  Perhaps there are options to rsync that do the 
file-compare operations at the exact time the file comes into the queue.  I bet 
that comes at a great expense of performance though.
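One thing I may try, to rule out overlapping runs entirely, is a simple lock 
around each backup.  A sketch (the lock path is arbitrary; I use mkdir because 
it is atomic on POSIX filesystems, and flock(1) does not ship with Mac OS X):

```shell
#!/bin/sh
# Sketch of a mkdir-based lock to prevent concurrent backup runs.
# mkdir either creates the directory (we got the lock) or fails
# atomically (someone else holds it).  The lock path is arbitrary.

LOCK=${TMPDIR:-/tmp}/backup-clone.lock

acquire_lock() {
    if mkdir "$LOCK" 2>/dev/null; then
        return 0          # we own the lock
    fi
    return 1              # another backup run is in progress
}

release_lock() {
    rmdir "$LOCK"
}

if acquire_lock; then
    : # the real rsync invocation would go here
    release_lock
else
    echo "backup already running, skipping" >&2
fi
```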

I do wonder about Carbon Copy Cloner (CCC), which has a rather large following 
and user base, and is what I would consider pretty rock-solid software.  I am 
sure tens of thousands of people use CCC every day, and they manage to do so 
without filling its forums with error reports.  And I am using the same patches 
that CCC uses.  

I am keeping a much closer eye on it now, and I have also removed a few 
launchd-based schedules and run the backups manually, so I can watch them as 
they happen, until I do get to a repeatable case.
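For those manual runs I am capturing everything, roughly like this (the log 
location, labels, and helper name are placeholders of mine, not an existing 
tool), so a failing file is not lost between runs:

```shell
#!/bin/sh
# Sketch of capturing each manual backup run: full output plus exit
# code, so a failing file can be tracked down later.  The log location
# is a placeholder; a real setup might use /var/log/backup-runs.

logdir=$(mktemp -d)

run_backup() {
    # $1: label for this run; remaining args: the backup command itself
    label=$1; shift
    log="$logdir/$label.log"
    "$@" >"$log" 2>&1
    status=$?
    echo "exit=$status" >> "$log"
    return "$status"
}

# Demonstration with a no-op in place of the real rsync invocation.
run_backup demo true
```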

If I find out more, I will post here, as it stands, until I get it repeatable, 
it is as good as it not happening :)

Thanks for looking into this and for the suggestions.
Scott * If you contact me off list replace talklists@ with scott@ * 

lbackup-discussion mailing list
