Re: Rsync decides to copy all old files to WinXP based server

2006-04-03 Thread Craig Barratt
Alex Janssen writes:

 I've been using rsync to create backup copies of all my data files on my 
 Linux laptop to my Windows XP Home based desktop for about 6 months 
 now.  Been working as it should, copying only files that changed since 
 the last backup.  The first backup I ran after the time change to 
 Daylight Saving Time it wanted to copy all of the files regardless of 
 the timestamp.  It copied old files that had not changed as well as the 
 files that had changed.  All of the timestamps on the destination ended 
 up correctly set after the copy occurred but they appeared to be the same 
 before the copy began as well.  I am stumped.
 
 I don't know what the system time has to do with it seeing as it is 
 comparing file timestamps.

FAT file systems store time stamps in local time, so they change
with DST.  See JW Schultz's excellent write up:

http://www.cygwin.com/ml/cygwin/2003-10/msg00995.html

Craig
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: What would cause an unexpected massive transfer

2005-10-31 Thread Craig Barratt
Harry Putnam writes:

 Yeah, nice write up.  Am I correct in thinking that since I've gone
 thru the long backup I'm now good till next time change?

Yes.

 Further, if I converted the fs on the external drive to NTFS or created
 an ext3 partition, this would never have happened?

Yes.

Craig
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: What would cause an unexpected massive transfer

2005-10-30 Thread Craig Barratt
Harry Putnam writes:

 I've rsynced two directory structures back and forth a few times.

 [snip]
 
 The file systems involved are (xp)NTFS on one end and Fat32 on the
 external drive.

This is the DST problem with how FAT32 represents mtime.
FAT32 stores timestamps in local time, so the UTC mtime derived
from a FAT32 timestamp shifts with DST.  Sad, huh?

See the excellent write-up by the late JW Schultz:

http://www.cygwin.com/ml/cygwin/2003-10/msg00995.html

Craig
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsyncd server daemon not allowing connections

2005-04-25 Thread Craig Barratt
[EMAIL PROTECTED] writes:

 Gang, I've read the manual(s), surfed google, spent about 5 hours on this,
 to no avail
 
 I'm trying to run rsync in server mode and it appears to start normally,
 but it refuses all connections (refuses connection when I tried telnetting
 in on localhost 873!).
 
 I've turned off all firewalls on this server (do I dare tell you guys
 that?...), which is fine: it is on a local network.
 
 I used the following command:
 
 rsync --daemon --server --config-file=/etc/rsyncd.conf .
 It responds normally: @RSYNC 28

You should not use the --server option.

Craig
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: An idea: rsyncfs, an rsync-based real-time replicated filesystem

2005-04-13 Thread Craig Barratt
Interesting ideas.

 I envision the VFS Change Logger as a (hopefully very thin) middle-ware
 that sits between the kernel's VFS interfaces and a real filesystem, like
 ext3, reiser, etc.  The VFS Change Logger will pass VFS calls to the
 underlying filesystem driver, but it will make note of certain types of
 calls...

If I understand your description correctly, inotify does something
close to this (although I'm not sure where it sits relative to VFS);
see:

http://www.kernel.org/pub/linux/kernel/people/rml/inotify/

It provides the information via /dev/notify based on ioctl requests.
I vaguely recall inotify doesn't report hardlinks created via link()
(at least based on looking at the utility example).

Inotify will drop events if the application doesn't read them fast
enough.  So a major part of the design is how to deal with that
case, and of course the related problem of how to handle a cold
start (maybe just run rsync, although on a live file system it
is hard to know how much has changed since rsync checked/updated
each file/directory).

Perhaps you would have a background program that slowly reads each
file (or re-writes its first byte), so that over time all the
files get checked (although file deletions won't be mirrored).

Craig
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-02-28 Thread Craig Barratt
Lars Karlslund writes:

 Also the numbers speak for themselves, as the --whole-file option is
 *way* faster than the block-copy method on our setup.

At the risk of jumping into the middle of this thread without
remembering everything that was discussed...

Remember that by default rsync writes a new file and then renames that
file.  So a single byte change to a file requires a complete read and
write (plus the earlier read to generate the block checksums).

The --inplace option is more efficient in terms of disk IO, but the
drawback is that blocks earlier in the original file cannot be matched.
I haven't looked at the code, but I'm guessing --inplace still does
byte-by-byte matching.  An additional optimization for --inplace would
be to only try to match on multiples of the block size.

Also, the matching only proceeds byte-by-byte when there is no match.
Once a match is found then the entire block is skipped.  So on a file
with few changes, the byte-by-byte matching doesn't slow things down
very much.

Craig
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync 2.6.3 hang (was rsync 2.6.2 crash)

2004-10-05 Thread Craig Barratt
jim writes:

 Thanks for the additional info.
 
 I actually have tried the --no-blocking-io option, but the sync
 still hung.

 Since no one on Unix-like platforms is reporting an issue, do you
 think it may be something in the Cygwin compatibility layer?

Yes, I think so.  When I tried to debug this some time ago with rsync
over ssh using cygwin I found that data that was flushed by one end's
rsync never arrived at the other end: the other end was still blocked
on select. I presume that somewhere between rsync/ssh/cygwin, and
cygwin/ssh/rsync on the other end, some buffer was not getting flushed
properly. I don't know anything about cygwin internals, so I didn't
look at this further.

 Interestingly (to me anyway), I have encountered the
 problem with syncing across the network using ssh, and syncing
 locally, but not over the network rsync to rsync. Maybe it's
 just a matter of time

I've never seen this with rsync/cygwin -- rsync/cygwin, only with
ssh.  I haven't tested local rsync on cygwin that much.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: cwrsync and CPU usage

2004-08-12 Thread Craig Barratt
Jose Luis Poza writes:

 I have a problem with cwRsync and a question. Does the cwRsync process
 (rsync.exe) use 100% (more or less) of the CPU on a Windows 2000 server,
 with a high level of kernel usage?
 I have synchronized 11 servers (Unix and Windows) with all of their
 files; the process takes approximately 17 hours and is run every
 day. Is this time normal? (A client makes all the requests and stores
 the files locally.)

Are you using the latest rsync (version 2.6.2)?  It runs a lot
faster on cygwin.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync to Mac OS X question

2004-08-06 Thread Craig Barratt
Chris Heller writes:

 I ran into a problem today when I tested the system for the first time.
 I am rsyncing from a remote Linux host using the following options to
 rsync: -avv --rsh=ssh stuff here --exclude-from=path to exclude
 file --delete.
 
 The problem is when the files are moved over to the Mac OS X server
 their owner/group ids change.
 
 For instance if I copy ~heller/ (uid: 500 gid: 500) to the Mac it
 becomes uid: 504 gid: 504.
 
 This isn't too big a problem, but it messes up security when I go to
 export the data via NFS.
 
 From the rsync man page I was under the impression that -a will preserve
 owner, group permissions.

By default rsync maps uid/gid values by user/group name at each
end of the transfer.  

Use --numeric-ids to just send the uid/gid without mapping.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: cwRsync, Windows-2000, use of 'auth users': not working .... shou ld it?

2004-08-06 Thread Craig Barratt
GUZZI, ANTHONY writes:

 Without an 'auth users' entry for a module, the sync goes fine.  With an 'auth
 users' entry, I'm getting the '@ERROR: auth failed on module ' error
 message.

Make sure your RSYNC-USERS.TXT file ends in a newline.
Rsync prior to 2.6.2 ignores the last line in the file
if it doesn't end in a newline.  Next, try full paths
for the secrets file.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: HP-UX 11i and largefiles on rsync 2.6.2

2004-07-27 Thread Craig Barratt
Steve Bonds writes:

 This is what I would expect to see if the VXFS filesystem was not created
 with the largefiles option-- but it was.  (And I double-checked.)  Other
 utilities (e.g. dd) can create large files just fine.
 
 I haven't seen anything obviously wrong with write_file or
 flush_write_file in fileio.c (v. 1.15).
 
 Do you know what is meant by the process' file size limit?

I don't know specifically about HP-UX, but most *nix systems have ulimit.
See the man page.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: HP-UX 11i and largefiles on rsync 2.6.2

2004-07-20 Thread Craig Barratt
Don Malloy writes:

 I just tried the build from the nightly tar file: 
 rsync-HEAD-20040720-1929GMT.tar.gz
 
 It failed at 2144075776 bytes each time I tried. I've attached the tail from 
 the tusc again. Here is the output of rsync:

I haven't been following this thread, so I might be way off base.
Are you sure your destination file system supports large files,
and that the destination file system has enough room?

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsyncP

2004-07-15 Thread Craig Barratt
Paul Arch writes:

 does anyone know if File::RsyncP will operate under activeperl (windows?)
 
 This module is maintained by Craig Barratt, who I noticed is also on this
 list :)

I haven't tested it under activeperl, but it does work under
perl + cygwin on WinXX.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] Batch-mode rewrite

2004-07-13 Thread Craig Barratt
Chris Shoemaker writes:

 Do you see any reason to keep FIXED_CHECKSUM_SEED around?  It doesn't
 hurt anything, but I don't see a use for it.

So long as the --checksum-seed=N option remains, I'm ok getting
rid of FIXED_CHECKSUM_SEED.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [Bug 1463] New: poor performance with large block size

2004-06-17 Thread Craig Barratt
Wally writes:

 I apologize to Craig. Chris is correct.

No problem.

 I had been reading so many of Chris's highly intelligent e-mails...

Same here.

 But, the comment seems to have been right on. I have re-run the
 experiment with block sizes as small as 3000 (yes it took a long
 time to complete) all the way up to block sizes of 10 with it
 working in reasonable times. But, when the block size approaches
 170,000 or so, the performance degrades exponentially.

 I understand that I am testing at the very fringes of what we should
 expect rsync to do. File sizes of 25Gig and 55Gig are beyond what was
 originally envisioned (based on 64k hash buckets and a sliding window
 of 256k).

Here's a patch to try.  It basically ensures that the window is
at least 16 times the block size.  Before I'd endorse this patch
for CVS we need to make sure there aren't cases where map_ptr is
called with a much bigger length, making the 16x a bit excessive.

Perhaps I would be tempted to repeat the previous check that the
window start plus the window size doesn't exceed the file length,
although it must be at least offset + len - window_start as in
the original code.

In any case, I'd be curious if this fixes the problem.

Craig

--- rsync-2.6.2/fileio.c	Sun Jan  4 19:57:15 2004
+++ ../rsync-2.6.2/fileio.c	Thu Jun 17 19:33:26 2004
@@ -193,8 +193,8 @@
 	if (window_start + window_size > map->file_size) {
 		window_size = map->file_size - window_start;
 	}
-	if (offset + len > window_start + window_size) {
-		window_size = (offset+len) - window_start;
+	if (offset + 16 * len > window_start + window_size) {
+		window_size = (offset + 16 * len) - window_start;
 	}
 
 	/* make sure we have allocated enough memory for the window */
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: stalling during delta processing

2004-06-15 Thread Craig Barratt
Wallace Matthews writes:

 I copy the 29 Gig full backup back into fedor//test/Kibbutz and issue
 the command time rsync -avv --rsh=rsh --stats --block-size=181272
 /test/Kibbutz/Kbup_1 fedor://test/Kibbutz and it CRAWLS during delta
 generation/transmittal at about 1 Megabyte per second.

 I have repeated the experiment 3 times; same result each time.

 The only thing that is different is the --block-size= option. The
 first time it isn't specified and I get a predictable answer. The
 second time, I give it a block size that is about 1/2 of the square
 root of (29 Gig) and that is ok. But, explicitly give it something
 that is approximately the square root of the 29 Gig and it CRAWLS.

 When I cancel the command, the real time is 86 minutes and the
 user time is 84 minutes. This is similar to the issue I reported
 on Friday that Chris suggested I remove the --write-batch= option
 and that seemed to fix the CRAWL.

If I understand the code correctly, map_ptr() in fileio.c maintains
a sliding window of data in memory.  The window starts 64K prior
to the desired offset, and the window length is 256K.  So your
block-size of 181272 occupies most of the balance of the window.

Each time you hit the end of the window the data is memmoved
and the balance needed is read.  With such a large block size
there will be a lot of memmoves and small reads.

I doubt this issue explains the dramatic reduction in speed, but
it might be a factor.  Perhaps there is a bug with large block
sizes?

And, yes, your observation about the number of matching blocks
needs to be explored.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: 2.6.2 not displaying permissions errors on client side

2004-05-31 Thread Craig Barratt
Wayne Davison writes:

 On Sun, May 09, 2004 at 03:35:47AM -0700, Robert Helmer wrote:
  If there is an error writing to the remote file due to a permission
  denied error, rsync 2.6.1's client exits with an error code of 23, and
  an informative error message.
 
 ... and no error message logged in the server's log file.
 
 Rsync has historically been hesitant to return error messages from a
 server to the client for fear of revealing too much information.  The
 2.6.0 and 2.6.1 releases were returning error messages but failing to
 log them in the server's log file.  The 2.6.2 release reverts back to
 the historical way this was handled.
 
 A better solution for the future would be to log all errors to the
 server log and send some/most of them to the user as well.  However,
 that will be a complex change, and it has not been worked on yet.
 
 A simpler solution would be to duplicate ALL the messages (the lack of
 selectivity makes this change easy).  The appended patch should do this,
 if you so desire to go that route.

Thanks for the patch.  I strongly vote for rsync errors being delivered
to the client and would like to see this default again in the next
version (perhaps with a command-line switch if necessary?).  In my case,
BackupPC emulates an rsync client and needs to see these
errors so it can log and count them, and for read errors it needs
to remove the bad file, which is otherwise zero-filled (this happens with
locked files on cygwin/WinXX).

I saw your patch that returns a bad file checksum in the case of read
errors.  The drawback is the bad file requires two passes, since it
will fail on both passes, but retrying the file is probably worthwhile
in case the failure was intermittent.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: cwRync and Windows permissions

2004-05-23 Thread Craig Barratt
[EMAIL PROTECTED] writes:

 Have a look at
 
 http://www.itefix.no/phpws/index.php?module=faq&FAQ_op=view&FAQ_id=12
 
 In short :
 
 Right click My Computer > Properties > Advanced tab > Environment
 Variables. In the bottom section (System variables), add the new
 entry CYGWIN with value nontsec, then restart the rsync service.
 Make sure the folders you are uploading to have the permissions you
 want the files to inherit. Doing this, I’ve found the uploaded files
 get the correct permissions.

It works for me too.  Thanks.

An alternative to the system-wide variable is to add

--env CYGWIN=nontsec

to the cygrunsrv command-line when you install rsyncd as a service, eg:

cygrunsrv -I rsyncd --env CYGWIN=nontsec -p c:/cygwin/bin/rsync.exe \
    -a "--config=/etc/rsyncd.conf --daemon --no-detach"

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Fwd: Re: setting checksum_seed

2004-05-18 Thread Craig Barratt
Wayne Davison writes:

 On Sat, May 15, 2004 at 02:25:11PM -0700, Craig Barratt wrote:
  Any feedback on this patch and the possibility of getting it
  into CVS or the patches directory?
 
 The file checksum-seed.diff was put into the patches dir on the 2nd of
 May.  Strangely, I don't seem to have sent any email indicating this
 (my apologies about that).
 
 I think that this patch is a good candidate to go into the next
 release.

Unfortunately the checksum-seed.diff patch breaks authentication in
rsyncd.

The problem is that when you specify --checksum-seed=N on the client
when connecting to an rsyncd server, the authentication response is
based on an MD4 digest computed by calling sum_init(), sum_update() and
sum_end().  sum_init() adds checksum_seed to the digest data.  The
problem at this point is the args have not been sent to the server
(that happens after authentication), so the client has checksum_seed=N
and the server still has checksum_seed=0, so authentication fails.

Probably the best solution is to add a flag argument to sum_init()
to request whether to add checksum_seed or not.  authenticate.c calls
sum_init(0) in two places, and match.c and receiver.c call sum_init(1).
Other alternatives of adding a second sum_init_nochecksumseed() function
or saving/restoring checksum_seed in authenticate.c seem ugly.

If you agree with this fix I will be happy to submit a new patch
in the next few days.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Fwd: Re: setting checksum_seed

2004-05-16 Thread Craig Barratt
Wayne Davison writes:

 On Sat, May 15, 2004 at 02:25:11PM -0700, Craig Barratt wrote:
  Any feedback on this patch and the possibility of getting it
  into CVS or the patches directory?
 
 The file checksum-seed.diff was put into the patches dir on the 2nd of
 May.  Strangely, I don't seem to have sent any email indicating this
 (my apologies about that).

...and my apologies for not checking CVS before I sent my email.

 I think that this patch is a good candidate to go into the next
 release.

Thanks again!
Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Fwd: Re: setting checksum_seed

2004-05-15 Thread Craig Barratt
Any feedback on this patch and the possibility of getting it
into CVS or the patches directory?

Thanks,
Craig

-- Forwarded message --
To: jw schultz [EMAIL PROTECTED]
From: Craig Barratt [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Date: Sat, 01 May 2004 17:06:10 -0700
Subject: Re: setting checksum_seed 

jw schultz writes:

   There was some talk last year about adding a --fixed-checksum-seed
   option, but no consensus was reached.  It shouldn't hurt to make the
   seed value constant for certain applications, though, so you can feel
   free to proceed in that direction for what you're doing for your client.
   
   FYI, I just checked in some changes to the checksum_seed code that will
   make it easier to have other options (besides the batch ones) specify
   that a constant seed value is needed.
  
  I would really like a --fixed-csumseed option to become a standard
  feature in rsync.  Just using the batch value (32761) is fine.
  Can I contribute a patch?  The reason I want this is the next
  release of BackupPC will support rsync checksum caching, so that
  backups don't need to recompute block or file checksums.  This
  requires a fixed checksum seed on the remote rsync, hence the
  need for --fixed-csumseed.  I've included this feature in a
  pre-built rsync for cygwin that I include on the SourceForge
  BackupPC downloads.
 
 1.  Yes, you may contribute a patch.  I favor the idea of
 being able to supply a checksum seed.
 
 2.  Let's get the option name down to a more reasonable
 length.  --checksum-seed should be sufficient.

I submitted a patch in Feb 2004 to add a --fixedcsum-seed option
(which only sets checksum_seed to 32761, the batch file value):

http://lists.samba.org/archive/rsync/2004-February/008616.html

Earlier, I submitted a patch (against 2.5.6pre1 in Jan 2003)
for --checksum-seed=NUM:

http://lists.samba.org/archive/rsync/2003-January/004845.html

Since I posted both of these patches, there was an interesting thread
started by Eran Tromer about potential block checksum collisions that
could be exploited by someone to trigger first-pass failures. See:

http://lists.samba.org/archive/rsync/2004-March/008821.html

The consequence is just a performance penalty, since with very
high probability the whole-file checksum fails, triggering the
second pass with the full checksum size, which will succeed.
Eran recommended that checksum_seed be more random than time().

BackupPC now supports rsync checksum caching, so I would really like
an rsync command-line option to set the checksum_seed.  Based on the
thread started by Eran I am reverting to the --checksum-seed=NUM form,
since this allows paranoid users to pick their own random value should
they wish to avoid the issue raised by Eran, plus it also allows my
BackupPC users to specify a fixed value so that caching is useful
(subject to the same caveats raised by Eran).

Here's a new patch against rsync-2.6.2.  JW's earlier changes
have simplified this patch.  Could this be applied to CVS,
or at a minimum added to the patches directory?

Note: the patch does not allow the case of --checksum-seed=0, since
the code in compat.c replaces the value 0 with time(0).  I don't think
it is necessary to support this case (which means disable adding the
seed to the MD4 digests).  If people feel strongly about this I can
also support the case --checksum-seed=0, although it will make the
code a little uglier (we'll need another global variable).

Thanks,
Craig

--- options.c	2004-04-17 10:07:23.000000000 -0700
+++ options.c	2004-05-01 16:24:44.380672000 -0700
@@ -290,6 +290,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --checksum-seed=NUM     set block/file checksum seed\n");
   rprintf(F,"  -h, --help                 show this help screen\n");
 #ifdef INET6
   rprintf(F,"  -4                         prefer IPv4\n");
@@ -386,6 +387,7 @@
   {"from0",           '0', POPT_ARG_NONE,   &eol_nulls, 0, 0, 0},
   {"no-implied-dirs",  0,  POPT_ARG_VAL,    &implied_dirs, 0, 0, 0 },
   {"protocol",         0,  POPT_ARG_INT,    &protocol_version, 0, 0, 0 },
+  {"checksum-seed",    0,  POPT_ARG_INT,    &checksum_seed, 0, 0, 0 },
 #ifdef INET6
   {0,                 '4', POPT_ARG_VAL,    &default_af_hint, AF_INET, 0, 0 },
   {0,                 '6', POPT_ARG_VAL,    &default_af_hint, AF_INET6, 0, 0 },
@@ -911,6 +913,11 @@
 			goto oom;
 		args[ac++] = arg;
 	}
+	if (checksum_seed) {
+		if (asprintf(&arg, "--checksum-seed=%d", checksum_seed) < 0)
+			goto oom;
+		args[ac++] = arg;
+	}
 
 	if (keep_partial)
 		args[ac++] = "--partial";
--- rsync.yo	2004-04-30 11:02:43.000000000 -0700
+++ rsync.yo	2004-05-01 16:59

Re: setting checksum_seed

2004-05-01 Thread Craig Barratt
jw schultz writes:

   There was some talk last year about adding a --fixed-checksum-seed
   option, but no consensus was reached.  It shouldn't hurt to make the
   seed value constant for certain applications, though, so you can feel
   free to proceed in that direction for what you're doing for your client.
   
   FYI, I just checked in some changes to the checksum_seed code that will
   make it easier to have other options (besides the batch ones) specify
   that a constant seed value is needed.
  
  I would really like a --fixed-csumseed option to become a standard
  feature in rsync.  Just using the batch value (32761) is fine.
  Can I contribute a patch?  The reason I want this is the next
  release of BackupPC will support rsync checksum caching, so that
  backups don't need to recompute block or file checksums.  This
  requires a fixed checksum seed on the remote rsync, hence the
  need for --fixed-csumseed.  I've included this feature in a
  pre-built rsync for cygwin that I include on the SourceForge
  BackupPC downloads.
 
 1.  Yes, you may contribute a patch.  I favor the idea of
 being able to supply a checksum seed.
 
 2.  Let's get the option name down to a more reasonable
 length.  --checksum-seed should be sufficient.

I submitted a patch in Feb 2004 to add a --fixedcsum-seed option
(which only sets checksum_seed to 32761, the batch file value):

http://lists.samba.org/archive/rsync/2004-February/008616.html

Earlier, I submitted a patch (against 2.5.6pre1 in Jan 2003)
for --checksum-seed=NUM:

http://lists.samba.org/archive/rsync/2003-January/004845.html

Since I posted both of these patches, there was an interesting thread
started by Eran Tromer about potential block checksum collisions that
could be exploited by someone to trigger first-pass failures. See:

http://lists.samba.org/archive/rsync/2004-March/008821.html

The consequence is just a performance penalty, since with very
high probability the whole-file checksum fails, triggering the
second pass with the full checksum size, which will succeed.
Eran recommended that checksum_seed be more random than time().

BackupPC now supports rsync checksum caching, so I would really like
an rsync command-line option to set the checksum_seed.  Based on the
thread started by Eran I am reverting to the --checksum-seed=NUM form,
since this allows paranoid users to pick their own random value should
they wish to avoid the issue raised by Eran, plus it also allows my
BackupPC users to specify a fixed value so that caching is useful
(subject to the same caveats raised by Eran).

Here's a new patch against rsync-2.6.2.  JW's earlier changes
have simplified this patch.  Could this be applied to CVS,
or at a minimum added to the patches directory?

Note: the patch does not allow the case of --checksum-seed=0, since
the code in compat.c replaces the value 0 with time(0).  I don't think
it is necessary to support this case (which means disable adding the
seed to the MD4 digests).  If people feel strongly about this I can
also support the case --checksum-seed=0, although it will make the
code a little uglier (we'll need another global variable).

Thanks,
Craig

--- options.c	2004-04-17 10:07:23.000000000 -0700
+++ options.c	2004-05-01 16:24:44.380672000 -0700
@@ -290,6 +290,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --checksum-seed=NUM     set block/file checksum seed\n");
   rprintf(F,"  -h, --help                 show this help screen\n");
 #ifdef INET6
   rprintf(F,"  -4                         prefer IPv4\n");
@@ -386,6 +387,7 @@
   {"from0",           '0', POPT_ARG_NONE,   &eol_nulls, 0, 0, 0},
   {"no-implied-dirs",  0,  POPT_ARG_VAL,    &implied_dirs, 0, 0, 0 },
   {"protocol",         0,  POPT_ARG_INT,    &protocol_version, 0, 0, 0 },
+  {"checksum-seed",    0,  POPT_ARG_INT,    &checksum_seed, 0, 0, 0 },
 #ifdef INET6
   {0,                 '4', POPT_ARG_VAL,    &default_af_hint, AF_INET, 0, 0 },
   {0,                 '6', POPT_ARG_VAL,    &default_af_hint, AF_INET6, 0, 0 },
@@ -911,6 +913,11 @@
 			goto oom;
 		args[ac++] = arg;
 	}
+	if (checksum_seed) {
+		if (asprintf(&arg, "--checksum-seed=%d", checksum_seed) < 0)
+			goto oom;
+		args[ac++] = arg;
+	}
 
 	if (keep_partial)
 		args[ac++] = "--partial";
--- rsync.yo	2004-04-30 11:02:43.000000000 -0700
+++ rsync.yo	2004-05-01 16:59:48.546313600 -0700
@@ -348,6 +348,7 @@
  --bwlimit=KBPS          limit I/O bandwidth, KBytes per second
  --write-batch=PREFIX    write batch fileset starting with PREFIX
  --read-batch=PREFIX     read batch fileset starting with PREFIX
+ --checksum-seed=NUM     set block/file checksum seed
  -h, --help

Re: Rsync hangs with XP machine used as sshd server

2004-04-15 Thread Craig Barratt
Agostino Russo writes:

 I have a problem with rsync 2.6 protocol 27 (both client and server) 
 running over XP via Cygwin and sshd (on the remote machine). It just hangs 
 almost randomly while transferring files, after transferring a few 
 megabytes, not always on the same file. When the remote machine is a 
 Linux server I have no problems; bad news is I need also to rsync from 
 XP...  I found the problem mentioned somewhere else doing a google 
 search, see for instance:
 
 http://www.cygwin.com/ml/cygwin/2003-08/msg01065.html
 
 Which seems to be exactly the same issue, see the link for more info 
 about the situation. Unfortunately I wasn't able to find a solution so 
 far on the web. I was hoping that somebody here knows a way around this 
 problem (other than dropping XP :-)

I've never found rsync + ssh + cygwin to be reliable; your
symptoms are the same as mine.  I recommend using rsync in
daemon mode with cygwin.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Failed rsync -- two different files considered up to date

2004-03-29 Thread Craig Barratt
Greger Cronquist writes:

 I've used rsync successfully for several years, syncing between two 
 Windows 2000 servers using daemon mode, but today I stumbled across 
 something peculiar. I'm using cygwin with rsync 2.6.0 at both ends (the 
 latest available at this date) and I have a file that rsync considers up 
 to date even though both the md5 and a normal diff show differences. 
 I've tried calling rsync with several different options, most notably -c 
 for forcing checksum, but it fails to see a difference between the files.
 
 Are there any things I should try or information that I can include? All 
 -vvv gives me is "uptodate".

How about -I (--ignore-times)?

Craig


Re: Speed up rsync ,cwRsync and replay changes against a file

2004-02-25 Thread Craig Barratt
 I recently installed and setup cwRsync on a Windows 2000 Server -
 http://www.itefix.no/cwrsync/ -, and I was very impressed. I just
 followed the instructions on the website and got it working.
 
 I am using it to mirror 30GB of mailboxes every night (only grabbing
 the changes to each file), from a Windows 2000 box to a Linux box (RH9).
 
 The nightly replication takes approximately 8 hrs to complete, but the
 actual size of the mailbox directory only increases by about 120Mb a
 day. There are 750 mailboxes and each mailbox is between 50 and 200Mb in
 size.
 
 I am using the following command line options:
 
 rsync -avz hostname::MailBoxes /mailboxreplica
 
 Can anyone recommend ways to speed this up - is there some extra
 compression I can use, or a kind of quick checksum option that I could
 use?

If you are on a fast network, -z will probably slow you down.

Rsync + cygwin is typically slow due to the system call overhead in
cygwin.  There is a performance patch (patches/craigb-perf.diff)
included with the 2.5.6, 2.5.7 and 2.6.0 releases that makes a
measurable improvement.  This patch is now in CVS.

So you should build rsync from the release sources after applying the
patches/craigb-perf.diff patch (or build from CVS).  Or you can try a
pre-built executable with the patch, like the cygwin-rsyncd package
at http://backuppc.sourceforge.net.

Craig


Re: Not Again! (Was: Re: FAQ: Moving files between two machines using rsync)

2004-02-23 Thread Craig Barratt
Mauricio writes:

   I can't believe this!  I am having the very same problem I 
 had before.  For those who do not remember, I was trying to rsync a 
 file from a Solaris 9 box(kushana)  to a netbsd 1.6.1 (the rsync 
 server, katri) box, without much luck:
 
 [EMAIL PROTECTED] rsync -vz \
 ? --password-file=/export/home/raub/nogo \
 ? /export/home/raub/sync-me \
 ? [EMAIL PROTECTED]::tmp
 NetBSD 1.6.1 (GENERIC) #0: Tue Apr 8 21:00:42 UTC 2003
 
 Welcome to NetBSD!
 
 @ERROR: auth failed on module tmp
 rsync: connection unexpectedly closed (164 bytes read so far)
 rsync error: error in rsync protocol data stream (code 12) at io.c(165)
 [EMAIL PROTECTED]

One other thing to check is that the /etc/rsyncd.secrets file
ends in a newline.  The last entry will be ignored if that
line doesn't end with a newline.

Craig


Re: checksum_seed

2004-02-16 Thread Craig Barratt
jw schultz writes:

 1.  Yes, you may contribute a patch.  I favor the idea of
 being able to supply a checksum seed.
 
 2.  Lets get the option name down to a more reasonable
 length.  --checksum-seed should be sufficient.

I submitted a patch against 2.5.6pre1 last January for --checksum-seed=NUM:

http://lists.samba.org/archive/rsync/2003-January/004845.html

but in that thread Dave Dykstra correctly pointed out there wasn't
much point in letting the user specify a particular value.

Therefore, I switched to just a flag that forces the fixed value of
32761 (same as batch mode).  I picked the option named --fixed-csumseed,
which is long but hopefully informative.

Here's a patch against CVS using --fixed-csumseed.  I also added it to
the usage and documentation, but it's not clear this option needs to be
exposed to the user.

Craig

diff -bur rsync/options.c rsync-fixedcsum/options.c
--- rsync/options.c Tue Feb 10 20:30:41 2004
+++ rsync-fixedcsum/options.c   Mon Feb 16 12:32:23 2004
@@ -89,6 +89,7 @@
 int modify_window = 0;
 int blocking_io = -1;
 int checksum_seed = 0;
+int fixed_csumseed = 0;
 unsigned int block_size = 0;
 
 
@@ -288,6 +289,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --fixed-csumseed        use fixed MD4 block/file checksum seed\n");
   rprintf(F," -h, --help                  show this help screen\n");
 #ifdef INET6
   rprintf(F," -4                          prefer IPv4\n");
@@ -303,7 +305,7 @@
 enum {OPT_VERSION = 1000, OPT_SENDER, OPT_EXCLUDE, OPT_EXCLUDE_FROM,
   OPT_DELETE_AFTER, OPT_DELETE_EXCLUDED, OPT_LINK_DEST,
   OPT_INCLUDE, OPT_INCLUDE_FROM, OPT_MODIFY_WINDOW,
-  OPT_READ_BATCH, OPT_WRITE_BATCH};
+  OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_FIXED_CSUMSEED};
 
 static struct poptOption long_options[] = {
   /* longName, shortName, argInfo, argPtr, value, descrip, argDesc */
@@ -379,6 +381,7 @@
   {"hard-links",    'H', POPT_ARG_NONE,   &preserve_hard_links, 0, 0, 0 },
   {"read-batch",     0,  POPT_ARG_STRING, &batch_prefix,  OPT_READ_BATCH, 0, 0 },
   {"write-batch",    0,  POPT_ARG_STRING, &batch_prefix,  OPT_WRITE_BATCH, 0, 0 },
+  {"fixed-csumseed", 0,  POPT_ARG_NONE,   0, OPT_FIXED_CSUMSEED, 0, 0 },
   {"files-from",     0,  POPT_ARG_STRING, &files_from, 0, 0, 0 },
   {"from0",         '0', POPT_ARG_NONE,   &eol_nulls, 0, 0, 0},
   {"no-implied-dirs",0,  POPT_ARG_VAL,    &implied_dirs, 0, 0, 0 },
@@ -564,6 +567,11 @@
checksum_seed = FIXED_CHECKSUM_SEED;
break;
 
+   case OPT_FIXED_CSUMSEED:
+   fixed_csumseed = 1;
+   checksum_seed = FIXED_CHECKSUM_SEED;
+   break;
+
case OPT_LINK_DEST:
 #if HAVE_LINK
compare_dest = (char *)poptGetOptArg(pc);
@@ -931,6 +939,10 @@
args[ac++] = "--files-from=-";
args[ac++] = "--from0";
}
+   }
+
+   if (fixed_csumseed) {
+   args[ac++] = "--fixed-csumseed";
}
 
*argc = ac;
diff -bur rsync/rsync.yo rsync-fixedcsum/rsync.yo
--- rsync/rsync.yo  Mon Feb  2 10:23:09 2004
+++ rsync-fixedcsum/rsync.yoMon Feb 16 12:36:08 2004
@@ -348,6 +348,7 @@
  --bwlimit=KBPS  limit I/O bandwidth, KBytes per second
  --write-batch=PREFIXwrite batch fileset starting with PREFIX
  --read-batch=PREFIX read batch fileset starting with PREFIX
+ --fixed-csumseeduse fixed MD4 block/file checksum seed
  -h, --help  show this help screen
 
 
@@ -879,6 +880,15 @@
 dit(bf(--read-batch=PREFIX)) Apply a previously generated change batch,
 using the fileset whose filenames start with PREFIX. See the BATCH
 MODE section for details.
+
+dit(bf(--fixed-csumseed)) Set the MD4 checksum seed to the fixed
+value 32761.  This 4 byte checksum seed is included in each block and
+file MD4 checksum calculation.  By default the checksum seed is generated
+by the server and defaults to the current time(), or 32761 if
+bf(--write-batch) or bf(--read-batch) are specified.  This default
+causes the MD4 block and file checksums to be different each time rsync
+is run.  Applications that cache block or file checksums therefore need
+the checksum seed to be fixed across runs, which this option provides.
 
 enddit()
 


Re: [patch] Add `--link-by-hash' option (rev 2).

2004-02-16 Thread Craig Barratt
Jason M. Felice writes:

 This patch adds the --link-by-hash=DIR option, which hard links received
 files in a link farm arranged by MD4 file hash.  The result is that the system
 will only store one copy of the unique contents of each file, regardless of
 the file's name.
 
 (rev 2)
 * This revision is actually against CVS HEAD (I didn't realize I was working
   from a stale rsync'd CVS).
 * Apply permissions after linking (permissions were lost if we already had
   a copy of the file in the link farm).

I haven't studied your patch, but I have a couple of comments/questions:

  - If you update permissions, then all hardlinks will change too.
Does that mean that all instances of an identical file will get
the last mtime/permissions/ownership?  Or does the link farm have
unique entries for contents plus meta data (vs just contents)?

  - Some file systems have a hardlink limit of 32000.  You will need to
roll to a new file when that limit is exceeded (ie: link() fails).
Also, empty files tend to be quite prevalent, so it is probably
easier to just create those files and not link them (should be no
difference in disk usage).

  - How does this patch interact with -H?

Craig


Re: checksum_seed

2004-02-10 Thread Craig Barratt
On Mon, Feb 09, 2004 at 09:14:06AM -0500, Jason M. Felice wrote:

 I got the go-ahead from the client on my --link-by-hash proposal, and
 the seed is making the hash unstable.  I can't figure out why the seed
 is there so I don't know whether to circumvent it in my particular case
 or calculate a separate, stable hash.

I believe the checksum seed is meant to reduce the chance that different
data could repeatedly produce the same md4 digest over multiple runs.
If a collision happens the hope is that a different checksum seed will
break the collision.

However, my guess is that it doesn't make any difference.  Certainly
adding the seed at the end of the block won't change a collision even
if the seed changes over multiple runs.  File MD4 checksums add the
seed at the beginning, which might help breaking collisions, although
I'm not sure.

Wayne Davison writes:

 There was some talk last year about adding a --fixed-checksum-seed
 option, but no consensus was reached.  It shouldn't hurt to make the
 seed value constant for certain applications, though, so you can feel
 free to proceed in that direction for what you're doing for your client.
 
 FYI, I just checked in some changes to the checksum_seed code that will
 make it easier to have other options (besides the batch ones) specify
 that a constant seed value is needed.

I would really like a --fixed-csumseed option become a standard
feature in rsync.  Just using the batch value (32761) is fine.
Can I contribute a patch?  The reason I want this is the next
release of BackupPC will support rsync checksum caching, so that
backups don't need to recompute block or file checksums.  This
requires a fixed checksum seed on the remote rsync, hence the
need for --fixed-csumseed.  I've included this feature in a
pre-built rsync for cygwin that I include on the SourceForge
BackupPC downloads.

Craig


BackupPC 2.0.0beta0 released - now supports rsync

2003-02-24 Thread Craig Barratt
I just released version 2.0.0beta0 of BackupPC on SourceForge, see

http://backuppc.sourceforge.net/

What is BackupPC?  It is an enterprise-grade open-source package for
backing up WinXX and *nix systems to disk.  It supports transport via
SMB, tar and now rsync over rsh/ssh and rsyncd.  The backend features
hard-linking of any identical files (not just files with the same
name) and compression, giving a 6x to 10x reduction in disk storage.
It also has a comprehensive web (CGI) interface.

The rsync support in BackupPC is based on File::RsyncP, a perl rsync
client; see http://perlrsync.sourceforge.net.

A future version of BackupPC will also support block and file checksum
caching for additional performance.

Craig


Re: Fast Cygwin binaries ?

2003-02-19 Thread Craig Barratt
 I read in the archives that somebody has a faster binary version floating 
 around. How might I get ahold of it? (If you have it, would it be possible 
 to e-mail me a copy?)

Fetch 2.5.6 and apply the patch in patches/craigb-perf.diff before you
build it.

Craig



Re: rsync vs. rcp

2003-02-19 Thread Craig Barratt
 I wasn't aware that it had this. Was it there at the time of the
 original discussion (Oct 2002)? The people involved in the discussion
 then didn't seem to know this.

I wasn't aware of it in Oct 2002 during that discussion.  I saw it in
the code a month or two after that.  I haven't checked the history,
but it is definitely there in 2.5.5.

 However, it's not really adequate. A 16K block size only really works
 for files up to about 500M. Still... that's a lot better than I thought
 it was at the time.

Agreed.  Checksum length matters a lot more than block size, as you
pointed out in your earlier analysis.

Craig



Re: rsync in cygwin as service

2003-02-11 Thread Craig Barratt
 If I try to start rsync from the command line it simply does nothing:
 
 $ rsync --daemon
 
 Administrator@dm-w2ks /usr/bin
 
 $ ps
    PID  PPID  PGID  WINPID  TTY  UID     STIME COMMAND
    480     1   480     480  con  500  04:15:03 /usr/bin/bash
   1428   480  1428    1420  con  500  05:26:46 /usr/bin/ps
 
 Administrator@dm-w2ks /usr/bin
 
 So I'm trying to set it as service:
 
  C:\cygwin\bin> cygrunsrv -I RSYNC -d Rsync -p /bin/rsync.exe -a "--daemon --no-detach"

I've found on cygwin that I need to explicitly tell it where the
config file is, both on the command line and with cygrunsrv.  I
haven't investigated; perhaps the platform default is some other
file.

These commands work for me:

rsync --config=/etc/rsyncd.conf --daemon

and

cygrunsrv -I RSYNC -p /bin/rsync.exe -a '--config=/etc/rsyncd.conf --daemon --no-detach'

Craig



Re: duplicated file removal: call for comment

2003-02-11 Thread Craig Barratt
 This problem may be discussed now, because in versions before
 rsync-2.5.6, the algorithm for removing the so called duplicated files
 was broken.
 That's why we expect nobody used it anyway in earlier versions - but who
 knows..

I agree it should be the last argument that wins, but as Wayne points
out your code and 2.5.6 have unpredictable behavior since qsort() could
return identical names in any order.

Another concern I have about this fix in 2.5.6 is that there is risk
the change is not backward compatible with earlier protocol versions.
The file list is sent (unsorted and uncleaned) from the sender to the
receiver, and each side then sorts and cleans the list.  Since the
duplicate removal changed in 2.5.6, but the protocol number didn't
change, it is possible that with duplicates the file lists are no
longer identical.  Specifically, with three or more duplicates, 2.5.5
and earlier will remove the even ones, while 2.5.6 correctly removes
all but the first.  Remember that the files are referred to as an
integer index into the sorted file list, and the receiver skips
NULL (duplicate) files.

I suspect (but haven't checked) that if a 2.5.5 receiver is talking to
a 2.5.6 sender then 2.5.5 will send the index for the 3rd file, which
will be null_file on 2.5.6.

Craig



Re: rsync in cygwin as service

2003-02-11 Thread Craig Barratt
 Certainly, I tried --config
 Could you tell me which rsync version do you use?

rsync 2.5.5 and rsync 2.5.6 both work fine for me.

Is it possible that rsync is already running as a service?
It won't show up in cygwin's ps.  For example, when rsync
is running via cygrunsrv, if I type:

rsync --daemon

it exits with no error, but ps shows no process.  But rsync is
indeed running, eg:

tcsh 438% telnet localhost 873
Trying 127.0.0.1...
Connected to .
Escape character is '^]'.
@RSYNCD: 26
quit
@ERROR: protocol startup error
Connection closed by foreign host.

You can also see rsync.exe in the windows task manager.

You could also try a different port number to see if there is
someone else on 873:

craigslt 461% ps aux | egrep rsync
craigslt 462% rsync --daemon --port=1234
craigslt 463% ps aux | egrep rsync
    4020     1  4020    4020  ?    1005  23:29:08 /bin/rsync

Craig



Re: rsync in-place (was Re: rsync 1tb+ each day)

2003-02-05 Thread Craig Barratt
  Is it possible to tell rsync to update the blocks of the target file
  'in-place' without creating the temp file (the 'dot file')?  I can
  guarantee that no other operations are being performed on the file at
  the same time.  The docs don't seem to indicate such an option.
 
 No, it's not possible, and making it possible would require a deep
 and fundamental redesign and re-implementation of rsync; the result
 wouldn't resemble the current program much.

I disagree.  An --inplace option wouldn't be too hard to implement.
The trick is that when --inplace is specified the block matching
algorithm (on the sender) would only match blocks at or after that
block's location (on the receiver).  No protocol change is required.
The receiver can then operate in-place since no matching blocks are
earlier in the file.  This could be relaxed to allow a fixed number
of earlier blocks, based on the knowledge the receiver will buffer
reads.  But that is more risky.  Caveat user: if you specify --inplace
and the source file has a single byte added to the beginning then the
entire file will be sent as literal data.

Of course, a major issue with --inplace is that the file will be
in an intermediate state if rsync is killed mid-transfer.  Rsync
currently ensures that every file is either the original or new.

Another independent optimization would be to do lazy writes.  Currently,
if you specify -I (--ignore-times) the output file is written (to a tmp
file and then renamed) even if the contents are identical.  Instead,
creation of the tmp file could be delayed until the output file is
known to be different.  This is detected either by an out-of-sequence
block number from the sender, or any literal data.  If the file contains
only in-sequence block numbers and no literal data, then there is no
need to write anything.

Craig



Re: rsync 1tb+ each day

2003-02-05 Thread Craig Barratt
 I am rsyncing 1tb of data each day.  I am finding in my testing that 
 actually removing the target files each day then rsyncing is faster than 
 doing a compare of the source-target files then rsyncing over the delta 
 blocks.  This is because we have a fast link between the two boxes, and 
 that our disk is fairly slow. I am finding that the creation of the temp 
 file (the 'dot file') is actually the slowest part of the operation. 
 This has to be done for each file because the timestamp and at least a 
 couple blocks are guaranteed to have changed (oracle files).

How big are the individual files?  If they are bigger than 1-2GB then it
is possible rsync is failing on the first pass and repeating the file.
You should be able to see from the output of -vv (you will see a
message like "redoing fileName (nnn)").

The reason for this is that the first-pass block checksum (32 bits Adler
+ 16 bits of MD4) is too small for large files.  There was a long thread
about this a few months ago.  The first message was from Terry Reed
around mid-Oct 2002 ("Problem with checksum failing on large files").

In any case, as you already note, if the network is fast and the disk
is slow then copying the files will be faster.  Rsync on the receiving
side reads each file 1-2 times and writes each file once, while copying
just requires a write on the receiving side.

Another comment: rsync doesn't buffer its writes, so each write
is a block (as little as 700 bytes, or up to 16K for big files).
Buffering the writes might help.  There is an optional buffering
patch (patches/craigb-perf.diff) included with rsync 2.5.6 that
improves the write buffering, plus other I/O buffering.  That
might improve the write performance, although so far significant
improvements have only been seen on cygwin.

Craig



Re: Incremental transfers: how to tell?

2003-01-28 Thread Craig Barratt
 James Kilton wrote:
  To follow up on this... I found the --stats option and
  here's what I'm getting:
 
  Number of files: 36
  Number of files transferred: 36
  Total file size: 10200816 bytes
  Total transferred file size: 10200816 bytes
  Literal data: 10200816 bytes
  Matched data: 0 bytes
  File list size: 576
  Total bytes written: 10203996
  Total bytes read: 596
 
  So, I don't know why no parts of the files are
  matching.  The files are the same save for 1 or 2
  values changing every 5 minutes.  I don't know if
  anyone here is familiar with RRD files, but they're
  database files commonly used for SNMP data collection.
   All the fields are created initially so the file size
  never changes -- the fields are populated as time goes
  on.
 
  Is RSync unable to do incremental transfers of
  non-text files?
 
 No, it is perfectly capable of this.
 
 Is it possible that your files are changing in widely scattered places, such
 that every block that rsync examines has changed?

Since only 596 bytes were read, the receiving side clearly doesn't
even see the old files and send the checksums (10MB of literal data
should be around 10MB/700 * 6 bytes of checksums).

So you appear to be rsync'ing to files on the receiving side that
don't exist, or they cannot be read (permissions problem?).  Please
check your path names etc.  What happens if you run the same command
twice?

Craig



Re: Proposal that we now create two branches - 2_5 and head

2003-01-28 Thread Craig Barratt
 I have several patches that I'm planning to check in soon (I'm waiting
 to see if we have any post-release tweaking to and/or branching to do).
 This list is off the top of my head, but I think it is complete:

And I have several things I would like to work on and submit:

 - Fix the MD4 block and file checksums to comply with the rfc
   (currently MD4 is wrong for blocks of size 64*n, or files
   longer than 512MB).

 - Adaptive first pass checksum lengths: use 3 or more bytes of the MD4
   block checksum for big files (instead of 2).  This is to avoid almost
   certain first pass failures on very large files.  (The block-size is
   already adaptive, increasing up to 16K for large files.)

 - Resubmit my --fixed-checksum-seed patch for consideration for 2.6.x.

 - Resubmit my buffering/performance patch for consideration for 2.6.x.

 - For --hard-links it is only necessary to send dev,inode for files
   that have at least 2 links.  Currently dev,inode is sent for
   every file when the file list is sent.  In a typical *nix file
   system only a very small percentage of files have at least 2
   links.  Unfortunately all the bits in the flag byte are used,
   so another flag byte (to indicate whether dev,inode is
   present) would be necessary with --hard-links (unless someone
   has a better idea).  This would save sending up to 7 bytes
   per file (or actually as many as 23 bytes per file for 64 bit
   dev,inode).

Except for the last, all these items were discussed in this group over
the last few months.  The first two items and last item require a bump
in the protocol number, so I would like to include all of them together.

But before I work on these I would like to make sure there is interest
in including them.

Craig



Re: Storage compression patch for Rsync (unfinished)

2003-01-26 Thread Craig Barratt
 Is there any reason why caching programs would need to set the
 value, rather than it just being a fixed value?
 I think it is hard to describe what this is for and what it should be
 set to.  Maybe a --fixed-checksum-seed option would make some sense,
 or for a caching mechanism to be built in to rsync if it is shown to
 be very valuable.

A fixed value would be perfectly ok; the same magic value that batch
mode uses (32761) would make sense.

 I know people have proposed some caching mechanisms in the past and
 they've been rejected for one reason or another.

One difficulty is that additional files, or new file formats, are needed
for storing the checksums, and that moves rsync further away from its
core purpose.

 I don't think I'll include the option in 2.5.6.

If I submitted a new patch with --fixed-checksum-seed, would you be
willing to at least add it to the patches directory for 2.5.6?

I will be adding block and file checksum caching to BackupPC, and
that needs --fixed-checksum-seed.  This will save me from providing
a customized rsync (or rsync patches) as part of BackupPC; I would
much rather tell people to get a vanilla 2.5.6 rsync release and
apply the specific patch that comes with the release.

Craig



Re: Storage compression patch for Rsync (unfinished)

2003-01-26 Thread Craig Barratt
 Block checksums come from the receiver so cached block
 checksums are only useful when sending to a server which had
 better know it has block checksums cached.

The first statement is true (block checksums come from the receiver),
but the second doesn't follow.  I need to cover the case where the
client is the receiver and the client is caching the checksums. That
needs a command-line switch, since the server would otherwise use
time(NULL) as the checksum seed, which is then sent from the server
to the client at protocol startup.

I agree with your changes though: the command-line handling code can set
checksum_seed if any of write-batch, read-batch, or fixed-checksum-seed
are specified, avoiding the additional variable.

Craig



Re: Cygwin issues: modify-window and hangs

2003-01-26 Thread Craig Barratt
 Has *anybody* been able to figure out a fix for this that really works?

Why does the receiving child wait in a loop to get killed, rather than
just exit()?  I presume cygwin has some problem or race condition in the
wait loop, kill and wait_process().

The pipe to the parent will read 0 bytes (EOF) on the parent side after
the child exits.

Although I haven't tried it, I would guess this should be the reliable
solution on all platforms.  But there must be some good reason the wait
loop, kill and wait_process() contortions appeared in the code (maybe
some race condition with the remote side?)...

Craig



Re: Storage compression patch for Rsync (unfinished)

2003-01-17 Thread Craig Barratt
 While the idea of rsyncing with compression is mildly
 attractive i can't say i care for the new compression
 format.  It would be better just to use the standard gzip or
 other format.  If you are going to create a new file type
 you could at least discuss storing the blocksums in it so
 that the receiver wouldn't have to generate them.

Yes!  Caching the block checksums and file checksums could yield a large
improvement for the receiver.  However, an integer checksum seed is used
in each block and file MD4 checksum. The default value is unix time() on
the server, sent to the client at startup.

So currently you can't cache block and file checksums (technically it is
possible for block checksums since the checksum seed is appended at the
end of each block, so you could cache the MD4 state prior to the checksum
seed being added; for files you can't since the checksum seed is at the
start).

Enter a new option, --checksum-seed=NUM, that allows the checksum seed to
be fixed.  I've attached a patch below against 2.5.6pre1.

The motivation for this is that BackupPC (http://backuppc.sourceforge.net)
will shortly release rsync support, and I plan to support caching
block and file checksums (in addition to the existing compression,
hardlinking among any identical files etc).  So it would be really
great if this patch, or something similar, could make it into 2.5.6
or at a minimum the contributed patch area in 2.5.6.

[Also, this option is convenient for debugging because it makes the
rsync traffic identical between runs, assuming the file states at
each end are the same too.]

Thanks,
Craig

###
diff -bur rsync-2.5.6pre1/checksum.c rsync-2.5.6pre1-csum/checksum.c
--- rsync-2.5.6pre1/checksum.c  Mon Apr  8 01:31:57 2002
+++ rsync-2.5.6pre1-csum/checksum.c Thu Jan 16 23:38:47 2003
@@ -23,7 +23,7 @@
 
 #define CSUM_CHUNK 64
 
-int checksum_seed = 0;
+extern int checksum_seed;
 extern int remote_version;
 
 /*
diff -bur rsync-2.5.6pre1/compat.c rsync-2.5.6pre1-csum/compat.c
--- rsync-2.5.6pre1/compat.cSun Apr  7 20:50:13 2002
+++ rsync-2.5.6pre1-csum/compat.c   Fri Jan 17 21:18:35 2003
@@ -35,7 +35,7 @@
 extern int preserve_times;
 extern int always_checksum;
 extern int checksum_seed;
-
+extern int checksum_seed_set;
 
 extern int remote_version;
 extern int verbose;
@@ -64,11 +64,14 @@

if (remote_version >= 12) {
if (am_server) {
-   if (read_batch || write_batch) /* dw */
+   if (read_batch || write_batch) { /* dw */
+   if ( !checksum_seed_set )
checksum_seed = 32761;
-   else
+   } else {
+   if ( !checksum_seed_set )
checksum_seed = time(NULL);
write_int(f_out,checksum_seed);
+   }
} else {
checksum_seed = read_int(f_in);
}
diff -bur rsync-2.5.6pre1/options.c rsync-2.5.6pre1-csum/options.c
--- rsync-2.5.6pre1/options.c   Fri Jan 10 17:30:11 2003
+++ rsync-2.5.6pre1-csum/options.c  Thu Jan 16 23:39:17 2003
@@ -116,6 +116,8 @@
 char *backup_dir = NULL;
 int rsync_port = RSYNC_PORT;
 int link_dest = 0;
+int checksum_seed = 0;
+int checksum_seed_set;
 
 int verbose = 0;
 int quiet = 0;
@@ -274,6 +276,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=PREFIX    write batch fileset starting with PREFIX\n");
   rprintf(F,"     --read-batch=PREFIX     read batch fileset starting with PREFIX\n");
+  rprintf(F,"     --checksum-seed=NUM     set MD4 checksum seed\n");
   rprintf(F," -h, --help                  show this help screen\n");
 #ifdef INET6
   rprintf(F," -4                          prefer IPv4\n");
@@ -293,7 +296,7 @@
   OPT_COPY_UNSAFE_LINKS, OPT_SAFE_LINKS, OPT_COMPARE_DEST, OPT_LINK_DEST,
   OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS,
   OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR, 
-  OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO,
+  OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO, OPT_CHECKSUM_SEED,
   OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE,
   OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING};
 
@@ -306,6 +309,7 @@
  {"ignore-times",'I', POPT_ARG_NONE,   &ignore_times , 0, 0, 0 },
  {"size-only",0,  POPT_ARG_NONE,   &size_only , 0, 0, 0 },
  {"modify-window",0,  POPT_ARG_INT,&modify_window, OPT_MODIFY_WINDOW, 0, 0 },
+  {"checksum-seed",0,  POPT_ARG_INT,&checksum_seed, OPT_CHECKSUM_SEED, 0, 0 },
  {"one-file-system", 'x', POPT_ARG_NONE,   &one_file_system , 0, 0, 0 },
  {"delete",   0,  POPT_ARG_NONE,   &delete_mode , 0, 0, 0 },
  {"existing", 0,  POPT_ARG_NONE,   &only_existing , 0, 0, 0 },
@@ -489,6 +493,13 @@

possible typo/bug in receiver.c

2003-01-17 Thread Craig Barratt
The following code in receiver.c around line 421 (2.5.6pre1) contains
some dead code:

/* we initially set the perms without the
   setuid/setgid bits to ensure that there is no race
   condition. They are then correctly updated after
   the lchown. Thanks to [EMAIL PROTECTED] for pointing
   this out.  We also set it initially without group
   access because of a similar race condition. */
fd2 = do_mkstemp(fnametmp, file->mode & INITACCESSPERMS);
if (fd2 == -1) {
rprintf(FERROR,"mkstemp %s failed: %s\n",fnametmp,strerror(errno));
receive_data(f_in,buf,-1,NULL,file->length);
if (buf) unmap_file(buf);
if (fd1 != -1) close(fd1);
continue;
}

/* in most cases parent directories will already exist
   because their information should have been previously
   transferred, but that may not be the case with -R */
if (fd2 == -1 && relative_paths && errno == ENOENT &&
create_directory_path(fnametmp, orig_umask) == 0) {
strlcpy(fnametmp, template, sizeof(fnametmp));
fd2 = do_mkstemp(fnametmp, file->mode & INITACCESSPERMS);
}
if (fd2 == -1) {
rprintf(FERROR,"cannot create %s : %s\n",fnametmp,strerror(errno));
receive_data(f_in,buf,-1,NULL,file->length);
if (buf) unmap_file(buf);
if (fd1 != -1) close(fd1);
continue;
}

If mkstemp() fails (for various reasons, including the directory not
existing) then fd2 == -1.  So the first if () executes, which flushes
the data and does a continue.  So the next two if () statements will
never execute.

It might be an editing error (not sure how old it is).  It looks
like the first if () statement was meant to be replaced by the
next two; ie: the first if () statement should be eliminated.

I haven't worked out a command-level example that shows the difference,
but it relates to receiving into a path whose last two or more
directories don't exist.  Is rsync meant to create deep directories
that don't exist?

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Initial release of PerlRsync (perl rsync client)

2002-12-25 Thread Craig Barratt
I have just released the first version 0.10 of File::RsyncP to SourceForge.
See:

http://perlrsync.sourceforge.net

File::RsyncP is a perl implementation of an Rsync client.  It is
compatible with Rsync 2.5.5 (protocol version 26).  It can send
or receive files, either by running rsync on the remote machine,
or connecting to an rsyncd daemon on the remote machine.

What use is File::RsyncP?  The main purpose is that File::RsyncP
separates all file system I/O into a separate module, which can
be replaced by any module of your own design.  This allows rsync
interfaces to non-filesystem data types (eg: databases) to be
developed with relative ease.

File::RsyncP was initially written to provide an Rsync interface
for BackupPC, http://backuppc.sourceforge.net.  See BackupPC
for programming examples.

File::RsyncP does not yet provide a command-line interface that
mimics native Rsync.  Instead it provides an API that makes it
possible to write simple scripts that talk to rsync or rsyncd.

The File::RsyncP::FileIO module contains the default file system access
functions.  File::RsyncP::FileIO may be subclassed or replaced by a
custom module to provide access to non-filesystem data types.

If you are interested there are a couple of mailing lists
(perlrsync-announce and perlrsync-users) available on the
SF project page.

Merry Christmas, Happy Holidays, Happy Hanukkah etc.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Statistics appearing in middle of file list -- no errors

2002-12-22 Thread Craig Barratt
 Has anybody seen this?  We want to seperate the statistics out from the 
 file list, and were using tail to grab the end of the file.  the command 
 we run is:
 
rsync -r -a -z --partial  --suffix=.backup  --exclude=*.backup  \
 --stats -v /. 10.1.1.60::cds101/ > /var/log/rsync.log 2>&1
 
 along with a number of excludes to skip the /tmp, /dev, /var and /proc 
 directories.  The output in file /var/log/rsync.log is:
 
 building file list ... done
 dev/ttyp0
 etc/cups/certs/
 etc/cups/certs/0
 etc/mail/statistics
 root/.bash_history
 smb_shares/var/lib/dhcp/
 smb_shares/var/lib/dhcp/dhcpd.leases
 smb_shares/var/lib/dhcp/dhcpd.leases~
 smb_shares/var/log/debug
 smb_shares/var/log/mail
 smb_shares/var/log/messages
 smb_shares/var/log/rsync.log
 smb_shares/var/log/secure
 smb_shares/var/run/utmp
 smb_shares/var/spool/clientmqueue/
 smb_shares/var/spool/mail/
 smb_shares/var/spool/mail/root
 smb_shares/var/spool/mqueue/
 usr/local/samba/var/locks/
 usr/local/samba/var/locks/browse.dat
 
 Number of files: 169315
 Number of files transferred: 13
 Total file size: 1714847358 bytes
 Total transferred file size: 1013994 bytes
 Literal data: 30552 bytes
 Matched data: 983834 bytes
 File list size: 3438061
 Total bytes written: 3442643
 Total bytes read: 8794
 
 wrote 3442643 bytes  read 8794 bytes  9094.70 bytes/sec
 total size is 1714847358  speedup is 496.85
 dev/
 etc/cups/certs/
 etc/mail/
 root/
 smb_shares/var/lib/dhcp/
 smb_shares/var/log/
 smb_shares/var/run/
 smb_shares/var/spool/mail/
 usr/local/samba/var/locks/
 
 Any ideas?  Thanks!

The final output appears to be from the final directory permission fixup.
The child process on the receiving side generates the stats, then does
deletes, hardlinks and a final fix of the directory mtimes.

In 2.5.5 this output should be disabled, see line 290 of generator.c:

/* f_out is set to -1 when doing final directory
   permission and modification time repair */
if (set_perms(fname,file,NULL,0) && verbose && (f_out != -1))
rprintf(FINFO,"%s/\n",fname);
return;

Are you running 2.5.5?

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: rsync to 2000/NT servers?

2002-12-13 Thread Craig Barratt
 Watch out for pagefile.sys (i think!)... it's won't copy. (let me know about 
 any other's)

Most important files won't copy.  The registry files are locked and
can't be read by rsync/cygwin (nor are they served by smb).

Similarly, the outlook.pst file used by Outlook (which contains all
the email, attachments, calendar and address book info of an outlook
user) is locked whenever outlook is open (which is most of the time).
Exchange databases, SQL databases will be locked too.  Any file open
by a windows app is likely locked too.

So you can get 99% of the files, but the 1% you miss are the most
critical.

 Now, can you think of a way to sync the win 2000 OS? (the WHOLE flippin' 
 system) so that if it were to go down one could restore the full installation
 (bootstraps, bootloader, ect!!?) by means of the rsync'ed backup.
 please? thank you. ;-)

I wish this was possible, but I don't know how to do this.
Commercial products use an OFM (open file manager) to allow
locked files to be accessed.  Products are sold by companies
like St. Bernard or Columbia Data Products.  Apparently Veritas and
Legato bundle this product with their commercial backup products.

See for example

http://www.stbernard.com/products/docs/ofm_whitepaperV8.pdf

What we need is an open source OFM that is compatible with rsync.
Then bare-metal WinXX recovery would be possible.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Rsync performance increase through buffering

2002-12-08 Thread Craig Barratt
I've been studying the read and write buffering in rsync and it turns
out most I/O is done just a couple of bytes at a time.  This means there
are lots of system calls, and also most network traffic comprises lots
of small packets.  The behavior is most extreme when sending/receiving
file deltas of identical files.

The main case where I/O is buffered is writes from the server (when
io multiplexing is on). These are usually buffered in 4092 byte
chunks with a 4 byte header. However, reading of these packets is
usually unbuffered, and writes from the client are generally not
buffered.  For example: when receiving 1st phase checksums (6 bytes
per block), 2 reads are done: one of 4 bytes and one of 2 bytes,
meaning there are 4 system calls (select/read/select/read) per 6
bytes of checksum data.
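A rough count of the syscall savings this implies (the 4096-byte buffer size and the block count below are illustrative assumptions, not figures from this post):

```python
# Sketch: syscalls needed to receive first-phase checksums (6 bytes per
# block). Unbuffered, each block costs select/read/select/read; buffered,
# reads are amortized over a buffer.
def syscalls_unbuffered(nblocks):
    return nblocks * 4            # select + read, twice per block

def syscalls_buffered(nblocks, bufsize=4096):
    total = nblocks * 6           # bytes of checksum data overall
    reads = -(-total // bufsize)  # ceiling division: buffer fills needed
    return reads * 2              # one select + one read per fill

print(syscalls_unbuffered(100_000))  # 400000
print(syscalls_buffered(100_000))    # 294
```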

One cost of this is some performance, but a significant issue is that
unbuffered writes generate very short (and very many) ethernet packets,
which means the overhead is quite large on slow network connections.

The initial file_list writing is typically buffered, but reading it on
the client is not.

There are some other unneeded system calls:

  - One example is that show_progress() calls gettimeofday() even
if do_progress is not set.  show_progress() is called on every
block, so there is an extra system call per (700 byte) block.

  - Another example is that file_write writes each matching (700 byte)
block without buffering, so that's another system call per block.

To study this behavior I used rsync-2.5.6cvs and had a benchmark area
comprising around 7800 files of total size 530MB.

Here are some results doing sends and receives via rsyncd, all on the
same machine, with identical source and destination files.  In each
case --ignore-times (-I) is set, so that every file is processed:

  - Send test:
  
strace -f rsync -Ir . localhost::test | wc

shows there are about 2,488,775 system calls.

  - Receive test:

strace -f rsync -Ir localhost::test . | wc

shows there are about 1,615,931 system calls.

  - Rsyncd has a roughly similar number of system calls.

  - Send test from another machine (cygwin/WinXP laptop):

tcpdump port 873 | wc

shows there are about 701,111 ethernet packets (many of them only
have a 4 byte payload).

Since the source and dest files are the same, the send test only
wrote 1,738,797 bytes and read 2,139,848 bytes.

These results are similar to rsync 2.5.5.

Below is a patch to a few files that adds read and write buffering in
the places where the I/O was unbuffered, adds buffering to write_file()
and removes the unneeded gettimeofday() system call in show_progress().

The results with the patch are:

  - Send test: 46,835 system calls, versus 2,488,775.
  
  - Receive test: 138,367 system calls, versus 1,615,931.

  - Send test from another machine: 5,255 ethernet packets, versus 701,111.
If the tcp/ip/udp/802.3 per-packet overhead is around 60 bytes, that
means the base case transfers an extra 42MB of data, even though the
useful data is only around 2MB.
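As a sanity check on that framing-overhead claim (the 60-byte per-packet figure is the post's own assumption):

```python
# Back-of-envelope: extra bytes spent on per-packet framing, assuming
# ~60 bytes of tcp/ip/802.3 overhead per ethernet packet.
OVERHEAD_BYTES = 60

def framing_bytes(packets):
    return packets * OVERHEAD_BYTES

print(framing_bytes(701_111))  # 42066660 -- the "extra 42MB" before the patch
print(framing_bytes(5_255))    # 315300 -- ~0.3MB after
```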

The absolute running time on the local rsyncd test isn't much different,
probably because the test is really disk io limited and system calls on
an unloaded linux system are pretty fast.

However, on a network test doing a send from cygwin/WinXP to rsyncd
on rh-linux the running time improves from about 700 seconds to 215
seconds (with a cpu load of around 17% versus 58%, if you believe
cygwin's cpu stats).  This is probably an extreme case since the system
call penalty in cygwin is high.  But I would suspect a significant
improvement is possible with a slow network connection, since a lot
less data is being sent.

Note also that without -I rsync is already very fast, since it skips
(most) files based on attributes.

With or without this patch the test suite passes except for
daemon-gzip-upload.  One risk of buffering is the potential for
a bug caused by a missing io_flush: deadlock is possible, so try
the patch at your own risk...

Craig

###
diff -bur rsync/fileio.c rsync-craig/fileio.c
--- rsync/fileio.c  Fri Jan 25 15:07:34 2002
+++ rsync-craig/fileio.cSat Dec  7 22:21:10 2002
@@ -76,7 +76,35 @@
int ret = 0;
 
if (!sparse_files) {
-   return write(f,buf,len);
+   static char *writeBuf;
+   static size_t writeBufSize;
+   static size_t writeBufCnt;
+
+   if ( !writeBuf ) {
+   writeBufSize = MAX_MAP_SIZE;
+   writeBufCnt  = 0;
+   writeBuf = (char*)malloc(MAX_MAP_SIZE);
+   if (!writeBuf) out_of_memory("write_file");
+   }
+   ret = len;
+   do {
+   if ( buf && writeBufCnt < writeBufSize ) {
+   size_t copyLen = len;
+   if ( copyLen > writeBufSize - writeBufCnt ) {

Re: rsync as a bakcup tool and the case of rotated logs

2002-11-28 Thread Craig Barratt
 1) have rsync understand that file names might have changed, maybe by
 comparing files through their md5 signature instead of by their name,
 that way rsync would see that /backup/syslog.198.gz is the same as
 /var/log/syslog.197.gz and not retransfer it,

The best choice is to rename the syslog files with a date, and don't
repeatedly rename them, eg: syslog.YYYYMMDD.gz (eg: syslog.20021128.gz).
Pruning old ones isn't too hard: simply reverse sort the names and
remove everything after the first 213.  It also makes it easier to
find a particular log file.
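A minimal sketch of that pruning scheme; the directory layout and the keep-count are illustrative assumptions, not from the post:

```python
# Prune date-stamped logs (syslog.YYYYMMDD.gz): because the date sorts
# lexically, a reverse sort puts the newest first; drop the rest.
import os

def prune_logs(logdir, keep=30):
    logs = sorted((f for f in os.listdir(logdir)
                   if f.startswith('syslog.') and f.endswith('.gz')),
                  reverse=True)   # newest first
    for old in logs[keep:]:       # everything after the first `keep`
        os.remove(os.path.join(logdir, old))
```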

 2) create hard-links to identical files in
 --backup-dir=/backup/incremental-2002-11-27 when is detects that
 /server/sylog.138.gz is the actually the same as
 /backup/current/syslog.137.gz,

BackupPC is one package that does this; see http://backuppc.sourceforge.net. 
(disclaimer: I'm the author).  I'm in the process of adding rsync support.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: unexpected tag 90

2002-11-25 Thread Craig Barratt
 can anybody help?  what does tag 90 mean?

It looks like the sender and receiver are getting out of sync while
the file list is being sent.  The data is sent in blocks.  Each block
starts with an 8 bit tag and a 24 bit length.  The valid values of
the tag are 7, 8, 9 and 10.  Any other value (eg: 90) produces an error.
See read_unbuffered() in io.c.  Your strace shows:

read(5, "sysa", 4)

This should be the tag and length, which is clearly wrong.  The 90
is 'a' - 7.
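To make that arithmetic concrete, here is a sketch of how the 4-byte header decodes (MPLEX_BASE = 7, matching rsync's io.c; the decoding layout below is my reading of that code, not quoted from it):

```python
# rsync's multiplexed-I/O header: 4 bytes read as a little-endian 32-bit
# int; the high byte is the tag (offset by MPLEX_BASE), the low 24 bits
# are the payload length.
import struct

MPLEX_BASE = 7

def decode_header(four_bytes):
    word = struct.unpack('<I', four_bytes)[0]
    return (word >> 24) - MPLEX_BASE, word & 0xFFFFFF

# Stray file-list text "sysa" lands where a header should be:
tag, length = decode_header(b'sysa')
print(tag)  # 'a' is 97, and 97 - 7 = 90: the bogus tag reported
```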

Beyond this, I don't know why this is happening.  One completely random
thought: what is the LANG setting in /etc/sysconfig/i18n on the client
machines?  If it is UTF-8 I would suggest trying it with en_US:

LANG=en_US

Other than that, I would suggest running gdb or adding debug statements
to see where it gets out of sync.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Speed problem

2002-11-11 Thread Craig Barratt
   You haven't really provided enough data to even guess what
   is limiting your performance.

How similar is the directory tree on the target (receiving)
machine?  There are three general possibilities:

  - It's empty.

  - It's present, and substantially similar to the sending end.

  - It's present, but substantially different to the sending end.

In the first case rsync should be i/o limited (disk or network).

In the second and third cases rsync could easily be cpu limited
on the sending end.  In the third case it could also be disk
(specifically seek) limited on the receiving end.  For example,
you might dump a large database to a binary file, whose content
(records) are similar, but the order might change dramatically.
This could take a huge number of seeks on the receiving machine
to rebuild the file, even though only a small amount of data is
transferred.

Unless I'm missing something, the behavior you observe could simply be
rsync hitting files (or directories) that are in the different
categories above.

I'd try adding the -v option and see if the slowdown always
happens on certain files.  Then try running rsync on just those
files.  If it is slow right away then maybe this explanation is
correct.  If it still goes fast, then slows down, then there
is something else going on.

As another test, run rsync to an empty target directory.  Rsync
should be i/o limited for the entire running time.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Rsync help

2002-10-29 Thread Craig Barratt
 SUN box, 2gig ram, hard drive space to spare. Rsync 2.5.5, solaris 5.7
 version 7.
 Half moon, I think it only seems to work on full moon nights.
 
 Here's the command I run as well  .
 /usr/local/bin/rsync --delete --partial -P -p -z -e /usr/local/bin/ssh /dir1
 systemname:/storage

 [snip]
 
  I get the following transering a large file use rsync over ssh.
  
  root@pbodb bin$ ./ausbk.sh
  building file list ... 
  10 files to consider 
  ERROR: out of memory in generate_sums 
  rsync: connection unexpectedly closed (8 bytes read so far) 
  rsync error: error in rsync protocol data stream (code 12) at io.c(150) 

How big are the files you are trying to rsync?  It is probably failing
here:

   if (verbose > 3)
       rprintf(FINFO,"count=%d rem=%d n=%d flength=%.0f\n",
           s->count,s->remainder,s->n,(double)s->flength);

   s->sums = (struct sum_buf *)malloc(sizeof(s->sums[0])*s->count);
   if (!s->sums) out_of_memory("generate_sums");

sizeof(s->sums[0]) is at least 32, and s->count is ceil() of file size
divided by the block size (default is 700).  So this malloc should be
around 5% of the largest file size (eg: approx 500MB for a 10GB file).
If VM is tight on your machine (you say it is intermittent) then this
might fail.

You could try -vvvv and see what the previous rprintf() shows --
unfortunately inside the loop below it also prints every checksum
when verbose > 3 so you will get a huge amount of output; just
tailing the output should be enough.

A solution is to increase the block size (eg --block-size=4096), which
reduces the malloc() needs proportionally.
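A quick sketch of that sizing argument (the 32-byte sum_buf size is the post's approximation):

```python
# Estimate generate_sums()'s allocation: one ~32-byte sum_buf per block.
import math

SUM_BUF_BYTES = 32  # approximate sizeof(struct sum_buf), per the post

def generate_sums_malloc(file_size, block_size=700):
    count = math.ceil(file_size / block_size)   # blocks in the file
    return count * SUM_BUF_BYTES

print(generate_sums_malloc(10 * 2**30))        # ~491 million bytes for 10GB
print(generate_sums_malloc(10 * 2**30, 4096))  # ~84 million with --block-size=4096
```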

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: ERROR: buffer overflow in receive_file_entry

2002-10-21 Thread Craig Barratt
 has anyone seen this error:
 
 ns1: /acct/peter rsync ns1.pad.com::acct
 overflow: flags=0xe8 l1=3 l2=20709376 lastname=.
 ERROR: buffer overflow in receive_file_entry
 rsync error: error allocating core memory buffers (code 22) at util.c(238)
 ns1: /acct/peter 

Either something is wrong with your setup or configuration or this
is a bug.  The packed file list data sent right at the start is
not being decoded correctly.  l1=3 means that 3 bytes of the full
name should be kept, but lastname = "." is just a single character
long.  Also, l2=20709376 looks like ascii, not a small integer.
The flag value 0xe8 is maybe ok: long file name, same mtime, same
dir, same_uid.

It would be great if you could debug this further.  I would first
try to find a small set of files on which you get the error, then
add some debug prints to writefd_unbuffered() to print what the
sender is sending, and to read_unbuffered() to print what the
receiver is reading.  Then look for 0xe8 03 76 93 70 20 in the
output (byte reversed from the error), and see what is a little
before that.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Problem with checksum failing on large files

2002-10-14 Thread Craig Barratt

 I tried --block-size=4096 and -c --block-size=4096 on 2 files (2.35 GB and
 2.71 GB) and still had the same problem - rsync still needed to do a second
 pass to successfully complete. These tests were between Solaris client and AIX
 server (both running rsync 2.5.5). 

Yes, for 2.35GB there is a 92% chance, on average, that it will fail
with 4096 byte blocks.

 As I mentioned in a previous note, a 900 MB file worked fine with just -c
 (but required -c to work on the first pass).
 
 I'm willing to try the fixed md4sum implementation, what do I need for
 this?

The fixed md4sum refers to some minor tweaks for block lengths of
64*n, plus files bigger than 512MB, to get correct md4 sums. But this
shouldn't make a difference for you.

Would you mind trying the following?  Build a new rsync (on both
sides, of course) with the initial csum_length set to, say 4,
instead of 2?  You will need to change it in two places in
checksum.c; an untested patch is below.  Note that this test
version is not compatible with standard rsync, so be sure to
remove the executables once you try them.

Craig

--- checksum.c  1999-10-25 15:04:09.0 -0700
+++ checksum.c.new  2002-10-14 09:40:34.0 -0700
@@ -19,7 +19,7 @@

 #include "rsync.h"

-int csum_length=2; /* initial value */
+int csum_length=4; /* initial value */

 #define CSUM_CHUNK 64

@@ -120,7 +120,7 @@
 void checksum_init(void)
 {
   if (remote_version >= 14)
-    csum_length = 2; /* adaptive */
+    csum_length = 4; /* adaptive */
   else
     csum_length = SUM_LENGTH;
 }

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Problem with checksum failing on large files

2002-10-14 Thread Craig Barratt

  Would you mind trying the following?  Build a new rsync (on both
  sides, of course) with the initial csum_length set to, say 4,
  instead of 2?  You will need to change it in two places in
  checksum.c; an untested patch is below.  Note that this test
  version is not compatible with standard rsync, so be sure to
  remove the executables once you try them.
  
  Craig
 
 
 I changed csum_length=2 to csum_length=4 in checksum.c and this time rsync
 worked on the first pass for a 2.7 GB file.  

Cool!

 I'm assuming that this change forced rsync to use a longer checksum length
 on the first pass, what checksum was actually used?

Yes.  It's now using adler32 + first 4 bytes of MD4 (64 bits total)
for each block in the first pass, instead of adler32 + first 2 bytes
of MD4 (48 bits total).  With just two more bytes, the chance of first
pass failure for random files of size 2.3GB with 700 byte block goes
from more than 99% to 0.04%.
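A rough model behind those numbers, assuming the checksums behave as uniform random values and that every byte offset of the new file is checked against every block checksum of the old file:

```python
# Expected false matches ~= (offsets tried * blocks) / 2^bits; the
# first-pass failure probability is then 1 - exp(-expected).
import math

def first_pass_fail(size, block=700, csum_bits=48):
    trials = size * (size // block)
    return 1.0 - math.exp(-trials / 2.0 ** csum_bits)

size = 2_300_000_000  # ~2.3GB
print(round(first_pass_fail(size, csum_bits=48), 2))  # 1.0: the ">99%" case
print(round(first_pass_fail(size, csum_bits=64), 4))  # 0.0004: the "0.04%"
```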

This is in addition to the earlier problem: the chance of two different
blocks of the old file having the same checksum goes from a couple of
percent to vanishingly small.

I agree with the earlier comments: checksum size is the key variable.
Block size is secondary.

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Problem with checksum failing on large files

2002-10-12 Thread Craig Barratt
terry I'm having a problem with large files being rsync'd twice 
terry because of the checksum failing.

terry Is there a different checksum mechanism used on the second
terry pass (e.g., different length)?  If so, perhaps there is an
terry issue with large files for what is used by default for the
terry first pass?

The first pass block checksum is 48 bits: the 32 bit adler32 (rolling)
checksum, plus the first 2 bytes of the MD4 block checksum.  The second
pass is 160 bits: the same 32 bit adler32 (rolling) plus the entire 128
bit MD4 block checksum.

donovan I wonder if this is related to the rsync md4sum not producing
donovan correct md4sums for files larger than 512M?
donovan
donovan The existing rsync md4sum implementation does not produce
donovan md4sums the same as the RSA implementation for files larger
donovan than 512M... but I thought it was consistent with itself so
donovan this didn't affect anything.

I doubt this matters, for just the reason you mention: it is consistent
and statistically it is still well behaved, so it won't matter.

My theory is that this is expected behavior given the check sum size.
Now, 48 bits sounds like a lot.

Let's start with an analogy.  If I have 23 (randomly-selected) people
in a room, what is the probability that some pair of people have the
same birthday?  You might guess it is quite small, maybe 23/365.  But
that's wrong.  It's actually more than 50%.  The probability that 3
people have different birthdays is:

364/365 * 363/365.

Similarly, the probability that 23 people all have unique birthdays is

364/365 * 363/365 * ... * 343/365,

which is less than 0.5 (50%).

So, back to our first pass checksum.  A 4GB file has 2^32 / 700 blocks.
(The blocks are like the people, each birthday is the checksum, and the
2^48 possible checksums are like the 365 days in the year.)  Let's assume
the 48 bit checksums are random.  What's the chance that two blocks have
the same checksum?  It sounds very unlikely, but the chance is around
6.5%.  For an 8GB file it's 23%.  In reality, the block checksums are
not completely random, so the real probabilities of a collision will
be higher.

If we increase the block size to 2048, the probabilities drop to
0.8% for a 4GB file and 3% for an 8GB file.  For a block size of
4096 we get 0.2% for a 4GB file and 0.8% for an 8GB file.
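The birthday computation above can be sketched with the standard exponential approximation (assuming uniform random 48-bit block checksums):

```python
# P(some two of the n blocks share a checksum), via the birthday
# approximation 1 - exp(-n(n-1) / (2 * 2^bits)).
import math

def block_collision_prob(file_size, block_size=700, csum_bits=48):
    n = file_size // block_size
    return 1.0 - math.exp(-n * (n - 1) / (2 * 2.0 ** csum_bits))

gb = 2**30
print(round(block_collision_prob(4 * gb), 3))        # 0.065: ~6.5% for 4GB
print(round(block_collision_prob(8 * gb), 2))        # 0.23 for 8GB
print(round(block_collision_prob(4 * gb, 4096), 3))  # 0.002 with 4096 blocks
```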

To test this theory, try a bigger --block-size (eg: 4096).  If you
still see a similar number of files needing a repeat then my theory
is wrong, and a bug could be the cause.

If the theory is supported by your tests (ie: most/all files work on the
first pass) then rsync could use an adaptive-length first pass checksum:
use one or two more bytes from the MD4 block checksum (ie: 56 or 64 bits
total) for files bigger than, say, 256MB and 2GB.  Both sides know the
file size.
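That adaptive scheme might look like the sketch below; the 256MB and 2GB thresholds are the post's examples, and the exact cutover behavior is an assumption:

```python
# Choose how many MD4 bytes to append to the 32-bit adler32 for the
# first pass, based on the file size (which both sides know).
def md4_prefix_bytes(file_size):
    if file_size > 2 * 2**30:    # bigger than 2GB: two extra bytes
        return 4
    if file_size > 256 * 2**20:  # bigger than 256MB: one extra byte
        return 3
    return 2                     # rsync's normal first-pass prefix

print(md4_prefix_bytes(100 * 2**20))  # 2
print(md4_prefix_bytes(1 * 2**30))    # 3
print(md4_prefix_bytes(4 * 2**30))    # 4
```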

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: MD4 bug in rsync for lengths = 64 * n

2002-09-02 Thread Craig Barratt

 This is the first detailed description of the problem I've seen. I've heard
 it mentioned several times before, and thought that the md4 code in librsync
 was the same as in rsync. I've looked and tweaked the md4 code in librsync
 and could never see the bug so I thought it was a myth. I also thought that
 samba used this code I wonder what variant it is using :-)

Samba looks right to me.  Anyhow, I looked at the archives and found
this message, so I have simply rediscovered the same bug as Tridge:

http://www.mail-archive.com/rsync@lists.samba.org/msg03919.html

   The fix is easy: a couple of > checks should be >=.  I can send
   diffs if you want.  But of course this can't be rolled in unless it
   is coupled with a bump in the protocol version.  
  
  Another bump in the protocol version is no problem.  Please submit a patch.
 
 I can submit patches if required for the md4code as tweaked/fixed for
 librsync. The fixed code is faster as well as correct :-)

Sure, that would be great.  Otherwise, I would be happy to recreate
and test a patch.

   email about fixing MD4 to handle files >= 512MB (I presume this
   relates to the 64-bit bit count in the final block).  Perhaps this
   change can be made at the same time?
  
  Could you please post a reference to that email?  It isn't familiar to me
  and I didn't find it through google.  There have been other problems we've
  been seeing with with the end of large files and zlib compression, though.
  I wonder if it can somehow be related.
 
 It may not have been on the rsync list, but on the librsync list... Please
 note that there are several variants of the md4 patch floating around. I've
 been meaning to seperate the latest md4 patch from my bigger librsync delta
 refactor patch for some time.

I must be spacing.  I can't find the earlier post either.  And I also
can't find my original post in the archives...

Anyhow, the bug occurs in the whole-file MD4 digest for file lengths >= 512MB.
Step 2 in the RFC for the MD4 algorithm specifies that the lower 64 bits
(not 32 bits) of the data's bit length is embedded in the tail buffer;
see:

http://www.faqs.org/rfcs/rfc1186.html

Both librsync and rsync use a 32 bit unsigned int for counting the
number of bytes processed.  This is then multiplied by 8 (to get
bits) and this is embedded in the tail buffer when MD4 finishes up.
So for files of 2^32 bits (512MB) or more the 32 bit unsigned int
overflows.  Again, a benign bug but a little disconcerting if you
are using another program to check MD4 digests of large files.
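The wraparound is easy to demonstrate (a sketch of the arithmetic, not rsync's actual code):

```python
# A 32-bit unsigned byte count, multiplied by 8 to get bits, wraps once
# the file reaches 2^29 bytes (512MB).
def bit_count_32(nbytes):
    return (nbytes * 8) & 0xFFFFFFFF  # emulate 32-bit unsigned overflow

HALF_GB = 2**29  # 512MB
print(bit_count_32(HALF_GB - 1))  # 4294967288: still representable
print(bit_count_32(HALF_GB))      # 0: the bit count has wrapped to zero
```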

Craig
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html