Re: Don't follow bind mounts?
On 16/05/13 03:52, Carl Brewer wrote:
> Hello,
>
> The manual says that rsync treats bind mounts on UNIX (Linux) to the same filesystem as being on the same filesystem. I have a server with a pile of bind mounts to the same filesystem for some access control/ease of use for FTP users modifying websites. This makes my backups using rsync messy!
>
> Is there any way to stop rsync from following bind mounts to the same filesystem? Short of unmounting them all at backup time and remounting afterwards or explicitly excluding each one?

Rsync uses the device ID to know whether it has crossed a mount boundary. Since bind mounts have the same device ID, rsync does not know it has reached one. Most other tools (tar) have the same problem.

My personal solution is to bind-mount the root of the file system to a neutral location, and rsync from there. So if I have /dev/sda17 mounted on /srv/messy, with lots and lots of bind mounts (and other filesystems) inside it, I do mount --bind /srv/messy /tmp/backup, and then rsync from /tmp/backup. This also makes sure that I back up all directories hidden by other mounts done later.

Shachar

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
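The device-ID test described in the reply above can be sketched in a few lines. This is only an illustration of the idea, not rsync's actual code: the function names are made up for the example, and it simply compares the st_dev value that stat() reports for a directory and for its parent. A bind mount of a directory from the same filesystem reports the same st_dev as its surroundings, so a check of this kind cannot see it.

```python
import os


def same_device(path_a: str, path_b: str) -> bool:
    """Return True when stat() reports the same device ID for both paths."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def looks_like_mount_boundary(directory: str) -> bool:
    """Hypothetical helper: treat a directory as a mount boundary only if
    its device ID differs from that of its parent directory."""
    parent = os.path.dirname(os.path.abspath(directory)) or os.sep
    return not same_device(directory, parent)
```

An ordinary subdirectory and a same-filesystem bind mount both make looks_like_mount_boundary() return False, which is why starting the copy from a separate bind mount of the filesystem root sidesteps the problem.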
Re: rsync in theory.
On 12/04/11 01:58, Evan Rempel wrote:
> I am looking at adding some code to the rsync tool but want to know if I am totally out to lunch. I realize that my example is so trivial that I am sure I will get replies of "don't do it that way", but bear in mind that it is just an example, and there are real world cases where I think this functionality would be useful.
>
> I am trying to figure out if rsync can do something like
> cat myfile.dat | rsync - remoteHost:/some/path/myfile.dat
> which would take a stream of data and send/store it onto the remote host.

Not possible with rsync (as far as I know), but is possible with librsync (which is a completely different code base than rsync itself).

> My question is more about "can the rsync algorithm do this?".

As far as I can tell, you need two passes on one end (either the receiving or the sending). There is no reason for the other end to be completely one pass.

> Technically, the question boils down to "Is rsync a single pass algorithm, or is it a multi-pass algorithm?" If it is a single pass algorithm then all is good. If it is a multi-pass algorithm then how big of a buffer does it need to perform the passes?

The definition of "one pass" is "can be performed with one reading of the file and an O(1) buffer". If you can answer the question, then it is, by definition, one pass.

> Namely, is it a block by block multi-pass, or is it a complete file/object multi-pass algorithm.

Again, the question is meaningless. If you can apply an algorithm one block at a time, then it's one pass by definition.

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
Re: Optimizing RSync algorithm using techniques with Google used in courgette
Hasanat Kazmi wrote:
> Hi,
>
> I am a student at LUMS SSE (http://cs.lums.edu.pk) and an active RSync user. Just a few days ago, Google wrote about Courgette*: an algorithm which is specially written for syncing executables. By using Courgette, Google made diff size 1/10th of previous techniques used.
>
> I was wondering if this (or something on same lines) can be used to optimize RSync? I am a senior and have to do a project. I am thinking to implement this in RSync. I need input from developers. What do you guys think?
>
> *http://dev.chromium.org/developers/design-documents/software-updates-courgette
>
> Hasanat Kazmi

Hi Hasanat,

Like you said in the subject, this is an optimization. A format specific optimization. In other words, it uses a known property of the file being synchronized in order to make the diff size smaller. If you were to try to use the Courgette pre-processing on something which is not an executable, you would get significantly worse results than merely running rsync.

At the moment, for better or for worse, rsync does not do format specific optimizations. As long as that is the case, rsync cannot be optimized using this algorithm.

Even if we (and by "we", I mean Wayne, or anyone else brave enough to pick this task up) were to implement such a functionality, I can think of quite a few things that would have a lot more to gain than executables. In particular, something that would uncompress both source and destination, apply the rsync algorithm to both files, and then make sure the recompression of the target produces the exact same result would, IMHO, be much more useful than the change you are suggesting.

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
Re: rsync algorithm for large files
ehar...@lyricsemiconductors.com wrote:
> I thought rsync would calculate checksums of large files that have changed timestamps or filesizes, and send only the chunks which changed. Is this not correct? My goal is to come up with a reasonable (fast and efficient) way for me to daily incrementally backup my Parallels virtual machine (a directory structure containing mostly small files, and one 20G file).
>
> I’m on OSX 10.5, using rsync 2.6.9, and the destination machine has the same versions. I configured ssh keys, and this is my result:

Upgrade to rsync 3 at least. Rsync keeps a hash table of the blocks' sliding (rolling) checksums. For older versions of rsync, the hash table was of a constant size. This meant that files over 3GB in size had a high chance of hash collisions. For a 20G file, the collisions alone might be the cause of your trouble.

Newer rsyncs detect when the hash table gets too full, and increase its size accordingly, thus avoiding the collisions. In other words - upgrade both sides (but specifically the sender).

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
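To make the "hash table of sliding checksums" in the reply above more concrete, here is a minimal sketch. It follows the two-sum weak checksum described in the rsync technical report; the function names and the table layout are assumptions made for the example, and real rsync differs in details (it also keeps a strong checksum per block, and its table is organised differently). The point is only that the checksum of a window can be updated in constant time as the window slides, and that every block's checksum becomes a key in a lookup table, so a table that is too small for the number of blocks collides often.

```python
from collections import defaultdict

M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes):
    """Compute the two running sums (a, b) of one block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int):
    """Slide the window one byte to the right in O(1) time."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def build_table(data: bytes, block_len: int):
    """Map each block's weak checksum to the list of block indices that
    produced it (illustrative layout, not rsync's own)."""
    table = defaultdict(list)
    for start in range(0, len(data), block_len):
        key = weak_checksum(data[start:start + block_len])
        table[key].append(start // block_len)
    return table
```

The side that scans its file rolls the checksum forward one byte at a time and looks each value up in the table built from the other side's blocks.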
Re: Problems while transferring big files
Wayne Davison wrote:
> We hypothesize that there can be an accidental match in the checksum data, which would cause the two sides to put different streams of data into their gzip compression algorithm, and eventually get out of sync and blow up. If you have a repeatable case of a new file overwriting an existing file that always fails, and if you can share the files, make them available somehow (e.g. put them on a web server) and send the list (or me) an email on how to grab them, and we can run some tests.
>
> If the above is the cause of the error, running without -z should indeed avoid the issue.

If I understand the scenario you describe correctly, won't running without -z merely cause actual undetected data corruption?

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
Re: rssync source code as a windows project
Jignesh Shah wrote:
> Hi Friends,
>
> I have started learning the rsync source code but I am finding it very difficult to go back and forth to find the execution flow. I could see that the rsync code is written for UNIX and the compilation is difficult. Has anybody converted it into a Windows project so that we can open it using the Visual Studio IDE? It would then be very simple to search for some function and find the complete work flow.
>
> Thanks,
> Jignesh

RTFM ctags and cscope, or just create a project and put the sources into it (not that I think the latter will do you much good).

<rant>
Personally, I find VS's cross reference to have deteriorated considerably over the versions. VS6 had a cross reference that was tied to the compiler's symbol tables. This worked excellently, as no amount of preprocessor trickery would fool it. I much preferred it to ctags. Somewhere between version 6 and version 9, MS switched to Intellisense for cross referencing. My guess is that the VS6 version wouldn't cross reference a project unless it could compile it, and people (or at least MS's sales people) complained. As a result, the cross reference is much less accurate and more error prone, and I no longer see any advantage for it over ctags and other tools available for Linux and Posix platforms.
</rant>

Shachar
Re: rssync source code as a windows project
Jignesh Shah wrote:
> Thanks for reply. Could you tell what do you mean by "RTFM ctags and cscope"??

RTFM - Read The Manual.
ctags and cscope - utilities whose manual I think you should read.

> Creating a new project I think it will have so many errors. We can do it only if we know the complete code. If anybody or you have done then please forward it to me.

rsync is a POSIX application. It will not compile natively on Windows without a considerable porting effort. No such port exists.

If you only want VC to trace the function flow, it should be able to do that without compiling the code (see my rant above). If you want the code to compile on VC, I suggest you do the porting.

Shachar
Re: Why is -e sent to the remote rsync side?
Matt McCutchen wrote:
> On Mon, 2008-10-06 at 18:01 +0200, Shachar Shemesh wrote:
>> Personally, and this is not something that any shell can solve, I would love for a way to limit the files that the --server side rsync allows access to.
>
> It's called an rsync daemon. It can be invoked over ssh; the command to force in the authorized_keys file is "rsync --server --daemon .".
>
> Matt

Just to save others from going over the man page looking for how to cause the client side to do this: you ask for a daemon (i.e. specify the remote side using ::) but also give the -e option.

Thanks, Matt and Wayne. You've been a great help.

Shachar
Re: Backup Microsoft Exchange
Steve Zemlicka wrote:
> Thanks Julian and Brad, I will give ntbackup a shot. I've used rsyncrypto but I'm not a huge fan.

Off topic, but as the author I'd love to hear why.

> I don't need the files to be encrypted except during transit which can be done with just rsync, right?

Yes. Do rsync over ssh or run a daemon over SSL. Rsyncrypto is not needed for in-transit encryption, only for storage encryption.

Also, whether you use rsyncrypto or not, you can delete the temporary files (the ntbackup export and the rsyncrypto encrypted file) after you rsync them. They will be created again when you repeat the operation. If you are using rsyncrypto, make sure to not delete the symmetric key file (68 bytes), so that the result will be rsyncable.

Shachar
Re: Why is -e sent to the remote rsync side?
Wayne Davison wrote:
> On Sun, Oct 05, 2008 at 06:47:47AM +0200, Shachar Shemesh wrote:
>> The reason this is brought up is because I'm using rssh (http://www.pizzashack.org/rssh/) as the user's shell to limit that user to only be allowed to run rsync.
>
> I looked at the source, and created a patch to make it just require the --server option as the first option. While I was looking at the code, I noticed that the check_command() function was busted in that it would accept any abbreviated path of a command (e.g. /usr/bin/rs would match /usr/bin/rsync). The author apparently didn't know that strncmp() stops at a null (unlike memcmp()), so the length-trimming that is done can just be removed. My patch fixes that too.

Last I talked to the rssh maintainer (about a couple of years ago) I was so frustrated with the attitude that I decided to only use rssh until I knock something better together myself. He (used to) care about scp and sftp, and little else. You can send the patch over, if you're feeling lucky. I doubt I'll bother.

The only reason I brought the question up was that if I am going to be writing something myself, I would need to know what to make it enforce.

Personally, and this is not something that any shell can solve, I would love for a way to limit the files that the --server side rsync allows access to. I can then use a custom shell to pass that command line to rsync to ensure it's enforced.

> ..wayne..

Shachar
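The abbreviated-path problem Wayne describes can be illustrated with a small sketch. This is a hypothetical reconstruction of the class of bug, not rssh's actual check_command() code (which is C): the assumption is that the comparison length was trimmed to the length of the requested command, so that any prefix of an allowed path compared equal. Both function names are invented for the example.

```python
def buggy_check(allowed: str, requested: str) -> bool:
    """Compare only as many characters as the requested command has,
    so any prefix of the allowed path is accepted (the bug)."""
    return allowed[:len(requested)] == requested


def fixed_check(allowed: str, requested: str) -> bool:
    """Compare the whole strings, so only an exact match is accepted."""
    return allowed == requested
```

With the allowed command "/usr/bin/rsync", buggy_check() accepts "/usr/bin/rs" while fixed_check() rejects it, which mirrors the effect of dropping the length trimming.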
Re: Why is -e sent to the remote rsync side?
So, I've done some RTFS, and this is what I've got. I'd still love it if Wayne could confirm that my understanding of the source is correct.

Shachar Shemesh wrote:
> So my questions:
> 1. Why does rsync need to pass -e to the remote side? After all, the connection is already established at that point.

-e, when combined with --server, means something different than it does normally. With --server it is a means for the client to hand over to the server the options and command lines it received itself (hard links, symbolic link processing etc.) as well as the protocol version used.

> 2. What does this -e mean? What causes the remote side to really not run anything (trying to run .L from the path would be the way I would interpret the command at that point - obviously rsync disagrees :-)

The "." means protocol 3.0, with explicit numbers for other versions (i.e. protocol version 3.1 will be listed as 3.1). The current code says protocol 4.0 will also be listed as ".", but I'm fairly sure that's just a bug that has not manifested yet. The "L" means LUTIMES support.

The thing I would like Wayne to confirm is that if the --server option is given, the -e option will never cause an application to be run, and should thus not be considered dangerous.

Thanks,
Shachar
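If the reading above is right (and it was still awaiting confirmation when this was written), a restricted shell would only need to check that a forced command is an rsync server invocation before letting -e through. The following is a minimal sketch of such a check, assuming the rule "the first option must be --server" mentioned elsewhere in this thread; the function name is hypothetical and this is not code from rssh or rsync.

```python
import shlex


def is_server_invocation(command_line: str) -> bool:
    """Accept a command only if it is 'rsync' with --server as its
    first option, the form the rsync client sends over ssh."""
    argv = shlex.split(command_line)
    return len(argv) >= 2 and argv[0] == "rsync" and argv[1] == "--server"
```

For example, "rsync --server --sender -de.L ." passes, while a client-style command such as "rsync -e 'sh -c something' host:" does not.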
Why is -e sent to the remote rsync side?
$ rsync -e 'ssh -v' lingnu.com:
OpenSSH_5.1p1 Debian-2, OpenSSL 0.9.8g 19 Oct 2007
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to lingnu.com [199.203.56.105] port 22.
debug1: Connection established.
...
debug1: Sending command: rsync --server --sender -de.L .

As we can see, rsync runs ssh, and tells it to run, on the other side, rsync with the -e flag. I am not really sure what and how the "." and "L" are parsed by rsync (part of my problem).

The reason this is brought up is because I'm using rssh (http://www.pizzashack.org/rssh/) as the user's shell to limit that user to only be allowed to run rsync. Rssh, however, prevents the passing of the -e option to rsync, as it claims (with some amount of justification) that this option allows someone to cause rsync to run any command at all, escaping the limitations imposed by rssh.

So my questions:
1. Why does rsync need to pass -e to the remote side? After all, the connection is already established at that point.
2. What does this -e mean? What causes the remote side to really not run anything (trying to run .L from the path would be the way I would interpret the command at that point - obviously rsync disagrees :-)

Thanks,
Shachar
Re: INCLUDE/EXCLUDE PATTERN RULES problem on MAC OS
Matt McCutchen wrote:
> (since rsync does a binary comparison).

rsync as well as the Unix kernel, typically.

I have implemented i18n support in several programs before, I am working on a draft for BiDi text editing, and I had to look up what "decomposition" means. If that's the case, I doubt we can trust a sane (vs., e.g., myself) user to get it right. I think the right thing is for rsync to have, #ifdefed on a Mac build, a decomposition algorithm for the exclude/include files.

Please note that while HFS insists on decomposed characters, there is no such requirement from plain text files. For all we know, a file typed on a Mac may still fail to match the file names.

Shachar
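The mismatch discussed above is easy to reproduce. The sketch below uses Python's standard unicodedata module to show that a composed and a decomposed spelling of the same name differ byte for byte, and that decomposing both sides before comparing makes them match. It uses standard NFD as a stand-in; HFS uses its own variant of decomposition, so this is an approximation of the idea rather than of the exact filesystem rules, and the function names are made up for the example.

```python
import unicodedata


def binary_match(pattern: str, filename: str) -> bool:
    """Byte-for-byte comparison, as a plain binary match would do."""
    return pattern.encode("utf-8") == filename.encode("utf-8")


def decomposed_match(pattern: str, filename: str) -> bool:
    """Decompose both strings (NFD) before comparing their bytes."""
    def nfd(text: str) -> bytes:
        return unicodedata.normalize("NFD", text).encode("utf-8")
    return nfd(pattern) == nfd(filename)


composed = "caf\u00e9"      # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # plain 'e' followed by a combining acute accent
```

Here binary_match(composed, decomposed) is False even though both strings display identically, while decomposed_match(composed, decomposed) is True.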
Re: Can the rsync password be automated?
Shane Uys wrote:
> Is there a way to automate the rsync password or maybe disable it? I am currently running rsync from a Windows command prompt and would like to run it from a .bat file. I have read through the config man pages but am not sure if my ssh_config file is even being used. I tried passwordauthentication = no but it still asked for a password. I have seen an option for --password-file= but I believe this does not apply in that I’m using “ssh” instead of a daemon. I am using copssh and cwrsync on two Windows 2003 servers over the internet. Here is the command line used that transfers a single file:
>
> rsync -e "ssh" file1.x [EMAIL PROTECTED]:
>
> followed by the password prompt.
>
> Thanks,
> Shane

The official and recommended way of solving this issue is to perform public key authentication with the ssh server. You are right that the --password-file option does not work when running rsync over ssh. Public key authentication solves your problem, and does not significantly reduce the security of your system.

There is another option, but only go that route if you have tried setting up public key authentication and failed for a reason over which you have no control. If your server supports public key authentication, do not continue to the next option. Only consider it if the administrator for the server to which you want to connect has disabled public key authentication and cannot be persuaded to change her mind.

There is a tool called sshpass. It is available at http://sourceforge.net/projects/sshpass/. Read about it at http://www.debianadmin.com/sshpass-non-interactive-ssh-password-authentication.html

Shachar
Re: memory usage in rsync 3.0.3 -- how much RAM should I have to transfer 13 million files?
Aleksey Tsalolikhin wrote:
> I've upgraded from rsync 2.6.9 to 3.0.3 on both ends, but memory usage is still too high.

Why should rsync 3's memory usage depend on the number of files? Does it keep files it already knows should not be transferred in memory? If not, then maybe we should hold back rsync's very useful, very speed-productive read-ahead of the file list. If we see that the todo list piles up, maybe we should hold off the continued scan until the backlog gets smaller.

Yes, I know, it's the typical "someone sitting on the fence, hardly ever doing anything useful for the project, and dispensing invaluable advice". Fact is, I need this. If Wayne doesn't do it, I will get around to it eventually. The problem is that the key word here is "eventually".

Shachar
Re: Large file - match process taking days
Rob Bosch wrote:
> I've been trying to figure out why some large files are taking a long time to rsync (80GB file). With this file, the match process is taking days. I've added logging to verbose level 4. The output from match.c is at the point where it is writing out the "potential match at" message. In a 9 hour period the match verbiage has changed from:

Can you tell where the bottleneck is? Is it on the sender's CPU? The receiver's? The network? Local IO on either side?

> I believe this means that 4.8GB of the file has been processed in this 9 hour period? Blocksize is currently manually set at 1149728, 4 times the default value.

Rsync does have some CPU inefficient behavior for especially large files. However, it should not happen at the block size you are using (assuming the files are fairly identical). Try increasing it a little further, to 1638400 (80% utilization on the hash table), and see if things are any better.

Are the files fairly identical?

Shachar
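The "80% utilization" figure in the reply above can be checked with a little arithmetic. The sketch below rests on two assumptions that are not stated in this message: that the hash table has 65,536 entries (the "65k entry hash table" mentioned in another thread on this list), and that "80GB" means 80 × 2^30 bytes. The function names are made up for the example.

```python
HASH_TABLE_ENTRIES = 1 << 16  # assumed fixed table size: 65,536 entries


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover the file (ceiling division)."""
    return -(-file_size // block_size)


def table_utilization(file_size: int, block_size: int) -> float:
    """Blocks per hash-table entry; 1.0 means one block per entry."""
    return block_count(file_size, block_size) / HASH_TABLE_ENTRIES


file_size = 80 * 2**30  # assumed size of the 80GB file
utilization = table_utilization(file_size, 1638400)
```

Under those assumptions the suggested block size of 1638400 bytes gives 52,429 blocks, which is about 0.80 of the table.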
Re: Large file - match process taking days
Rob Bosch wrote:
> The files are very similar, a maximum of about 5GB of data differences over 80GB. The CPU on both sides is low (3-5 percent) and the memory usage is low (11MB on the client, not sure on the server).
>
> Full rsync options are:
> -ruityz --partial --partial-dir=.rsync-partial --links --ignore-case --preallocate --ignore-errors --stats --del --block-size=1149728 -I
>
> I'm using the -I option to force a full sync since date/time changes on database files are not a reliable measure of changes.
>
> I'll try the block-size at 1638400 although I have not seen a big change in moving it from about 287000 (default square root) to 1149728.

You wouldn't. If CPU utilization is low, this is not the problem. What about network utilization? What does ntop have to say?

What about disk utilization? I'm not sure what the best way to measure it would be (though munin does a good job of it).

Shachar
Update the rsyncrypto home page
Hi Wayne, or whoever it is that manages the rsync web site,

The rsync resources page (http://samba.anu.edu.au/rsync/resources.html) points to a project of mine, rsyncrypto, as an rsync-friendly encryption tool. Rsyncrypto now has a proper home page, and I would appreciate it if the link could be updated. The new address is http://rsyncrypto.lingnu.com.

Thanks,
Shachar
Re: compression of source and target files
Kenneth Simpson wrote:
> Chuck Wolber wrote:
>> On Fri, 21 Sep 2007, Kenneth Simpson wrote:
>>> Hi - there's a flag for rsync to compress the files in transit - is it possible to compress one side (target) with gzip and have rsync still work correctly?
>>
>> It'll still work correctly, but compressing a compressed file can actually make it slightly bigger and wastes CPU cycles in the process.
>>
>> ..Chuck..
>
> Sorry, I neglected to mention the source is uncompressed but we need to compress the target file because we're running out of disk space and the files are highly compressible. We can't compress the source since the files are large and compressing the source would create other problems.
>
> The original thought was to use a file system with compression (I think Linux has such a beast) but this would at least require a kernel rebuild which we won't be able to do for a while. The second thought was that we might be able to gzip on the fly and have rsync work correctly (since it's compressing them in transit.)

gzip, as is, will destroy rsync's ability to sync partial file changes. Gzip does have, however, a patch that adds a --rsyncable option to the command line, which makes the compressed output rsync ready.

The only problem I see with your suggestion is that, as far as I know, rsync cannot sync a stream of data to a file. Do have a look at librsync, however, which reportedly can do that.

Shachar
Re: Extremely poor rsync performance on very large files (near 100GB and larger)
Evan Harris wrote:
> Would it make more sense just to make rsync pick a more sane blocksize for very large files? I say that without knowing how rsync selects the blocksize, but I'm assuming that if a 65k entry hash table is getting overloaded, it must be using something way too small.

rsync picks a block size that is the square root of the file size. As I didn't write this code, I can safely say that it seems like a very good compromise between too small block sizes (too many hash lookups) and too large block sizes (decreased chance of finding matches).

> Should it be scaling the blocksize with a power-of-2 algorithm rather than the hash table (based on filesize)?

If Wayne intends to make the hash size a power of 2, maybe selecting block sizes that are smaller will make sense. We'll see how 3.0 comes along.

> I haven't tested to see if that would work. Will -B accept a value of something large like 16meg?

It should. That's about 10 times the block size you need in order to not overflow the hash table, though, so a block size of 2MB would seem more appropriate to me for a file size of 100GB.

> At my data rates, that's about a half a second of network bandwidth, and seems entirely reasonable.
>
> Evan

I would just like to note that since I submitted the "large hash table" patch, I have seen no feedback on anyone actually testing it. If you can compile a patched rsync and report how it goes, that would be very valuable to me.

Shachar
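The numbers in this exchange can be reproduced with a short calculation. The sketch below takes the square-root rule and the 65k-entry table at face value and assumes that "100GB" means 100 × 2^30 bytes; it ignores any rounding or minimum/maximum limits that a real rsync build applies to the block size, and the function names are invented for the example.

```python
import math

HASH_TABLE_ENTRIES = 65536  # the "65k entry hash table" from the thread


def sqrt_block_size(file_size: int) -> int:
    """Block size under the plain square-root rule (no rounding or limits)."""
    return math.isqrt(file_size)


def min_block_size_for_table(file_size: int, entries: int = HASH_TABLE_ENTRIES) -> int:
    """Smallest block size that yields no more blocks than table entries."""
    return -(-file_size // entries)  # ceiling division


file_size = 100 * 2**30                          # assumed size of a 100GB file
default_size = sqrt_block_size(file_size)        # block size from the sqrt rule
default_blocks = file_size // default_size       # number of blocks at that size
needed_size = min_block_size_for_table(file_size)
```

Under these assumptions the square-root rule gives 327,680-byte blocks, which means 327,680 blocks, five times the table size. Keeping the block count within 65,536 needs blocks of at least 1,638,400 bytes, and 16 MiB is roughly ten times that, consistent with the "about 10 times" remark and the suggested 2MB.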
Re: Compressed destination files
Matt McCutchen wrote:
> Currently, the only way to make rsync do this is with the experimental patch source-filter_dest-filter.diff, which is distributed in patches/ in the rsync source package. If you compile a custom version of rsync containing this patch, you can specify bzip2 as the source or destination filter. Read the top of the patch for more information. The patch is only a first attempt, so you might not want to trust it with your backups yet.
>
> Matt

Just one more important note. If you are using rsync over the wire (as opposed to syncing local folders), gzip with --rsyncable is preferable to bzip2, as it does not obliterate rsync's wire efficiency.

Shachar
Re: Saving ownership as non-root
Paul Slootman wrote:
> Hence although it would look like you could use rsync to backup device nodes and so on via fakeroot, as soon as the fakeroot session is ended, the information is gone. There is some support for persistent storage of the fake info, but that's not perfect; I wouldn't rely on it for _my_ backups.

I, obviously, cannot argue with what you will or will not do for your backups. For one of my projects, I created a wrapper around fakeroot that makes it persistent, and even allows it to be used by several independently launched processes simultaneously. The script is far from perfect, and needs lots of tweaking UI wise. The main problem is that it takes the directory from which the "fake" script was launched as an indication where to store the persistent state. Otherwise, it seems to work fairly flawlessly for me.

The script is attached to this mail. I took the liberty of CCing fakeroot's author on this mail, to notify him of the existence of this thread.

>> Additionally it would be a nice idea to refer to fakeroot from the rsync manual. - It took me a day to find that out. And am still looking for alternatives... Anyone?
>
> the mention in the manual would have to be pretty explicit about the caveats.

There are only two caveats that I encountered (aside from the obvious one, that files do not look right when viewed not from within fakeroot). This is after fairly extensive use of fakeroot, quite outside its original intended use pattern.

The first is that the "killfaked" script must be run in order for the state information to be stored to disk. This is not a major problem, usually, as all killfaked does is to kill the faked daemon gracefully. Any normal session exit will, effectively, do the same. We can also rig the script used for rsync to make sure the state is stored at the end of the rsync session.

The second is that a directory handled by fakeroot must not be manipulated without it, or strange things will happen.
Simple moves and renames inside the directory structure are currently ok, but any permission change, as well as files being deleted or created, may result in extremely strange looking files.

> Paul Slootman

Shachar

#!/bin/sh
# Run fakeroot with persistent storage of information
# Copyright (C) 2005, 2006 by Shachar Shemesh
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
# $Id: fake,v 1.7 2005/09/20 15:15:15 sun Exp $

set -e

# The persistent state lives next to this script.
statedir=`dirname $0`
statefile=$statedir/.fakerootenv
# One key file per user and per state directory (identified by its inode).
keyfile=/tmp/fakedkey_`whoami`_`ls -id $statedir | cut -d ' ' -f 1`

# Start a new daemon unless the key file points at a live faked-sysv process.
if [ ! -f $keyfile ] || [ ! -d /proc/`cut -d : -f 2 $keyfile` ] ||
   ! ( readlink /proc/`cut -d : -f 2 $keyfile`/exe | grep -q '/faked-sysv$' )
then
    echo Starting fakeroot daemon
    touch $statefile
    # --load reads the saved state from stdin; faked prints "key:pid" on stdout.
    /usr/bin/faked-sysv --save-file $statefile --load < $statefile > $keyfile
fi

FAKEROOTKEY=`cut -d: -f1 $keyfile`
LD_LIBRARY_PATH=/usr/lib/libfakeroot
LD_PRELOAD=libfakeroot-sysv.so.0
export FAKEROOTKEY LD_LIBRARY_PATH LD_PRELOAD

# Run the requested command inside the fake environment.
exec "$@"

#!/bin/sh
# Kill the persistent faked daemon created by previous calls to fake, and save the persistent data
# Copyright (C) 2005, 2006 by Shachar Shemesh
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
# $Id: killfaked,v 1.4 2005/09/19 15:36:06 sun Exp $

set -e

statedir=`dirname $0`
keyfile=/tmp/fakedkey_`whoami`_`ls -id $statedir | cut -d ' ' -f 1`

# Only kill the process if the key file points at a live faked-sysv daemon.
if [ -f $keyfile ] && [ -d /proc/`cut -d : -f 2 $keyfile` ] &&
   ( readlink /proc/`cut -d : -f 2 $keyfile`/exe | grep -q '/faked-sysv$' )
then
    kill `cut -d : -f 2 $keyfile`
    rm $keyfile
else
    echo faked not running
    rm -f $keyfile
fi
Re: Data Encryption
Brad Farrell wrote:
> Hi there
>
> Is there a way with rsync to encrypt data at the source before transmitting? Not talking about the actual transmission, but the data itself. I've got a few department heads that want their data secured before it leaves their computer so that no one in the office can access the data except for them.

Rsync does not encrypt the files in a way that is impossible for the receiving machine to decrypt. There is no way (that I know of) to integrate that seamlessly into the process.

What you can do, however, is encrypt the files, and then run rsync on the encrypted result. Tooting my own horn here, have a look at rsyncrypto (http://sf.net/projects/rsyncrypto) for an encryption scheme that does not totally destroy rsync's wire efficiency.

> Thanks.
> Brad Farrell
> Brevell Consulting
> ph: 403-279-6380 fx: 403-568-2112

Shachar
Re: Rsync 4TB datafiles...?
lsk wrote:
> Hello Shachar... is 2.6.7 the latest version of rsync? I could see in the http download site it says rsync-2.6.8.tar.gz. Should I get this version 2.6.8 + the patch dynamic_hash.diff?

Yes. In the more than a month that has passed since I sent that email, a new version of rsync was released :-) dynamic_hash.diff is available in that one too.

> Also I am planning to install it on only the sending machine... and try it out first.

Should work.

> Thanks for your feedback.
> lsk.

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Encryption
Julian Pace Ross wrote:
> Thanks everyone for your feedback. Seems to me that Alex explained the issue with this perfectly.

I'm afraid that Alex's explanation does not take into account rsyncrypto's algorithm. If you use rsyncrypto to encrypt two versions of a file that differ only in the first bit, the encrypted files will start out totally different. However, some way into the file (between 4KB and 16KB, depending on several factors) the encrypted files become identical again, thus allowing rsync to work on them efficiently.

> I downloaded it and spent a few minutes trying to make it work, but I didn't manage yet. (documentation is a bit terse).

The man page for the latest version has examples designed to get you started as fast as possible. I'll grant you that there is no easy way to read the manual page if you are on Windows, though.

> Assuming that it works fine, and that it encrypts only changed files (thus addressing to some extent the scalability issue mentioned by Alex), this would pretty much solve the problem, assuming that one has enough harddisk space on the client side for an encrypted copy of the data to be backed up.

Yes, you do need a second copy on the client side. The files are compressed prior to being encrypted, so it is, hopefully, not as big as the original.

> However I'm worried that rsyncrypto, although a great idea, is very much a work in progress and still shaky... I may be wrong... Anyone used it?

Well, I do, obviously (I'm the one who wrote it, after all). I think the technology is fairly sound at this stage. There are still features I'd like to see implemented, as well as various optimizations. Let's put it this way: my company (http://lingnu.com) bases a commercial backup service on this technology.

> I would be tempted to try and merge the rsyncrypto source within rsync and add a command line argument... that would be ideal... oh well, just a thought...

Others have tried before you. They tried to pipe the rsyncrypto output to a librsync-based program that does a piped rsync. At the moment, rsyncrypto cannot write the output file in a one-pass way, which means its output cannot be piped. This may be solvable, but I have not gotten around to it just yet. There are more pressing issues I would like addressed with it first. Patches are, as always, welcome.

> Cheers
> Julian

Shachar
Re: Help -- rsync Causing High Load Averages
Matt McCutchen wrote:
> On Tue, 2006-03-28 at 16:58 -0800, Plugger wrote:
>> We have a server with about 400GB of data that we are trying to backup with rsync.
>> [...]
>> When it runs, however, the load averages on the content1 server continue to grow to the 100s, bringing the server to a practical standstill.
>
> If your individual files are larger than a gigabyte or so, Shachar Shemesh's dynamic hash patch may improve performance significantly. I recommend you try an rsync with that patch. To build one, extract the rsync source package, run "patch -p1 <patches/dynamic_hash.diff", configure, and make.

Just a reminder to everyone that we are still looking for feedback on whether it is, indeed, effective. If you compiled rsync with the dynamic_hash patch, please report here whether or not it reduced the load.

Thanks
Shachar
Re: AIX 5.1 rsync large file
[EMAIL PROTECTED] wrote:
> Thank you for your response. I compiled rsync 2.6.7 and installed it and that did the trick. I don't know if it had the dynamic_hash patch or not.

If you did not manually apply it, it did not.

> But I think that I was too impatient previously and the 2.6.4 would have worked had I not killed it.

A benchmark would be greatly appreciated. Can you please try compiling another copy of rsync 2.6.7 after running the following command from the source root:

patch -p1 <patches/dynamic_hash.diff

And then tell us how the two versions compared?

Shachar
Re: Rsync 4TB datafiles...?
lsk wrote:
> Also I use rsync version 2.6.5, protocol version 29. Does this version include the patch dynamic_hash.diff, or do we need to install it separately?

Sorry. You will need to get the 2.6.7 sources, and then apply the patch yourself and compile rsync.

Please do report your results back here. This patch is the result of a lot of theoretical work, but we never got any actual feedback on it.

Shachar
Re: Rsync 4TB datafiles...?
lsk wrote:
> But I have tried various options including --inplace, --no-whole-file etc. for the last few weeks, but all the results show me that removing the destination server's Oracle datafiles and then doing an rsync -vz from the source is faster than rsyncing over the old files that are present at the destination.

Please do try applying the patch in patches/dynamic_hash.diff to both sides (well, it's probably only necessary for the sending machine, but no matter) and running this check again. This patch is meant to address precisely your predicament.

Shachar
Re: So what to do with Unicode filenames?
Stuart Halliday wrote:
> An alternative would be to zip the offending files first and name the zip file something safe, use rsync to transport them and unzip them at the other end?

If you want to maintain rsync's network efficiency, don't use Zip. Rather, use tar+gzip that has the --rsyncable patch.

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
Re: So what to do with Unicode filenames?
Stuart Halliday wrote:
> As long as each machine is set to its own correct default language then there isn't a problem I'm aware of.

But that's exactly what Georgy is complaining about. No amount of default locale tricks will help you if some of your files are in Spanish and others are in Hebrew.

If there were a way to get the file names in UTF-8, you could still use rsync, but it seems that there is no way to do it. Pity, really.

Shachar
Re: Question about rsync and BIG mirror
Jamie Lokier wrote:
> Hmm. My home directory, on my laptop (a mere 60GB disk), does contain millions of files, and it takes about 20 minutes to build the list on a good day. 100Mbps network, but it's I/O bound not network bound. It looks a lot like the number of files is more significant than the amount of data at this scale.

In fact, I know of at least one place where they don't use rsync because they don't have enough RAM+SWAP to hold the list of files in memory.

As far as future directions for rsync go, I think this is the major place where rsync needs to become better.

Shachar
Re: Question about rsync and BIG mirror
Jamie Lokier wrote:
> While you're there, one little trick I've found that speeds up scanning large directory hierarchies is to stat() or open() entries in inode-number order. For some filesystems it makes no difference, but for others it reduces the average disk seek time, as on many common filesystems the inode number is related to position on the disk.
>
> In unusual cases I've seen a factor of 10 improvement, but usually it's just 1-2.

The way I see it, if you got that far, then you don't have any problem with the size of the file list.

> -- Jamie
Re: Question about rsync and BIG mirror
[EMAIL PROTECTED] wrote:
> Hello,
>
> So: each night, from 0:00am to maximum 7:00am, the server will have to check the 100 GB of files and see what files have been modified, then upload them to the clients. Each file is around 4MB to 40MB on average.

Are the clients what you call the mirror? Are there several of them?

> I would like to know your opinion about this situation:
> - Should I set up a strong dual-CPU computer dedicated to calculating this whole stuff?

That depends.

> - What about the memory I should install?
> - Is there any bandwidth used during the checksums computation? Mine is quite limited.

Is that 2 mega BYTE per second or 2 mega BIT per second?

> - I know the client computer will have to check files too; disk I/O will be the most used. I think this computer will have an NFS mount from a datacenter computer with a GB LAN card, I wonder if it will be enough...

Scanning 100GB of data in 7 hours doesn't require that much disk bandwidth.

> I'm quite scared of the amount of data to check before synchronising clients, and how long it will take. To finish shortly, what do YOU think? Any advice?

Here are a few performance characteristics of rsync I think you should be aware of:
- By default, rsync only checks files that differ between receiver and sender in timestamp or size. If most files in your archive did not change at all, you can discard them altogether from your bandwidth calculations.
- The receiver only does a linear scan of the file, followed by generating a second file (which MAY require random access to the first file, if blocks in the file changed order). Its CPU requirements are negligible, which means the sender carries the load. This is bad for the case where you have one mirror source sending out info to many mirrors, as all the CPU load falls on the single server.
- If your bandwidth is 2 mega BIT per second, you are a bit marginal as far as transferring 5GB of data in 7 hours goes. This has nothing to do with rsync, though. A simple calculation can show you the same result. Getting full bandwidth for the entire 7 hours will allow you to transfer about 6 GB of data.

> Thanks,
> Johan
Re: Dynamic hash table size (with static hash load)
Wayne Davison wrote:
> http://rsync.samba.org/ftp/unpacked/rsync/patches/dynamic_hash.diff

A line of credit would have been nice :-)

> One thing this patch does is to (1) leave the array allocated to its largest size, (2) use realloc() if we need to make it bigger, (3) make the minimum hash-table size 65537 (a prime). Some of these decisions are debatable:
>
> The 1st item makes us more efficient in our malloc calls when sending large files, but could waste sender-side memory when transferring a single large file in the middle of a bunch of normal-sized files.

With a minimum table size of 65537, I doubt we would have too many reallocations. After all, a file needs to be over about 2.5 GB for us to need a larger array. Such files are rare enough, and the time it takes to allocate the array is insignificant enough compared to their total handling time, that I think we can save the memory when handling smaller files.

As for point 2: isn't realloc potentially less efficient than just malloc if we intend to erase the array's content anyway?

> The 3rd item might be a bit over the top, but we used to always allocate a tag array of 65536 elements, and since I noticed some hash collisions occurred in small files using a hash-table size of 11 items, I figured it would be an acceptable overhead to make normal-sized files much more likely to have no collisions at all.

Personally, I agree that a minimum is a good idea.

> Comments welcomed. Thanks again for your patch!
>
> ..wayne..

Ok. Here are a few comments:
1. I guess it's a matter of taste, but when you want to make sure a type has enough states to keep count of the number of elements in an array, I prefer using size_t to int32. It's more upwards compatible.
2. In sum_buf, sum1 is defined to be unsigned. It seems dangerous to me to hash it into a signed index, even if it's almost guaranteed to be ok.

I'm attaching my proposed patch, incorporating all of the above comments.
I also made some style suggestions (use sizeof(sum_table[0]) instead of sizeof(int32), initialize the chain element to -1 instead of 0, as that's the null value).

Shachar

? .sender.c.swp
? dynamic_hash.patch
? patches/.dynamic_hash.diff.swp
Index: match.c
===================================================================
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -0000	1.78
+++ match.c	26 Feb 2006 08:31:40 -0000
@@ -26,11 +26,6 @@
 
 int updating_basis_file;
 
-typedef unsigned short tag;
-
-#define TABLESIZE (1<<16)
-#define NULL_TAG (-1)
-
 static int false_alarms;
 static int tag_hits;
 static int matches;
@@ -42,47 +37,39 @@
 
 extern struct stats stats;
 
-struct target {
-	tag t;
-	int32 i;
-};
-
-static struct target *targets;
-
-static int32 *tag_table;
-
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
-
-static int compare_targets(struct target *t1,struct target *t2)
-{
-	return (int)t1->t - (int)t2->t;
-}
+static size_t tablesize;
+static int32 *sum_table;
 
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static void build_hash_table(struct sum_struct *s)
 {
 	int32 i;
+	uint32 t;
+	size_t tablealloc=tablesize;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%. This number must be odd or s2 will not be
+	 * able to span the entire set. */
+
+	tablesize = (s->count/8) * 10 + 11;
+	if (tablesize < 65537)
+		tablesize = 65537; /* a prime number */
+	if (tablesize != tablealloc) {
+		free (sum_table);
+		sum_table = new_array(sum_table, uint32, tablesize);
+		if (!sum_table)
+			out_of_memory("build_hash_table");
+	}
 
-	targets = new_array(struct target, s->count);
-	if (!tag_table || !targets)
-		out_of_memory("build_hash_table");
+	memset(sum_table, 0xFF, tablesize * sizeof (sum_table[0]));
 
 	for (i = 0; i < s->count; i++) {
-		targets[i].i = i;
-		targets[i].t = gettag(s->sums[i].sum1);
+		t = gettag(s->sums[i].sum1);
+		s->sums[i].chain = sum_table[t];
+		sum_table[t] = i;
 	}
-
-	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
-
-	for (i = 0; i < TABLESIZE; i++)
-		tag_table[i] = NULL_TAG;
-
-	for (i = s->count; i-- > 0; )
-		tag_table[targets[i].t] = i;
 }
 
 
@@ -176,20 +163,17 @@
 	}
 
 	do {
-		tag t = gettag2(s1,s2);
+		int32 i;
+		size_t t = gettag2(s1,s2);
 		int done_csum2 = 0;
-		int32 j = tag_table[t];
 
 		if (verbose > 4)
 			rprintf(FINFO,"offset=%.0f sum=%08x\n",(double)offset,sum);
 
-		if (j == NULL_TAG)
-			goto null_tag;
-
 		sum = (s1 & 0xffff) | (s2 << 16);
 		tag_hits++;
-		do {
-			int32 l, i = targets[j].i;
+		for (i = sum_table[t]; i >= 0; i = s->sums[i].chain) {
+			int32 l;
 
 			if (sum != s->sums[i].sum1)
 				continue;
@@ -205,9 +189,10 @@
 			    !(s->sums[i].flags & SUMFLG_SAME_OFFSET))
 				continue;
 
-			if (verbose > 3)
-				rprintf(FINFO,"potential match at %.0f
Dynamic hash table size (with static hash load)
Hi list, and Wayne in particular,

It has been almost a year since we had the discussion (with http://lists.samba.org/archive/rsync/2005-March/011875.html as its conclusion) regarding the chances of hash collisions with large files. As we now have someone asking about syncing 5TB files, I decided to actually submit a patch.

Attached is a patch that uses a non-predetermined hash table size, so that the hash cell load (alpha) is never more than 80%. As far as my understanding of rsync goes, this requires no change in the rsync protocol.

Comments welcome,
Shachar

? .match.c.swp
? dynamic_hash.patch
Index: match.c
===================================================================
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -0000	1.78
+++ match.c	25 Feb 2006 11:22:12 -0000
@@ -28,7 +28,6 @@
 
 typedef unsigned short tag;
 
-#define TABLESIZE (1<<16)
 #define NULL_TAG (-1)
 
 static int false_alarms;
@@ -49,10 +48,11 @@
 
 static struct target *targets;
 
+static size_t tablesize;
 static int32 *tag_table;
 
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static int compare_targets(struct target *t1,struct target *t2)
 {
@@ -64,8 +64,14 @@
 {
 	int32 i;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%.
+	 * See http://lists.samba.org/archive/rsync/2005-March/011875.html
+	 */
+	tablesize=(s->count/8)*10+11;
+
+	free(tag_table);
+	tag_table = new_array(int32, tablesize);
 
 	targets = new_array(struct target, s->count);
 	if (!tag_table || !targets)
@@ -78,7 +84,7 @@
 
 	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
 
-	for (i = 0; i < TABLESIZE; i++)
+	for (i = 0; i < tablesize; i++)
 		tag_table[i] = NULL_TAG;
 
 	for (i = s->count; i-- > 0; )
Re: Dynamic hash table size (with static hash load)
Wayne Davison wrote:
> Thanks for the patch! Here's some comments:
>
> - You didn't change the size of the tag typedef (an unsigned short), and your patch makes the value potentially overflow.

Gotcha. I'm sending an amended patch.

> - For smaller hash-table sizes, your algorithm does a lookup in the table based only on the s1 value (due to the (s2 << 16) value being too large to have any remainder less than the tablesize). So, I think this probably needs to leave gettag() calling gettag2(), and change gettag2() to factor both s1 and s2 into some kind of an improved tag-generating computation.

I disagree. Let's begin with an example. Suppose that we only have 7 hashes (s->count=7). 7/8=0. 0*10=0. 0+11=11. Our hash table size is 11, which is the absolute minimum it will ever get. Now let's suppose all 7 hashes have 1234 as their lower hash value, and have the numbers 1000 through 1006 as their high value. They will be filed to:

1000:1234 -> 4
1001:1234 -> 2
1002:1234 -> 0
1003:1234 -> 9
1004:1234 -> 7
1005:1234 -> 5
1006:1234 -> 3

Obviously, the higher checksum DID get a chance to affect the cell we land in.

For the more general case, our function is (s1+s2*65536)%ts (where ts is the table size). Modular arithmetic dictates that this is the same as saying (s1%ts + (s2%ts) * (65536%ts))%ts. In other words, you can first mod each element individually, and only then do the actual addition and multiplication. It's easy to see that s2 will never get nullified unless 65536%ts is zero. As 65536 is 2^16, and as ts is guaranteed to be odd, this is impossible.

Venturing deeper into modular arithmetic, we know it is theoretically possible that s2 will have some effect on the hash cell chosen, but will not be able to select every cell. This can be seen in the case of (s1+s2*15)%9. If s1 is, say, 3, the different s2 values can only select cells 3, 6 and 0. This will happen if and only if the factor (15) and the modulus (9) have a greatest common divisor (gcd - OpenOffice Calc actually has a function of that name) which is larger than 1 (3, in this case). In jargon, we say that two numbers that have a gcd of 1 are coprime.

Since ts is always odd (we multiply a number by 10 and then add 11), it will always be coprime to 65536 (whose only prime factor is 2). This means that s2 has as much of a chance to select the hash cell we end up in as s1.

I don't think it is necessary to change that aspect of the code. I did change the comment in the patch to summarize this point.

> ..wayne..

Shachar

? dynamic_hash.patch
Index: match.c
===================================================================
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -0000	1.78
+++ match.c	25 Feb 2006 18:42:05 -0000
@@ -26,9 +26,8 @@
 
 int updating_basis_file;
 
-typedef unsigned short tag;
+typedef unsigned int32 tag;
 
-#define TABLESIZE (1<<16)
 #define NULL_TAG (-1)
 
 static int false_alarms;
@@ -49,10 +48,11 @@
 
 static struct target *targets;
 
+static size_t tablesize;
 static int32 *tag_table;
 
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static int compare_targets(struct target *t1,struct target *t2)
 {
@@ -64,8 +64,14 @@
 {
 	int32 i;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%.
+	 * This number must be odd or s2 will not be able to span the entire set
+	 */
+	tablesize=(s->count/8)*10+11;
+
+	free(tag_table);
+	tag_table = new_array(int32, tablesize);
 
 	targets = new_array(struct target, s->count);
 	if (!tag_table || !targets)
@@ -78,7 +84,7 @@
 
 	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
 
-	for (i = 0; i < TABLESIZE; i++)
+	for (i = 0; i < tablesize; i++)
 		tag_table[i] = NULL_TAG;
 
 	for (i = s->count; i-- > 0; )
Re: information on identifying hard links to a file
Wayne Davison wrote:
> On Thu, Feb 09, 2006 at 03:04:17PM +0100, Paul Slootman wrote:
>> compare inode and device number. When those are the same, the two files must be hardlinked.
>
> Also, rsync only considers files that have a link count larger than 1 (see stat()'s st_nlink) since this allows it to ignore the vast majority of files that have only one link into a filesystem.
>
> ..wayne..

Do we also discard the info once we have found as many file names as the link count? This should allow us to dramatically reduce the memory usage for large transfers.

Example:
Found file foo1 with dev 304, link count of 2 and inode 17. Cache it.
Found file bar1 with dev 304, link count of 3 and inode 18. Cache it.
Found file bar2 with dev 304, link count of 3 and inode 18. Mark it as link to bar1.
Found file foo2 with dev 304, link count of 2 and inode 17. Mark it as link to foo1 and remove the cache entry (found two matches to a file that has two links).

Shachar
Announcing a new project allowing the use of rsync with ssh in password authentication mode
Hi all,

I know the question came up once or twice lately, and as I needed something similar myself, I actually sat down and wrote it. The project is called sshpass, and it is available from sourceforge at http://www.sf.net/projects/sshpass. In a nutshell, it allows non-interactive use of ssh in password authentication mode.

This warning is repeated in the README, as well as on the project's summary page, but I'll repeat it here nevertheless. This is NOT an ideal solution, security-wise. Anyone and everyone is encouraged to use ssh's public key authentication instead of this little utility. It is only meant for use in cases where public key authentication is out of the question, for one reason or another.

Usage example (taken from the man page): performing password-based authentication for rsync, with the password given on the command line (sshpass' least secure mode of them all):

rsync --rsh='sshpass -p 12345 ssh -l test' host.example.com:path .

Hoping this proves useful enough for everyone. Sorry about the noise.

Shachar
Re: Help me with this Questiions
Harish wrote:
> I would like to understand the capabilities of GNU rsync software / utility. This is used for syncing file systems / file level data across two systems. I specifically would like to know its capabilities in syncing files:
>
> 1) How does it replicate data changes to files: entire file or only the incremental blocks?

As far as I know, changes are written to disk by replacing the entire file. Rsync's incremental nature only extends as far as the network usage goes.

> 2) Does it have any block level replication capabilities?

Rsync is a file utility. It has no awareness of blocks. If you are using Linux (you are not, I know), check out LVM for what you want. It has generic snapshot support, which does what you want.

> 3) Can it replicate files while the file is in open state (oracle redo log file)? This is particularly a problem in windows environment, typically.

Not unless the OS supports it. Rsync doesn't even have a native Windows version at all. It only runs on Windows through an adaptation layer called Cygwin, which brings Unix semantics to rsync.

Shachar
Re: request: add TCP buffer options to rsync CLI?
Wayne Davison wrote:
> On Tue, Nov 01, 2005 at 09:55:06PM -0600, Lawrence D. Dunn wrote:
>> is it likely, or routine, or will-take-some-time, (or all-of-the-above), for that patch to be vetted and integrated into mainline rsync released code?
>
> I'm currently leaning towards including this in the next rsync release unless someone can come up with a reason why it would be a bad idea.
>
> ..wayne..

I think it's a good idea, so long as a remote user cannot dictate the options to a running pserver. For all other invocation methods, an admin had better use something along the lines of http://olivier.sessink.nl/jailkit/jk_lsh.8.html to restrict what should and should not be possible to run on the remote machine.

Also, when I have the time (i.e. - not soon :-( ), I will try to write a "hardening rsync" howto, if you'll publish it.

Shachar
Link typo in web resources
To whoever it is that maintains the web site.

The page at http://samba.anu.edu.au/rsync/resources.html has a link to the GNU project management page. The link has a space between the "http://" and the host name, which means it cannot be opened.

Shachar
Re: request: add TCP buffer options to rsync CLI?
Wayne Davison wrote:
> The patch also makes the new option accepted by the daemon's command-line parser, allowing whoever starts the daemon to override the config file's socket option settings via the command line.

Care to elaborate on the security implications? What is the potential for a DoS on someone giving out rsync services to basically untrusted parties?

Shachar
Re: meta data stored in separate file?
Joe Pruett wrote:
>> There's something called backuppc (i think backuppc.sourceforge.net) which uses some sort of db backend and has multiple possible transports, rsync is one option. I think it might do what you're looking for.
>
> interesting tool, but it is not what i need. it doesn't do acls. it is a pull system, rather than push. this is for an isp setting (which i didn't mention yet) where colocation customers would push their backups to a central box. any other tools out there?

I'm currently adding metadata extraction and saving to rsyncrypto. I'm not sure it fits your use scenario, and in any case, it will not have Windows ACLs for 0.16 (the coming version). Just thought I'd put the info in.

Shachar
Re: encrypted destination
George Georgalis wrote:
> In the archives I see the question about encrypted destination and it's mostly answered with the --source-filter / --dest-filter patch by Kyle Jones. There are also some proposed updates to the patch. A lot of these posts are 3 years old; are there plans, or reasons not, to include them in the main line code?
>
> // George

Personally, I solved that problem using a preprocessing program. The idea was to not share any key data with the destination. If that's interesting to you, do check out rsyncrypto (http://sf.net/projects/rsyncrypto). What it does is encrypt the files prior to rsyncing them. The twist is that the files are encrypted in a way that does not obliterate the wire efficiency of applying rsync to the encrypted files.

Shachar
Has anyone seen this?
http://use.perl.org/~Matts/journal/25138

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
Re: RSYNC doesn't like Unicode?
Stuart Halliday wrote:
> Paul Slootman said:
>> The common issue seems to be windows systems, as far as I can tell here. Perhaps transferring files (or rather, filenames) between windows systems with differing locales (or language settings) is the problem, and someone with intimate knowledge of how to manipulate filenames on windows needs to investigate this.

The problem seems to be that there aren't too many people that fall into that category. As a Wine hacker working on Unicode related tasks, I think I'll pick this title up.

> I was using Rsync to copy favourites from one english UK XP sp2 machine to a Windows 2000 sp4 english UK machine. No different language settings involved.

The important thing here is the codepage used. Rsync is not a Unicode application on Windows, and so its interpretation of the file names is dependent on the current codepage. The current codepage is called "Default locale" on Windows 2000 and "Codepage for non Unicode applications" on Windows XP. Either way, it's in the "Regional Options" control panel applet, it's global to the computer, and it requires administrator privileges and a reboot to change. Please check what your settings are on both computers, and let us know.

> It just so happened that I had placed in my favourites some URLs with a few European characters in their name. Something on Windows is tripping up Rsync that's for certain.

Theoretically, turning rsync into a Unicode app on Windows could solve these issues. I doubt it will actually work, however. It is highly likely to create more problems than it solves, but let's try and find out what the current problems are before we try to think of a solution.

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/
Close list to outsider's posts?
Hi I'm assuming that Wayne is the obvious destination for this request. Can we make the mailing list reject emails from non-subscribers? This would drastically reduce the amount of spam we receive. Thanks, Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Encryption
Gary Holzer wrote: Hi All, I am using rsync to backup our office server to our Internet server (RHE). As an association for doctors we are looking at providing a backup service for their practices using rsync. As it would be patient data it would need to be encrypted. I have found a few options, namely esync, wurt and rsyncrypto. Does anyone have experience with the above and perhaps like to recommend one? On the client side we are on Windows boxes using cygwin. Thanks I am (as you know) the maintainer for rsyncrypto. I looked a little into esync (a while back, I'm not sure I fully remember the differences, though). I have no idea what wurt is, so a link would be greatly appreciated. The main difference between rsyncrypto and esync is in the amount of state information stored between operations. With rsyncrypto, this is a mere 52 bytes, containing the initial value for the CBC, the symmetric encryption key for the file, as well as three parameters used to determine CBC resets. This information is enough to make a repeated encryption of the same file (modified or not) identical enough to the original that rsync will manage to pick up just the differences. This 52 byte file is fully recoverable from the encrypted file, if you have the asymmetric private key. Esync, assuming I understood it correctly, actually requires keeping around enough information about the properties of the reset points (it uses a completely different algorithm). On first reading the esync algorithm sounded like one having a cryptographic weakness, but: 1. It was a long time ago, and I don't remember the details. 2. On second reading I remember thinking that the hole was plugged after all, at the expense of performance. 3. I cannot be said to be impartial, being as I maintain a competing technology. Also with esync: - You need a custom version of rsync on both ends.
- May be relevant for you - there is no Debian package :-) Bear in mind that any manipulation of an encryption system to make it rsync friendly means that we are weakening it. This is obviously true for rsyncrypto too. Myself, I'm fairly confident that the weakening is nothing to be worried about, but do bear that in mind. This is stepping off the trodden path, a cryptographic risk, in exchange for better network performance. As for experience, rsyncrypto is part of a commercial backup service my company is running, so you can say I have some experience with it, yes :-). Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Spam to this list
Alun wrote: Shachar Shemesh [EMAIL PROTECTED] said, in message [EMAIL PROTECTED]: Reject codes were very common once. Then they were recommended against. They were recommended against for a reason, that reason being that they expose the user base to password and other guessing. Who recommended this?! I'm replying off list, to tune down the noise. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Spam to this list
John E. Malmberg wrote: The I.P. address is listed in bl.spamcop.net as hitting spamtraps. Just so you know, spamcop views bounces as spam. According to them, you should never send bounces. I believe the right approach is to convince admins to drop spamcop from their RBL list, rather than remove the very essential NACK SMTP has from all servers, as per spamcop's request. -John [EMAIL PROTECTED] Personal Opinion Only Same here. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Spam to this list
John E. Malmberg wrote: The essential SMTP NACK is not what is the problem as long as it is done during the SMTP connection using reject codes. Issuing a SMTP reject code for undeliverable messages will never cause a spamcop.net listing. Reject codes were very common once. Then they were recommended against. They were recommended against for a reason, that reason being that they expose the user base to password and other guessing. When Spamcop was confronted with spammers harvesting email using rejection codes, Julian responded with the laughable "I don't know of spammers who do that". What? Not to mention the fact that secondary MXes are impossible to reject during SMTP, as are virtual domains (for all practical purposes), later filters, and many many many other cases. Julian's solution is either "don't provide NACK" or "hold the original SMTP until you know what to reply". I'm sorry, but both answers are laughably sad, and effectively mean the end of SMTP. I know, it's bad to be bombarded with bounces. I've been there myself. Destroying the reliability of SMTP for this high cause, however, is something I cannot abide by. I have heard of enough cases where important emails vanished without leaving a trace to consider this a trivial or unimportant problem. The SMTP bounce is an artifact from the time when third party open relays were also in common use. At that time, it was needed by the third party open relay to return the non-delivery message. No. See above. I won't mention qmail again, because Julian seems to not mind the fact that it's the only safe MTA around, but the simple fact is that any time you need to perform processing in order to accept or reject an email, you need to accept the mail and then decide. Keeping a TCP connection open just so you can put a reject code in the protocol opens you up for DoS, as well as threatening the delivery itself due to timeouts. And, you have not mentioned secondary MXes and downed networks yet.
Now, almost no mail servers will accept e-mail from known open relays, so when they can not deliver an e-mail, if they use an SMTP reject code, then the sender's mail server, which should trust the sender will generate the bounce message. It's a great theory. Too bad it doesn't cover all cases. If these bounces from the sender's mail server are going to forged addresses, then there is a security problem on the sending network that needs to be fixed. No, there is a bandwidth problem. I agree that it's a problem, but I totally disagree with the solution. And since medium to large networks pay a metered rate for their internet connection, bouncing instead of using SMTP rejects will significantly increase their operating costs as it will cause them to pay for the bandwidth for 6 spam/virus e-mails for every 1 real e-mail that they receive. Using SMTP rejects and DNSbls eliminates almost all of that cost from their operation. I don't see the difference. One way or the other, SMTP is shot. If someone shoots down a protocol I need, I call him the enemy of the public. So far it has been spammers. Now it's spammers and Spamcop. My mail server doesn't bounce viruses. The reason is that I can detect viruses with close to 0% false positives, so I feel fairly confident in sending them to /dev/null. Unfortunately, spam does not enjoy this rate of false positives. What's more, even if it did, the occasional false negative would mean that I would still get blacklisted. Look, I chose a difficult to understand name for my company (Lingnu). As a result, many times, if I'm telling someone my email address over the phone, they'll get it wrong. It didn't used to be a problem. If one in four got it wrong, they would get a bounce and call me. Not any more. One of the domains around mine sends all incoming email to /dev/null, and people are mad at me for not responding to my emails. 
Do tell me that this is ok with you, or that you don't think that SMTP loses a lot of its functionality (even more so than because of Spam) as a result. Again, this is without bringing qmail into the picture. Qmail, as a direct result of a design that keeps security in mind, cannot send rejections inline. The daemon accepting the mail simply doesn't know what's behind it. It's an unprivileged drone that takes the emails and queues them, not having any idea what will happen to them afterwards. In an environment where spammers exploit security holes to infect computers with spam sending zombies, telling an MTA admin to switch to something less secure because you don't like something defined by the RFC is counter-productive and does more to hurt spam fighting than to help it. Now, this is getting off topic for rsync, so please do feel free to send me your reply privately. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Max filesize for rsync?
[EMAIL PROTECTED] wrote: I would be very happy to test any patches. (Assorted RedHat/Fedora i386) (Assume I am a total newbie, much safer that way) A few very large files regularly rsync'd in production. Seems like it sometimes gets somewhat stuck in the middle of something large. (The rsync is mostly staging area to staging area. Plenty of redundancy, so I'm unlikely to get hurt if I'm aware of problems. The way the targets are used, I will know about problems before any real damage is done.) The more important of the transfers are over occasionally very bad internet connections, so I'm pretty much in the situation of something to gain, nothing to lose. -rw-rw1 27 27 1187270120 Apr 13 03:24 /home/rsync-sjs/mysql/sjs/dwf.MYD -rw-rw1 27 27 1098515060 Apr 8 07:34 /home/rsync-sjs/mysql/srvs/dwf.MYD -rw-rw1 mysqlmysql840374964 Apr 12 12:59 /home/rsync-2bb/mysql/srv/dwf.MYD -rw-rw1 mysqlmysql520216980 Apr 12 20:54 /home/rsync-2bb/mysql/map/dwf.MYD -rw-rw1 mysqlmysql221876208 Apr 11 14:21 /home/rsync-2bb/mysql/ecp/dwf.MYD Actually, rsync's implementation only starts to give suboptimal results when we pass the 2.5GB area. Up until there one should see no noticeable difference in performance (well, my patch can allow using somewhat less memory for those cases, but no one would really miss 256KB of RAM these days, and I'm not sure how much impact the CPU cache effect is going to have in rsync's case. I guess that some effect will be seen anyways, but not as noticeable). The performance bottleneck is due to hash table buckets load. Optimal load, as taught at computer science, is 80% (or 0.8). Your smallest file puts a load of 23%, while your largest file has a load of 53%. You shouldn't see any problems when using rsync (at least, not the type of problem I'm talking about). Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? 
http://www.lingnu.com/backup.html
Re: Max filesize for rsync?
Jeff Schoby wrote: Well, I got -further- by changing the fsize= to -1 in /etc/security/limits on my AIX boxes, but rsync ultimately still did not like my 15GB file I wanted to transfer. What does "doesn't like" mean? Does it freeze with too much CPU usage? Had to resort to good ol' plain vanilla ftp. How long does transferring the file via ftp take? A 15GB file puts a load of 194% on the hash table. This means that, *on average*, each lookup will have to scan two entries. I'm not sure whether that should have created a huge slowdown or not. Try rsyncing that file with --block-size=524288 (i.e. - a 0.5MB block size), and please report whether rsync's behavior has improved, and in particular, how it rates against vanilla ftp. Thanks, Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
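For anyone who wants to redo the load figures quoted in these messages, here is a rough sketch. It assumes two things that come from elsewhere in this thread rather than from any official documentation: that the default block size is the rounded square root of the file size, and that the hash table has a fixed 65536 buckets. The function names are made up for the illustration.

```python
import math

BUCKETS = 65536  # assumed fixed table size (16-bit index), as discussed in the thread


def default_block_size(file_size):
    """Assumed default: the rounded square root of the file size."""
    return max(1, round(math.sqrt(file_size)))


def table_load(file_size, block_size=0):
    """Blocks per bucket; 0.8 is the load I called optimal."""
    if block_size <= 0:
        block_size = default_block_size(file_size)
    blocks = (file_size + block_size - 1) // block_size  # round up
    return blocks / BUCKETS


for size in (221876208, 1187270120, 15 * 2**30):
    print(f"{size:>12} bytes: load {table_load(size):.0%}")

# The same 15GB file with a forced 0.5MB block size
print(f"15GB, 512KB blocks: load {table_load(15 * 2**30, 524288):.0%}")
```

With a 0.5MB block size the 15GB file has only 30720 blocks, which brings the load back under 50%.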
Re: Max filesize for rsync?
Jeff Schoby wrote: What's the maximum filesize rsync can transfer? I'm trying to rsync one of my servers to another but the rsync is croaking on a file that's barely 1GB. Tips, hints, suggestions? rsync server is AIX 4.3.3 ML11 - rsync 2.6.3 rsync client is AIX 5.3 ML1 - rsync 2.6.4 Thanks -Jeff Please note that, all file size OS limitations aside, rsync has suboptimal performance for very big files. When I get around to it I'll try to create a patch, but in the meantime very big files will have too many false hash table collisions, and may become extremely slow. If you run across this problem, please post on list, as we need someone to experience this problem in order to try and fix it. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Always exitcode 256 under Cygwin with rsync 2.6.4
Wayne Davison wrote: On Mon, Apr 04, 2005 at 07:28:02AM +0200, Joost van den Broek wrote: When you just give an empty rsync command, it should also exit with an exit code (1). But the errorlevel gets set to no. 256 instead. As mentioned in the other message that brought this up, I assume that this is something wrong with the cygwin version (perhaps in how it was compiled?). Rsync is exiting with all the right codes under Linux. ..wayne.. It is curious to note that, under Posix, it is impossible to exit with return code 256, as the return code is an 8 bit value. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
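One possible explanation, and this is only a guess on my part, is that the 256 being reported is a raw wait status rather than the exit code itself. On common Unix implementations the exit code sits in the high byte of the 16-bit status, so an exit code of 1 shows up as 256 if nobody decodes it. The helper names below are hypothetical; the sketch only shows the arithmetic.

```python
def exit_code_from_wait_status(status):
    """Decode the 8-bit exit code from a traditional 16-bit wait status.

    Assumes the common layout where the exit code is stored in bits 8-15.
    """
    return (status >> 8) & 0xFF


def truncate_exit_code(code):
    """What a parent would see if a process really called exit(code)."""
    return code & 0xFF


print(exit_code_from_wait_status(256))  # the undecoded status for exit(1)
print(truncate_exit_code(256))          # a literal exit(256) is cut to 8 bits
```

If the guess is right, the fix belongs in whatever reports the status, not in rsync's exit codes.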
Last plug - rsyncrypto independent devel mailing list
Hi all, A while back I announced Rsyncrypto, a rsync friendly encryption system. With version 0.11 just out, and proving reasonably usable in its own right, we now have an independent mailing list discussing just rsyncrypto. I have therefore allowed myself this one last notice to this list. All further rsyncrypto related announcements will go to [EMAIL PROTECTED] (http://lists.sourceforge.net/lists/listinfo/rsyncrypto-devel for the subscription page). That is, unless Wayne tells me that it's ok by him to announce major milestones here as well :-) Thanks, and sorry about the noise. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Rsyncing really large files
Kevin Day wrote: Shachar- I think Wayne is mostly pointing you to the correct location here. If you look at the code where the checksum is computed, you'll find that the rolling checksum actually consists of two parts - one is the sum of all of the byte values in the window, the other is the offset weighted sum of all of the byte values in the window. The first of these is then left shifted 16 bits and added to the other to come up with the official 32 bit rolling checksum. This works fine as long as you aren't counting on a random distribution of bits among the 32 - if you mod the value, you are giving much greater importance to the lower XX bits, effectively dropping the distribution of the high order bits... Anyway, the two 16 bit values may be random enough that my concern is not founded, but it should be tested before assuming that the rolling checksum is really a 32 bit value that can easily be divided up into buckets. PS - none of the above has anything to do with the strong signature of the window - just the rolling check sum. Cheers! - K Some modern algebra for you, then. We have two numbers. One is always multiplied by 2^16. We want both numbers to be able to fully affect the bucket into which the eventual checksum falls. If we choose a hash table size that is co-prime to the bits we want to remain significant, then we achieve that. Well, guess what? The weight of each bit in the combined checksum is a power of two, so its factorization yields only twos. In other words, any hash table size that is odd (i.e. - two is not among its prime factors) will be co-prime to our shifted checksums, thus promising that both halves get an equal chance of affecting which bucket our checksum actually falls into. It therefore follows that I have to amend my previously proposed formula for choosing the hash table size. The new formula is: (numblocks/8+1)*10+1 And you're done. Of course, this can also be written as: (numblocks/8)*10+11 Which is slightly more efficient. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
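A quick way to convince yourself of the claims about the amended formula is to just try it. The sketch below assumes the division in the formula is C-style integer division (that is how I read it, but it is an assumption), and the function names are invented for the example.

```python
from math import gcd


def table_size_original(numblocks):
    """The earlier proposal: (numblocks/8+1)*10, always even."""
    return (numblocks // 8 + 1) * 10


def table_size_amended(numblocks):
    """The amended proposal: (numblocks/8)*10+11, always odd."""
    return (numblocks // 8) * 10 + 11


for numblocks in (320, 14896, 740_000):
    size = table_size_amended(numblocks)
    print(numblocks, size,
          "odd" if size % 2 else "even",
          "gcd with 2^16:", gcd(size, 2**16),
          "load: %.3f" % (numblocks / size))
```

Since (numblocks/8)*10 is always even, adding 11 always gives an odd size, and the load stays just under 0.8 however many blocks there are.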
Re: Rsyncing really large files
Lars Karlslund wrote: And I'm suggesting making it static, by adjusting the hash table's size according to the number of blocks. Just do hashtablesize=(numblocks/8+1)*10;, and you should be set. Or maybe it should really be dynamic. I'm talking about the hash table load. I.e. - the ratio between the number of buckets the table has, and the number of blocks that go in it. This is almost unrelated to your problem. I adjusted the block-size as an experiment, as I read somewhere about the default blocksize of 700 bytes. Now I'm told the blocksize is calculated automatically. Which is it? According to my extremely non-official reading of the source code - dynamic. Do try to lose the parameter and see how things are doing. Also, try setting it really high, say, 50MB, and tell us how things go then. This is just so we find out where the bottleneck is in your case. But hey, I can run all the tests you want. Just tell me what to do. See previous paragraph. Comparative numbers for: always transfer; block sizes of 64k (as you have been doing so far); default block sizes (about 700K, according to my calculations); 50MB block sizes. Also, knowing the CPU and network load of each solution would be very beneficial. Okay, okay, my mistake. Should I just remove the parameter altogether? Would probably be better, yes. Kevin, I'm trying to make rsync better. Lars' problem is an opportunity to find a potential bottleneck. Trying to solve his use of possibly Well, it's probably a non-standard situation for rsync anyway. The fact that your setup is highly likely non-optimal does not mean that rsync cannot be made even better. non-optimal values won't help rsync, though it will help him. Let's keep Well, me either, as the rsync job processes both this gigantic file and other smaller ones. If you don't specify block sizes, this should not be a problem. Whoa, is that the subject? I thought the subject was solving my problem big smile Not for four or five messages, no :-) Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Rsyncing really large files
Kevin Day wrote: As a quick FYI, the block size absolutely has an impact on the effectiveness of the checksum table - larger blocks means fewer blocks, which means fewer hash collisions. Since you wouldn't expect that many blocks in the file, a 32bit weak checksum would only produce about 4 or 5 real collisions. Mind you, this is me making educated guesses. I haven't worked out the actual math yet. I don't think we need to worry about this particular problem just yet. Hash table collisions, however, are much more likely, which is what I'm trying to solve here. That said, however, I completely agree that for very large files, the number of buckets in the current implementation is not optimal. Perhaps having a mode for really large files would be appropriate. I don't see why such a mode would be necessary. One caution on increasing the size of the hash: The current implementation gives 16 bits of spread, so modding that value with the desired bucket count would work fine. That's not what I read into it. It seems to me that the checksum function gives a 32bit result, and we are squashing that into a 16bit hash table. Can you point me to the code? Wayne? However, if you choose to start with the 32 bit rolling hash and mod that, you will have problems. The rolling checksum has two distinct parts, and modding will only pull info from the low order bits, Why? This may be something I missed within the code. which will likely get you considerably less than even the 16 bits that the current implementation gives. If the source is 16 bit, making the hash table any bigger than 65536 buckets would make no sense, true. Is it 16bit? I'd recommend using a real 32 bit hashing function to mix the two rolling checksum components, What two parts? If I understand rsync correctly, we have a rolling checksum, and a real checksum. The rolling checksum is used to single out potential matches, and the real checksum makes sure these are, indeed, real matches. We only need to put the first one into the hash, as we are never doing mass-lookups on the second. Am I missing something basic here? Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
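For reference, the "two parts" being argued about can be sketched as follows. This is an illustration of a two-component rolling checksum in the spirit of what is described in this thread, not a copy of rsync's code; in particular, which half ends up in the high 16 bits is an assumption made for the example, and the function names are invented.

```python
def weak_checksum(window):
    """Two 16-bit sums packed into one 32-bit value.

    a: plain sum of the bytes in the window
    b: position-weighted sum (first byte weighted most)
    """
    n = len(window)
    a = sum(window) & 0xFFFF
    b = sum((n - i) * byte for i, byte in enumerate(window)) & 0xFFFF
    return a | (b << 16)


def roll(checksum, old_byte, new_byte, window_len):
    """Slide the window one byte to the right in O(1)."""
    a = checksum & 0xFFFF
    b = checksum >> 16
    a = (a - old_byte + new_byte) & 0xFFFF
    b = (b - window_len * old_byte + a) & 0xFFFF
    return a | (b << 16)


data = b"the quick brown fox jumps over the lazy dog"
n = 8
c = weak_checksum(data[:n])
for pos in range(1, len(data) - n + 1):
    c = roll(c, data[pos - 1], data[pos + n - 1], n)
    assert c == weak_checksum(data[pos:pos + n])

# Taking the value modulo a power of two keeps only the low half:
print(weak_checksum(b"abc") % 65536 == sum(b"abc"))
```

The last line is the concern in a nutshell: with a bucket count of 65536 only one of the two components picks the bucket, whatever the other one is.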
Re: Rsyncing really large files
Wayne Davison wrote: On Thu, Mar 03, 2005 at 10:18:01AM +0200, Shachar Shemesh wrote: And I'm suggesting making it static, by adjusting the hash table's size according to the number of blocks. The block-size? Definitely not! I was talking about the hash table load. I.e. - the ratio between the number of blocks and the number of hash table buckets. I.e. - after determining the number of blocks, only then decide on a hash table size, and work accordingly. This means you use little memory for small files, and more memory for big files - should be an acceptable trade off. Since it only needs to note a found/not-found state, the table can be a single bit per node, and a 19-bit lookup only needs 64k of memory. But that only works if the checksum function and the hash table are exactly the same size. Also, you still need to store the verify value somewhere, and efficiently find it. I'm not sure that's optimal. If we take a 500GB file, as is Lars' case, and assuming we don't touch the block size (i.e. - we use the default, about 740K blocks of about 740KB each), we will need about 900 thousand buckets in the hash table at an alpha ratio of 80%, which means about 4MB in pointers. I hardly think this is enough memory consumption (for efficiently transferring a 500GB file) to justify further complicated bit operations. (On the flip side, 64KB fits into the CPU's data cache, while 4MB usually will not. I'm not sure how crucial that is going to turn out to be.) This allows a rapid yes/no pre-check for the weak value before we look-up the actual strong checksum value in the hash table and should result in less searching for values that aren't there. But how will you find it there? If you are going to have 740K blocks (i.e. - 740,000 strong hashes) in a 16bit hash table, you are going to have lots of collisions there (about 11 per bucket, on average), and you gained nothing. ..wayne.. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
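The memory figures above are easy to redo. The sketch below assumes 4-byte pointers (that is where my "4MB" comes from) and a found/not-found bitmap with one bit per possible 19-bit value; the function names are only for this example.

```python
def buckets_needed(blocks, target_load=0.8):
    """Bucket count that puts the table at the target load."""
    return round(blocks / target_load)


def pointer_table_bytes(blocks, pointer_size=4, target_load=0.8):
    """Memory for one pointer per bucket (pointer_size is an assumption)."""
    return buckets_needed(blocks, target_load) * pointer_size


def bitmap_bytes(index_bits):
    """Memory for a one-bit-per-entry found/not-found table."""
    return 2**index_bits // 8


blocks = 740_000  # roughly a 500GB file at the default block size
print("buckets:", buckets_needed(blocks))
print("pointer table: %.1f MB" % (pointer_table_bytes(blocks) / 1e6))
print("19-bit bitmap: %d KB" % (bitmap_bytes(19) // 1024))
```

So the adaptive table costs a few megabytes for a half-terabyte file, against 64KB for the bitmap pre-check.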
Re: Rsyncing really large files
Lars Karlslund wrote: Also as far as I could read, the default block size is 700 bytes? What kind of application would default to moving data around 700 bytes at a time internally in a file? I'm not criticizing rsync, merely questioning the functionality of this feature. I believe you may have missed the point there. 700 bytes is not the amount the application is expected to have changed. 700 bytes is merely the unit of data examined as one block. This means that if you take a file and change it by one byte, 700 consecutive bytes (sometimes), or two bytes 698 bytes away from one another, rsync will treat it as a single changed block, and will resynchronize the data there. This number is a trade off. The larger the number, the more bytes need to be synched if a single byte changes (more network traffic). Also, the larger the number, the higher the cost if a small change crosses a block boundary, but the lower the chances of that happening. The smaller the number, the more checksums have to be calculated and transferred (more network traffic). Also, the smaller the number, the more blocks in a file, and the higher the chances of checksum collisions that do not stem from a truly identical block, resulting in the need to calculate a stronger hash for the block and transfer it (more IO, cpu, network load and latency). I'm too new to this project to know what benchmarks were done to bring the block size to default to 700, but it seems like a nice number. If your characteristics vary, you may wish to play around with it. As for granularity - suppose you added one byte to the file. This means that the sender has a file which has a one byte offset from the receiver. The sender will have one block for which there is no counterpart at the receiver, and all other blocks will have a one byte offset (which rsync will detect, and save the traffic). In short, we see that the 700 number has almost nothing to do with the application that the file belongs to. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
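To make the granularity point concrete, here is a toy version of the matching step. It is a deliberate simplification and not how rsync does it: instead of a cheap rolling checksum followed by a strong one, it computes an MD5 of the window at every byte offset, which is far too slow for real use but shows the same behaviour. All names are made up for the example.

```python
import hashlib


def block_index(old, block_size):
    """Map the digest of each full block of the old file to its block number."""
    index = {}
    for number in range(len(old) // block_size):
        block = old[number * block_size:(number + 1) * block_size]
        index.setdefault(hashlib.md5(block).digest(), number)
    return index


def match_blocks(new, index, block_size):
    """Scan the new file at every byte offset.

    Returns (literal_bytes, matched_block_numbers): how many bytes would
    have to be sent as-is, and which old blocks were found.
    """
    literal = 0
    matched = []
    pos = 0
    while pos + block_size <= len(new):
        digest = hashlib.md5(new[pos:pos + block_size]).digest()
        if digest in index:
            matched.append(index[digest])
            pos += block_size  # skip past the matched block
        else:
            literal += 1       # no match here, slide by one byte
            pos += 1
    literal += len(new) - pos  # trailing bytes shorter than a block
    return literal, matched


# Ten pseudo-random 700-byte blocks, then one byte inserted in the second block
old = b"".join(hashlib.md5(str(i).encode()).digest() for i in range(438))[:7000]
new = old[:1000] + b"X" + old[1000:]
literal, matched = match_blocks(new, block_index(old, 700), 700)
print("literal bytes:", literal, "matched blocks:", matched)
```

Only the block containing the inserted byte goes out as literal data; every block after it is still found, one byte later than where it used to be.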
Re: Rsyncing really large files
Lars Karlslund wrote: Maybe I didn't express myself thoroughly enough :-) Or me. Yes, a block is a minimum storage unit, which is considered for transfer. In size, yes. Not in position. But it's a fact that the rsync algorithm as it is now checks to see if a block should have moved. And in that case, the 700 bytes default is very much worth considering. No, because the rsync algorithm can detect single byte moves of this 700 bytes block. If no blocks at all move in a 700 byte increment (i.e. 700 bytes gets inserted somewhere - optimally at a 700-byte boundary in the file), then all you get is larger memory and CPU usage and all the bandwidth reduction you need. The point I think you are missing is that the 700 bytes block need not be on 700 bytes boundaries. They can be on one byte boundaries. It may very well be that, for your specific application, increasing the block size considerably will be better. If your files are huge, and the changed areas are very small in comparison to the file size, that can yield significant improvement. However, this is due to the trade offs I talked about in my previous email. It has nothing to do with 700 bytes being unrealistic or incorrect. True, and in that scenario it makes no difference what the block size you choose: if the one byte is inserted at the beginning, the entire file will be transferred. No, just the first block. Rsync is not diff, and does not patch the file dynamically if the file has random insertions/removals. Well, in a way, it does. It's really quite ingenious. As I have no relation to its implementation, I can say that wholeheartedly. I encourage you to read about the algorithm on the site. You make no comment on my calculations on the block-moving algorithm in my real-world scenario, which was the basis for this discussion anyway. I'm sorry. You just stated as facts things I knew to be incorrect, so I allowed myself to skip your calculations. I don't think there is any argument that you are getting sub-optimal results from rsync. The question is why. How much memory is on the machines? Try to bring the block size up to 1MB. This will mean you will have only 524 thousand blocks, which may prove more manageable. Best regards, Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Rsyncing really large files
Shachar Shemesh wrote: No, because the rsync algorithm can detect single byte moves of this 700 bytes block. I will just mention that I opened the ultimate documentation for rsync (the source), and it says that the default block size is the rounded square root of the file's size. This means that your 64KB blocks are considerably smaller than what rsync would use if you didn't force it (which is about 740KB, much closer to my 1MB suggestion than to your 64KB actual use). If I were you, I'd try to remove the --block-size option, and see what happens. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Rsyncing really large files
Wayne Davison wrote: However, you should be sure to have measured what is causing the slowdown first to know how much that will help. If it is not memory that is swapping on the sender, it may be that the computing of the checksums is maxing out your CPU, and removing the caching of the remote checksums won't buy you as much as you think. You could use some of the librsync tools (e.g. rdiff) to calculate how long various actions take on each system (i.e. try running rdiff on each system outputting to /dev/null to see how long the computing of the checksums takes). ..wayne.. Hi Wayne, Excuse me if I'm talking utter nonsense here. I have only just now opened the code up and looked at it. It does seem, however, that there is a considerable optimization that can be performed here. Correct me if I'm wrong, but it seems to me that the checksum matching code is in match.c, inside hash_search. Particularly, the do...while loop. It seems that the loop is there to scan the entire checksums list for each byte. Is that really the case? If so, we can probably make it much much (much much much) more efficient by using a hash table instead. We wouldn't even have to change the wire protocol in any way. Am I misreading the code? Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: Rsyncing really large files
Kevin Day wrote:
> I would *strongly* recommend that you dig into the thesis a bit (just the section that describes the rsync algorithm itself).

I tried a few weeks ago. I started to print it, and my printer ran out of ink :-). I will read it electronically eventually (I hope).

> Now, if you have huge files, then the 16 bit checksum may not be sufficient to keep the hit rate down. At that point, you may want to consider a few algorithmic alternatives:
>
> 1. Break up your file into logical blocks and rsync each block individually. If the file is an append only file, then this is fine. However, if the contents of the file get re-ordered across block boundaries, then the efficiency of the rsync algorithm would be seriously degraded.
>
> 2. Use a larger hash table. Instead of 16 bits, expand it to 20 bits - it will require 16 times as much memory for the hash table, but that may not be an issue for you - you are probably working with some relatively beefy hardware anyway, so what the heck.

Now, here's the part where I don't get it. We have X blocks checksummed, covering Y bytes each (actually, we have X blocks of checksum covering X bytes each, but that's not important). This means we actually know, before we get the list of checksums, how many we will have.

As for the hash table size - that's standard engineering. Alpha is defined as the ratio of the number of used buckets in the table to the total number of buckets. 0.8 is considered a good value. What I can propose is to make the hash table size a function of X.

If we take Lars' case, he has a 500GB file, which means you ideally need about 1 million buckets in the hash to have reasonable performance. We only have 65 thousand. With his 64KB blocks that is 0.008 buckets per block, an Alpha of about 125. No wonder he is getting abysmal performance.

On the other hand, if I'm syncing a 100K file, I'm only going to have 320 blocks. A good hash table size for me will be 400 buckets. Having 65536 buckets instead means I'm less likely to have memory cache hits, and performance suffers again. That is about 204 buckets per block, an Alpha of roughly 0.005 (instead of 0.8).

If my proposal is accepted, we will be adaptive in the CPU-memory trade-off.

> I'll leave the exercise of converting the full rsync 32 bit rolling checksum into a 20 bit value to you.

A simple modulo ought to be good enough. If the checksum is 42891 and I have 320 buckets, it should go into bucket 11. Assuming the checksums are sufficiently random-like, this algorithm is good enough.

> Cheers, Kevin

Shachar
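The sizing proposal and the modulo mapping above can be sketched as follows. This is a hedged illustration, not code from rsync: the helper names are hypothetical, and the load factor is taken to mean blocks divided by buckets, which is an assumption about how the 0.8 target is meant. The target is held as an exact fraction so that the bucket count does not depend on floating-point rounding.

```python
import math
from fractions import Fraction

TARGET_ALPHA = Fraction(4, 5)  # the 0.8 target mentioned in the message


def table_size(num_blocks: int, alpha: Fraction = TARGET_ALPHA) -> int:
    """Smallest bucket count that keeps blocks/buckets at or below alpha."""
    return math.ceil(num_blocks / alpha)


def load_factor(num_blocks: int, num_buckets: int) -> float:
    """Blocks per bucket for a given table size."""
    return num_blocks / num_buckets


def bucket_for(checksum: int, num_buckets: int) -> int:
    """Map a rolling checksum onto a bucket with a plain modulo."""
    return checksum % num_buckets


if __name__ == "__main__":
    blocks = 320  # the 100K file example from the message
    print("adaptive table size:", table_size(blocks))
    print("load factor with 65536 buckets:", load_factor(blocks, 65536))
    print("bucket for checksum 42891 with 320 buckets:", bucket_for(42891, 320))
```

Because the number of blocks is known before the checksum list arrives, `table_size` can be evaluated up front, which is what makes an adaptive table possible.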
Re: Commercial Liscense for rsync
fred wu wrote:
> Dear All,
>
> I am very new to open source. Please give me some ideas on the following questions. What is the license of rsync for commercial use? Do I need to pay for commercial use? Is it still GPL? A similar case is MySQL. It has a commercial license for business use.
>
> Any help will be appreciated! Thanks!
>
> Regards, Fred

Hi Fred,

My answer is neither official (I am not a copyright holder for rsync) nor authoritative (I am not a lawyer). I did study these things, however.

The question of whether you can or cannot use a GPL product for commercial use depends on what the use is. If all you want is to USE the product, go right ahead. It matters not whether for money or not. Use rsync (or MySQL, or Linux, or anything else under the GPL).

If what you want to do is to resell one of those technologies, it depends on what you want to happen to it in the interim. In the case of a pure standalone program, such as rsync, if you didn't change the program itself, you are fairly free to sell it as part of your product. If, however, you want to make changes to it, you must read and abide by the terms of the GPL in order to do that. In essence, you can do whatever you like, so long as you:

1. Don't change the license. I.e. - it must still be GPL (along with all of your changes).
2. Distribute the complete sources, in a form that allows your clients to recreate the program.
3. Make sure your clients know, either through the documentation or through something the program itself prints, that they have these rights.

I hope this answers your question. Also, don't even think about basing your business on this small email. If in doubt, get a lawyer.

Shachar
Re: Rsync run on TCP/IP?
fred wu wrote:
> Dear All,
>
> I have searched the email archive but I didn't find a clear answer (maybe I missed it) ^^
>
> 1. Does rsync run on TCP/IP?

Rsync has a server mode. Read the manual, and particularly the --daemon option.

> 2. Without ssh, then rsync will be transferred in plain text?

If you don't use SSH (or another encryption encapsulation, such as sslproxy), the data will be transferred in plain text.

> 3. Does rsync support Samba? Am I true to say that rsync, which is mainly on user authentication, is transparent to the storage like ftp?

rsync works on the file system. It doesn't care what the file system is. If it's a samba mount, then that's ok. I am not aware of an FTP filesystem, and therefore can't say that it will work over ftp.

> Any help to any questions will be appreciated! Thank You!
>
> Regards, Fred

Shachar
Re: Rsync friendly zlib/gzip compression - revisited
block size to 1000 and leave it there.

If we look at RSync performance as a function of window size, we find, as expected, that the rsync performance decreases. The curve isn't smooth (the compression algorithm causes all sorts of havoc here), but the general trend is clear: Increasing the compressed window size and block mask size = decreased rsync performance.

What isn't clear (because I varied the window size and block mask size together), is which of the two actually is the driving force (window size or block mask size). My initial results with fiddling with the compression performance indicate that it should be the block mask size that matters, and the window size shouldn't matter.

Here are the test results:

Window AND block mask size    Speed-up
100                           335.6063
600                           335.56348
1100                          264.708
1600                          273.0129
2100                          220.42537
2600                          226.33165
3000                          219.47977
3500                          225.90854
4000                          179.82353
4500                          195.95692
5000                          230.86983

My initial thoughts here are:

1. If the block mask size is mostly responsible for determining the performance of the zlib compression algorithm and if the window size is mostly responsible for determining the performance of the rsync algorithm, then we may have an opportunity to optimize the performance of the current --rsyncable patch. The test results above imply that we could be looking at a 25% improvement in rsync performance without impacting compression or significantly adjusting Rusty's algorithm. I think that the test results so far tend to support this as a possibility.

2. Further, if we can get really small window sizes by switching to the original rsync rolling checksum (instead of using the simple checksum in the current patch), then we *might* be able to achieve even better optimization, without adding a lot of computation overhead (the overhead of the rsync algorithm compared to the simple computation used now should be pretty low compared to the rest of what's going on in the zlib computation).

Anyway - that's where I'm headed with this.
I'll have results to indicate whether #1 is worth pursuing (i.e. testing with many other files) tomorrow. #2 will require a bit more coding to the --rsyncable patch, but it should be relatively simple to do.

That's all I have for now - I'm running tests this evening to independently adjust the window and block mask size, and to test a different data file just to make sure I'm not way off-base (i.e. make sure the trends indicated by my results are not dependent on the input file).

I'd love to hear any thoughts on the matter!

- Kevin

--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
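The roles of the window size and the block mask size discussed in this message can be illustrated with a small Python sketch. It assumes a plain byte sum over the last `window` bytes and a boundary test of "sum modulo mask equals zero"; the actual --rsyncable patch may differ in detail, and `cut_points` is a hypothetical name used only here.

```python
import random


def cut_points(data: bytes, window: int, mask: int) -> list[int]:
    """Offsets at which the sum of the last `window` bytes is divisible by `mask`."""
    rolling = 0
    cuts = []
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that just left the window
        if i + 1 >= window and rolling % mask == 0:
            cuts.append(i + 1)
    return cuts


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(20000))
    edited = original[:100] + b"X" + original[100:]  # insert one byte at offset 100

    before = cut_points(original, window=64, mask=64)
    after = set(cut_points(edited, window=64, mask=64))

    # Once the window has moved past the inserted byte, every old boundary
    # reappears one byte later in the edited data.
    resynced = [c for c in before if c > 100 + 64]
    print(all(c + 1 in after for c in resynced))
```

In this model the window size controls how quickly the boundaries recover after a change, while the mask controls how often boundaries occur, which is one way to see why the two parameters might affect rsync and compression differently.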
Re: rsync huge tar files
Martin Schröder wrote:
> On 2005-02-04 11:51:20 +0200, Shachar Shemesh wrote:
>> What distro is this? If it's Debian, gzip has an option called --rsyncable. This makes changes to the uncompressed file local in the
>
> This is a debian-only patch which doesn't change the gzip version. :-(
>
> Best regards
> Martin

Which kinda reminds me. Does anyone know where I can find the rsyncable patch in an isolated form?

Shachar
Re: rsync huge tar files
Harald Dunkel wrote:
> Hi folks,
>
> Are there any tricks known to let rsync operate on huge tar files? I've got a local tar file (e.g. 2GByte uncompressed) that is rebuilt each night (with just some tiny changes, of course), and I would like to update the remote copies of this file without extracting the tar files into temporary directories.
>
> Any ideas?
>
> Regards
> Harri

What distro is this? If it's Debian, gzip has an option called --rsyncable. This keeps changes to the uncompressed file local in the compressed file. If this is not a Debian system, check whether the rsyncable patch was integrated there too. If not, compile your own version of gzip.

In order to apply it to the tar file, you will have to not use the z option while creating the tar, but instead pipe it to gzip. Instead of doing

    tar czf file.tgz dirs...

you do

    tar cf - dirs... | gzip --rsyncable > file.tgz

Enjoy

Shachar
Re: Transforming file names contents
[EMAIL PROTECTED] wrote:
> --- Wayne Davison wrote:
>> Another option would be to use some kind of a compressed filesystem.
>
> Do you know of one that works on Linux? I searched for this a few months back but came up empty.
>
> Thanks, Joe

I'm currently working on a tool that would do such a transformation as a preprocessing step. I.e. - the files are preprocessed, and then the other directory is rsynced. In my case this is for encrypting (but I compress as part of the process, using the gzip rsyncable option).

If you're interested in this extremely preliminary, pre-alpha, not tested, still-being-developed, use-at-your-own-risk project, check out http://sourceforge.net/projects/rsyncrypto

Shachar

P.s. Currently, it's just the encryption part that I've worked on. The code in CVS (there is no other code at the moment) does not handle the actual tree scanning and copying, but it will soon. It goes without saying that any help would be appreciated.