Re: Don't follow bind mounts?

2013-05-16 Thread Shachar Shemesh
On 16/05/13 03:52, Carl Brewer wrote:

 Hello,
 The manual says that on UNIX (Linux), rsync treats bind mounts to the
 same filesystem as being on the same filesystem.

 I have a server with a pile of bind mounts to the same filesystem for
 some access control/ease of use for FTP users modifying websites. 
 This makes my backups using rsync messy!

 Is there any way to stop rsync from following bind mounts to the same
 filesystem?  Short of unmounting them all at backup time and
 remounting afterwards or explicitly excluding each one?
Rsync uses the device ID to know whether it has crossed a mount
boundary. Since bind mounts have the same device ID, rsync does not know
it has reached one. Most other tools (tar, for example) have the same problem.

My personal solution is to bind-mount the root of the file system to a
neutral location, and rsync from there. So if I have /dev/sda17 mounted
on /srv/messy, with lots and lots of bind mounts (and other filesystems)
inside it, I do "mount --bind /srv/messy /tmp/backup", and then rsync
from /tmp/backup.

This also makes sure that I back up all directories hidden by other
mounts done later.
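
A minimal sketch of that flow (device name and paths illustrative):

  # /dev/sda17 holds the messy tree, full of bind mounts we want to skip
  mkdir -p /tmp/backup
  mount --bind /srv/messy /tmp/backup   # shows the raw tree, no nested mounts
  rsync -aH /tmp/backup/ backuphost:/backups/messy/
  umount /tmp/backup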

Shachar
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync in theory.

2011-04-15 Thread Shachar Shemesh

On 12/04/11 01:58, Evan Rempel wrote:


I am looking at adding some code to the rsync tool but want to
know if I am totally out to lunch.

I realize that my example is so trivial that I am sure I will
get replies of "don't do it that way", but bear in mind
that it is just an example, and there are real-world cases
where I think this functionality would be useful.

I am trying to figure out if rsync can do something like

cat myfile.dat | rsync - remoteHost:/some/path/myfile.dat

which would take a stream of data and send/store it onto the
remote host.
Not possible with rsync (as far as I know), but it is possible with 
librsync (which is a completely different code base from rsync itself).
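
For the curious, librsync ships an rdiff tool that can operate on streams;
a rough sketch of the idea (not rsync itself, and assuming an rdiff that
reads stdin where "-" is given):

  # receiver: compute a signature of the existing copy
  rdiff signature /some/path/myfile.dat sig.bin
  # sender: diff the streamed data against that signature
  cat myfile.dat | rdiff delta sig.bin - delta.bin
  # receiver: apply the delta to produce the updated file
  rdiff patch /some/path/myfile.dat delta.bin /some/path/myfile.new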


My question is more about "can the rsync algorithm
do this?".

As far as I can tell, you need two passes on one end (either the 
receiving or the sending). There is no reason for the other end not to be 
completely one pass.

Technically, the question boils down to "Is rsync a single pass
algorithm, or is it a multi-pass algorithm?"

If it is a single pass algorithm then all is good.

If it is a multi-pass algorithm then how big of a buffer does it
need to perform the passes?
The definition of "one pass" is "can be performed with one reading of 
the file and an O(1) buffer". If you can answer that buffer question with 
a constant, then it is, by definition, one pass.

Namely, is it a block by block multi-pass,
or is it a complete file/object multi-pass algorithm.
Again, the question is meaningless. If you can apply an algorithm one 
block at a time, then it's one pass by definition.


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Optimizing the rsync algorithm using techniques which Google used in Courgette

2009-09-08 Thread Shachar Shemesh

Hasanat Kazmi wrote:

Hi,
I am a student at LUMS SSE (http://cs.lums.edu.pk) and an active rsync user.
Just a few days ago, Google wrote about Courgette*: an algorithm 
specially written for syncing executables. By using Courgette, 
Google made the diff size 1/10th of previous techniques.
I was wondering if this (or something along the same lines) can be used to 
optimize rsync? I am a senior and have to do a project. I am thinking of 
implementing this in rsync. I need input from developers. What do you 
guys think?


*http://dev.chromium.org/developers/design-documents/software-updates-courgette

Hasanat Kazmi


Hi Hasanat,

Like you said in the subject, this is an optimization. A format-specific 
optimization. In other words, it uses a known property of the file being 
synchronized in order to make the diff size smaller. If you were to use 
the Courgette pre-processing on something which is not an 
executable, you would get significantly worse results than 
merely running rsync.


At the moment, for better or for worse, rsync does not do format-specific 
optimizations. As long as that is the case, rsync cannot be 
optimized using this algorithm.


Even if we (and by "we" I mean Wayne, or anyone else brave enough to 
pick this task up) were to implement such functionality, I can think 
of quite a few file types that would have a lot more to gain than 
executables. In particular, something that would uncompress both source 
and destination, apply the rsync algorithm to both files, and then 
make sure the recompression of the target produces the exact same result 
would, IMHO, be much more useful than the change you are suggesting.


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync algorithm for large files

2009-09-05 Thread Shachar Shemesh

ehar...@lyricsemiconductors.com wrote:


I thought rsync, would calculate checksums of large files that have 
changed timestamps or filesizes, and send only the chunks which 
changed.  Is this not correct?  My goal is to come up with a 
reasonable (fast and efficient) way for me to daily incrementally 
backup my Parallels virtual machine (a directory structure containing 
mostly small files, and one 20G file)


 

I’m on OSX 10.5, using rsync 2.6.9, and the destination machine has 
the same versions.  I configured ssh keys, and this is my result:



Upgrade to rsync 3 at least.

Rsync keeps a hash table of the per-block sliding checksums. In older 
versions of rsync, the hash table was of a constant size. This meant that 
files over about 3GB in size had a high chance of hash collisions. For a 
20G file, the collisions alone might be the cause of your trouble.

Newer rsyncs detect when the block count outgrows the table, and increase 
the hash table size accordingly, thus avoiding the collisions.

In other words - upgrade both sides (but especially the sender).

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Problems while transferring big files

2009-03-08 Thread Shachar Shemesh

Wayne Davison wrote:


We hypothesize that there can be an accidental match in the checksum
data, which would cause the two sides to put different streams of data
into their gzip compression algorithm, and eventually get out of sync
and blow up.  If you have a repeatable case of a new file overwriting an
existing file that always fails, and if you can share the files, make
them available somehow (e.g. put them on a web server) and send the list
(or me) an email on how to grab them, and we can run some tests.

If the above is the cause of the error, running without -z should indeed
avoid the issue.
  
If I understand the scenario you describe correctly, won't running 
without -z merely cause actual undetected data corruption?


Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync source code as a windows project

2008-12-04 Thread Shachar Shemesh

Jignesh Shah wrote:

Hi Friends,
 
I have started learning the rsync source code but I am finding it very 
difficult to go back and forth to find the execution flow. I can see 
that the rsync code is written for UNIX and the compilation is difficult. 
Has anybody converted it into a Windows project, so that we can open it 
using the Visual Studio IDE? It would then be very simple to search for 
some function and find the complete work flow.
 
Thanks,

Jignesh
RTFM ctags and cscope, or just create a project and put the sources into 
it (not that I think the latter will do you much good).


<rant>
Personally, I find VS's cross reference to have deteriorated 
considerably over the versions. VS6 had a cross reference that was tied 
to the compiler's symbol tables. This worked excellently, as no amount of 
preprocessor trickery would fool it. I much preferred it to ctags. 
Somewhere between version 6 and version 9, MS switched to IntelliSense 
for cross referencing. My guess is that the VS6 version wouldn't cross 
reference a project unless it could compile it, and people (or at least 
MS's sales people) complained. As a result, the cross reference is much 
less accurate and more error prone, and I no longer see any advantage for 
it over ctags and the other tools available for Linux and POSIX platforms.
</rant>

Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync source code as a windows project

2008-12-04 Thread Shachar Shemesh

Jignesh Shah wrote:
Thanks for the reply. Could you tell me what you mean by "RTFM ctags and 
cscope"?

RTFM - Read The Manual

ctags and cscope - utilities whose manual I think you should read.
 
Creating a new project, I think, will produce many errors. We can do 
it only if we know the complete code. If you or anybody else has done it, 
please forward it to me.
rsync is a POSIX application. It will not compile natively on Windows 
without a considerable porting effort. No such port exists. If you only 
want VC to trace the function flow, it should be able to do that without 
compiling the code (see my rant above). If you want the code to compile 
on VC, I suggest you do the porting.


Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Why is -e sent to the remote rsync side?

2008-10-07 Thread Shachar Shemesh

Matt McCutchen wrote:

On Mon, 2008-10-06 at 18:01 +0200, Shachar Shemesh wrote:
  
Personally, and this is not something that any shell can solve, I would 
love for a way to limit the files that the --server side rsync allows 
access to.



It's called an rsync daemon.  It can be invoked over ssh; the command to
force in the authorized_keys file is "rsync --server --daemon .".

Matt

  
Just to save others from going over the man page looking for how to 
make the client side do this - you specify the remote side using the 
daemon ("::") syntax, but also give the -e option.
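
A sketch of how the pieces fit together (key, module name and paths hypothetical):

  # server, ~/.ssh/authorized_keys - force the daemon regardless of what the client sends:
  command="rsync --server --daemon ." ssh-rsa AAAA...key... backup@client

  # server, ~/rsyncd.conf - the module is all the client can reach:
  [docs]
      path = /srv/docs
      read only = false

  # client - daemon ("::") syntax combined with -e ssh:
  rsync -av -e ssh ./docs/ user@server::docs/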


Thanks, Matt and Wayne. You've been a great help.

Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Backup Microsoft Exchange

2008-10-07 Thread Shachar Shemesh

Steve Zemlicka wrote:

Thanks Julian and Brad, I will give ntbackup a shot.  I've used
rsyncrypto but I'm not a huge fan.

Off topic, but as rsyncrypto's author I'd love to hear why.

  I don't need the files to be
encrypted except during transit which can be done with just rsync,
right?
  
Yes. Run rsync over ssh, or run a daemon over SSL. Rsyncrypto is not 
needed for in-transit encryption, only for storage encryption.

Also, whether you use rsyncrypto or not, you can delete the temporary 
files (the ntbackup export and the rsyncrypto-encrypted file) after you 
rsync them. They will be created again when you repeat the operation. If 
you are using rsyncrypto, make sure not to delete the symmetric key file 
(68 bytes), so that the result stays rsyncable.
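
The cycle might look like this (ntbackup arguments and all paths purely
illustrative):

  cd /cygdrive/c/staging
  # 1. export the store to a temporary .bkf file
  ntbackup backup "C:\Exchsrvr" /f "C:\staging\exchange.bkf"
  # 2. optionally encrypt; the 68-byte key file must survive between runs
  rsyncrypto exchange.bkf exchange.enc exchange.key backup.crt
  # 3. ship the result, then delete the temporaries (but never exchange.key)
  rsync -av exchange.enc backuphost:/backups/ && rm exchange.bkf exchange.enc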


Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Why is -e sent to the remote rsync side?

2008-10-06 Thread Shachar Shemesh

Wayne Davison wrote:

On Sun, Oct 05, 2008 at 06:47:47AM +0200, Shachar Shemesh wrote:
  

The reason this is brought up is because I'm using rssh
(http://www.pizzashack.org/rssh/) as the user's shell to limit that
user to only be allowed to run rsync.



I looked at the source, and created a patch to make it just require the
--server option as the first option.

While I was looking at the code, I noticed that the check_command()
function was busted in that it would accept any abbreviated path of a
command (e.g. /usr/bin/rs would match /usr/bin/rsync).  The author
apparently didn't know that strncmp() stops at a null (unlike memcmp()),
so the length-trimming that is done can just be removed.  My patch fixes
that too.
  
Last I talked to the rssh maintainer (a couple of years ago) I was so 
frustrated with the attitude that I decided to use rssh only until I 
could knock something better together myself. He cared (or used to) about 
scp and sftp, and little else. You can send the patch over, if you're 
feeling lucky. I doubt I'll bother. The only reason I brought the 
question up was that if I am going to be writing something myself, I 
would need to know what to make it enforce.


Personally, and this is not something that any shell can solve, I would 
love for a way to limit the files that the --server side rsync allows 
access to. I can then use a custom shell to pass that command line to 
rsync to ensure it's enforced.

..wayne..
  


Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Why is -e sent to the remote rsync side?

2008-10-05 Thread Shachar Shemesh
So, I've done some RTFS, and this is what I've got. I'd still love it if 
Wayne could confirm that my understanding of the source is correct.


Shachar Shemesh wrote:

So my questions:
1. Why does rsync need to pass -e to the remote side? After all, the 
connection is already established at that point.
-e, when combined with --server, means something different than it does 
normally. With --server it is a means for the client to hand over to the 
server the options it received itself (hard links, symbolic link 
processing, etc.), as well as the protocol version used.
2. What does this -e mean? What causes the remote side to really not
run anything? (Trying to run ".L" from the path would be the way I 
would interpret the command at that point - obviously rsync disagrees :-)
The "." means protocol 3.0 (with explicit numbers for other versions; 
i.e., protocol version 3.1 will be listed as "3.1". The current code 
says protocol 4.0 will also be listed as ".", but I'm fairly sure that's 
just a bug that has not manifested yet).


The "L" means LUTIMES support.

The thing I would like Wayne to confirm is that if the --server option 
is given, the -e option will never cause an application to be run, and 
should thus not be considered dangerous.


Thanks,
Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Why is -e sent to the remote rsync side?

2008-10-04 Thread Shachar Shemesh

$ rsync -e 'ssh -v' lingnu.com:
OpenSSH_5.1p1 Debian-2, OpenSSL 0.9.8g 19 Oct 2007
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to lingnu.com [199.203.56.105] port 22.
debug1: Connection established.

...

debug1: Sending command: rsync --server --sender -de.L .
As we can see, rsync runs ssh, and tells it to run, on the other side, 
rsync with the -e flag. I am not really sure what the "." and "L" mean, 
nor how they are parsed by rsync (part of my problem).


The reason this is brought up is that I'm using rssh 
(http://www.pizzashack.org/rssh/) as the user's shell to limit that user 
to only be allowed to run rsync. Rssh, however, prevents the passing of 
the -e option to rsync, as it claims (with some justification) that this 
option allows someone to cause rsync to run any command at all, escaping 
the limitations imposed by rssh.


So my questions:
1. Why does rsync need to pass -e to the remote side? After all, the 
connection is already established at that point.
2. What does this -e mean? What causes the remote side to really not run 
anything? (Trying to run ".L" from the path would be the way I would 
interpret the command at that point - obviously rsync disagrees :-)


Thanks,
Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: INCLUDE/EXCLUDE PATTERN RULES problem on MAC OS

2008-09-27 Thread Shachar Shemesh

Matt McCutchen wrote:


(since rsync does a binary comparison).

rsync as well as the Unix kernel, typically.

I have implemented i18n support in several programs before, I am working 
on a draft for BiDi text editing, and I still had to look up what 
"decomposition" means. If that's the case, I doubt we can trust a sane 
user (vs., e.g., myself) to get it right.


I think the right thing is for rsync to have, #ifdef'ed into the Mac 
build, a decomposition algorithm for the exclude/include files. Please 
note that while HFS insists on decomposed characters, there is no such 
requirement for plain text files. For all we know, an exclude file typed 
on a Mac may still fail to match the file names.


Shachar

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Can the rsync password be automated?

2008-08-18 Thread Shachar Shemesh

Shane Uys wrote:


Is there a way to automate the rsync password, or maybe disable it? I am 
currently running rsync from a Windows command prompt and would like 
to run it from a .bat file. I have read through the config man pages 
but am not sure if my ssh_config file is even being used. I tried 
"passwordauthentication = no" but it still asked for a password. I have 
seen an option for --password-file= but I believe this does not apply 
in that I’m using “ssh” instead of a daemon. I am using copssh and 
cwrsync on two Windows 2003 servers over the internet. Here is the 
command line used that transfers a single file. Rsync –e “ssh” file1.x 
[EMAIL PROTECTED]:   followed by the password prompt.


Thanks, Shane

 

The official and recommended way of solving this issue is to perform 
public key authentication with the ssh server. You are right that the 
--password-file option does not work when running rsync over ssh. Public 
key authentication solves your problem, and does not significantly 
reduce the security of your system.


There is another option, but only go that route if you have tried 
setting up public key authentication and failed for a reason over which 
you have no control. If your server supports public key authentication, 
do not continue to the next option. Only consider it if the 
administrator for the server to which you want to connect has disabled 
public key authentication and cannot be persuaded to change her mind.


There is a tool called sshpass. It is available at 
http://sourceforge.net/projects/sshpass/. Read about it at 
http://www.debianadmin.com/sshpass-non-interactive-ssh-password-authentication.html
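
For completeness, both routes side by side (host and file names illustrative):

  # preferred: one-time public key setup, then passwordless rsync over ssh
  ssh-keygen -t rsa -f ~/.ssh/backup_key -N ""
  ssh-copy-id -i ~/.ssh/backup_key.pub user@host
  rsync -av -e "ssh -i ~/.ssh/backup_key" file1.x user@host:

  # last resort, only when the server forbids public keys:
  sshpass -f passwordfile rsync -av -e ssh file1.x user@host: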


Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: memory usage in rsync 3.0.3 -- how much RAM should I have to transfer 13 million files?

2008-08-12 Thread Shachar Shemesh

Aleksey Tsalolikhin wrote:


I've upgraded from rsync 2.6.9 to 3.0.3 on both ends, but memory usage
is still too high.
Why should rsync 3's memory usage depend on the number of files? Does it 
keep files it already knows should not be transferred in memory?

If not, then maybe we should throttle rsync's very useful, very 
speed-boosting read-ahead of the file list. If we see that the to-do list 
piles up, maybe we should hold off the continued scan until the backlog 
gets smaller.


Yes, I know, it's the typical "someone sitting on the fence, hardly ever 
doing anything useful for the project, and dispensing invaluable 
advice". Fact is, I need this. If Wayne doesn't do it, I will get around 
to it eventually. The problem is that the key word here is "eventually".


Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Large file - match process taking days

2008-07-30 Thread Shachar Shemesh

Rob Bosch wrote:

I've been trying to figure out why some large files are taking a long time
to rsync (80GB file).  With this file, the match process is taking days.
I've added logging at verbose level 4.  The output from match.c is at the
point where it is writing out the "potential match at" message.  In a 9 hour
period the match verbiage has changed from:

  
Can you tell where the bottleneck is? Is it the sender's CPU? The 
receiver's? The network? Local IO on either side?

I believe this means that 4.8GB of the file has been processed in this 9
hour period?  Blocksize is currently manually set at 1149728, 4 times the
default value. 
Rsync does have some CPU-inefficient behavior for especially large 
files. However, it should not show up at the block size you are using 
(assuming the files are fairly identical). Try increasing it a little 
further, to 1638400 (80% utilization on the hash table), and see if 
things are any better.
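
A back-of-the-envelope check of those numbers (shell arithmetic; 80GB =
85899345920 bytes, and the fixed hash table in this rsync has 65536 slots):

  echo $(( 85899345920 / 1149728 ))   # 74712 blocks - more than the table holds
  echo $(( 85899345920 / 1638400 ))   # 52428 blocks - about 80% of 65536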


Are the files fairly identical?

Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Large file - match process taking days

2008-07-30 Thread Shachar Shemesh

Rob Bosch wrote:

The files are very similar, a maximum of about 5GB of data differences over
80GB.  The CPU on both sides is low (3-5 percent) and the memory usage is
low (11MB on the client, not sure on the server).

Full rsync options are:

-ruityz --partial --partial-dir=.rsync-partial --links --ignore-case
--preallocate --ignore-errors --stats --del --block-size=1149728 -I 


I'm using the -I option to force a full sync since date/time changes on
database files is not a reliable measure of changes.

I'll try the block-size at 1638400 although I have not seen a big change in
moving it from about 287000 (default square root) to 1149728.
  

You wouldn't. If CPU utilization is low, this is not the problem.

What about network utilization? What does ntop have to say?
What about disk utilization? I'm not sure what the best way to measure 
it would be (though munin does a good job of it)


Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Update the rsyncrypto home page

2008-06-15 Thread Shachar Shemesh

Hi Wayne, or whoever it is that manages the rsync web site

The rsync resources page (http://samba.anu.edu.au/rsync/resources.html) 
points to a project of mine, rsyncrypto, as an rsync-friendly encryption. 
Rsyncrypto now has a proper home page, and I would appreciate it if the 
link could be updated. The new address is http://rsyncrypto.lingnu.com.


Thanks,
Shachar
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: compression of source and target files

2007-09-22 Thread Shachar Shemesh
Kenneth Simpson wrote:
 Chuck Wolber wrote:
   
 On Fri, 21 Sep 2007, Kenneth Simpson wrote:

   
 
 Hi - there's a flag for rsync to compress the files in transit - is it 
 possible to compress one side (target) with gzip and have rsync still 
 work correctly?
 
   
 It'll still work correctly, but compressing a compressed file can actually 
 make it slightly bigger and wastes CPU cycles in the process.

 ..Chuck..

   
 
  Sorry, I neglected to mention the source is uncompressed but
 we need to compress the target file because we're running out
 of disk space and the files are highly compressible.


 We can't compress the source since the files are large and
 compressing the source would create other problems.
  
 The original thought was to use a file system with compression (I
 think Linux has such a beast) but this would at least require a
 kernel rebuild which we won't be able to do for awhile.

 The second thought was that we might be able to gzip on the fly
 and have rsync work correctly (since it's compressing them in
 transit.)
   
gzip, as is, will destroy rsync's ability to sync partial file changes.
Gzip does, however, have a patch that adds a "rsyncable" option to the
command line, which makes the compressed output rsync-ready. The only
problem I see with your suggestion is that, as far as I know, rsync
cannot sync a stream of data to a file. Do have a look at librsync,
however, which reportedly can do that.
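
For instance, assuming a gzip built with the rsyncable patch (a stock gzip
of this era lacks the flag):

  gzip --rsyncable -c big.dat > /compressed/big.dat.gz
  rsync -av /compressed/ backuphost:/compressed/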

Shachar


   

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Extremely poor rsync performance on very large files (near 100GB and larger)

2007-01-12 Thread Shachar Shemesh
Evan Harris wrote:
 Would it make more sense just to make rsync pick a more sane blocksize
 for very large files?  I say that without knowing how rsync selects
 the blocksize, but I'm assuming that if a 65k entry hash table is
 getting overloaded, it must be using something way too small.
rsync picks a block size that is the square root of the file size. As I
didn't write this code, I can safely say that it seems like a very good
compromise between too-small block sizes (too many hash lookups) and
too-large block sizes (decreased chance of finding matches).
 Should it be scaling the blocksize with a power-of-2 algorithm rather
 than the hash table (based on filesize)?
If Wayne intends to make the hash size a power of 2, maybe selecting
block sizes that are smaller will make sense. We'll see how 3.0 comes along.
 I haven't tested to see if that would work.  Will -B accept a value of
 something large like 16meg?
It should. That's about 10 times the block size you need in order to not
overflow the hash table, though, so a block size of 2MB would seem more
appropriate to me for a file size of 100GB.
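
To put numbers on that (100GB = 107374182400 bytes, 65536 hash table entries):

  echo $(( 107374182400 / 65536 ))     # 1638400, i.e. ~1.6MB minimum block size
  echo $(( 107374182400 / 2097152 ))   # 2MB blocks -> 51200 entries, ~80% load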
   At my data rates, that's about a half a second of network bandwidth,
 and seems entirely reasonable.
 Evan
I would just like to note that since I submitted the large hash table
patch, I have seen no feedback on anyone actually testing it. If you can
compile a patched rsync and report how it goes, that would be very
valuable to me.

Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Compressed destination files

2006-11-30 Thread Shachar Shemesh
Matt McCutchen wrote:

 Currently, the only way to make rsync do this is with the experimental
 patch source-filter_dest-filter.diff, which is distributed in
 patches/ in the rsync source package.  If you compile a custom
 version of rsync containing this patch, you can specify bzip2 as the
 source or destination filter.  Read the top of the patch for more
 information.  The patch is only a first attempt, so you might not want
 to trust it with your backups yet.

 Matt
Just one more important note. If you are using rsync over the wire (as
opposed to synching local folders), gzip with "--rsyncable" is preferable
to bzip2, as it does not obliterate rsync's wire efficiency.

Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Saving ownership as non-root

2006-08-17 Thread Shachar Shemesh
Paul Slootman wrote:

 Hence although it would look like you could use rsync to backup device
 nodes and so on via fakeroot, as soon as the fakeroot session is ended,
 the information is gone. There is some support for persistent storage of
 the fake info, but that's not perfect; I wouldn't rely on it for _my_
 backups.
   
I, obviously, cannot argue with what you will or will not do for your
backups. For one of my projects, I created a wrapper around fakeroot
that makes it persistent, and even allows it to be used by several
independently launched processes simultaneously. The script is far from
perfect, and needs lots of tweaking UI wise. The main problem is that it
takes the directory from which the fake script was launched as an
indication where to store the persistent state. Otherwise, it seems to
work fairly flawlessly for me. The script is attached to this mail.

I took the liberty of CCing fakeroot's author on this mail, to notify
him of the existence of this thread.
 Additionally it would be a nice idea to refer to fakeroot from the
 rsync manual. - It took me a day to find that out. And am still looking
 for alternatives... Anyone?
 

 the mention in the manual would have to be pretty explicit about the
 caveats.
   
There are only two caveats that I encountered (aside from the obvious one,
that files do not look right when viewed from outside fakeroot). This
is after fairly extensive use of fakeroot, quite outside its originally
intended use pattern.

The first is that the killfaked script must be run in order for the
state information to be stored to disk. This is not a major problem,
usually, as all killfaked does is to kill the faked daemon gracefully.
Any normal session exit will, effectively, do the same. We can also rig
the script used for rsync to make sure the state is stored at the end of
the rsync session.

The second is that a directory handled by fakeroot must not be
manipulated without it, or strange things will happen. Simple moves and
renames inside the directory structure are currently ok, but any
permission change, as well as files being deleted or created, may result
in extremely strange looking files.
 Paul Slootman
   
Shachar
#!/bin/sh
# Run fakeroot with persistent storage of information
# Copyright (C) 2005, 2006 by Shachar Shemesh
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
#
# $Id: fake,v 1.7 2005/09/20 15:15:15 sun Exp $

set -e

statedir=`dirname $0`
statefile=$statedir/.fakerootenv
keyfile=/tmp/fakedkey_`whoami`_`ls -id $statedir | cut -d ' ' -f 1`


if [ ! -f $keyfile ] || [ ! -d /proc/`cut -d : -f 2 < $keyfile` ] ||
! ( readlink /proc/`cut -d : -f 2 < $keyfile`/exe | grep -q '/faked-sysv$' )
then
echo "Starting fakeroot daemon"
touch $statefile
/usr/bin/faked-sysv --save-file $statefile --load < $statefile > $keyfile
fi

FAKEROOTKEY=`cut -d: -f1 < $keyfile`
LD_LIBRARY_PATH=/usr/lib/libfakeroot
LD_PRELOAD=libfakeroot-sysv.so.0
export FAKEROOTKEY LD_LIBRARY_PATH LD_PRELOAD

exec "$@"
#!/bin/sh
# Kill the persistent faked daemon created by previous calls to fake, and
# save the persistent data
# Copyright (C) 2005, 2006 by Shachar Shemesh
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
#
# $Id: killfaked,v 1.4 2005/09/19 15:36:06 sun Exp $

set -e

statedir=`dirname $0`
keyfile=/tmp/fakedkey_`whoami`_`ls -id $statedir | cut -d ' ' -f 1`

if [ -f $keyfile ] && [ -d /proc/`cut -d : -f 2 < $keyfile` ] &&
( readlink /proc/`cut -d : -f 2 < $keyfile`/exe | grep -q '/faked-sysv$' )
then
kill `cut -d : -f 2 < $keyfile`
rm $keyfile
else
echo "faked not running"
rm -f $keyfile
fi
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Data Encryption

2006-06-13 Thread Shachar Shemesh
Brad Farrell wrote:

 Hi there

  

 Is there a way with rsync to encrypt data at the source before
 transmitting?  Not talking about the actually transmission, but the
 data itself.  I’ve got a few department heads that want their data
 secured before it leaves their computer so that no one in the office
 can access the data except for them.

Rsync does not encrypt the files in a way that is impossible for the
receiving machine to decrypt. There is no way (that I know of) to
integrate that seamlessly into the process.

What you can do, however, is encrypt the files, and then run rsync on
the encrypted result. Touting my own horn here: have a look at
rsyncrypto (http://sf.net/projects/rsyncrypto) for an encryption scheme
that does not totally destroy rsync's wire efficiency.
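
A rough sketch of that flow (paths and certificate name illustrative; see
the rsyncrypto man page for the authoritative invocation):

  rsyncrypto -r /home/finance /enc/finance /enc/keys backup.crt  # encrypt locally
  rsync -av /enc/finance/ offsite:/backups/finance/              # sync ciphertext only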

  

 Thanks.

  

 Brad Farrell

 Brevell Consulting

 ph: 403-279-6380

 fx: 403-568-2112

  

Shachar


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsync 4TB datafiles...?

2006-05-02 Thread Shachar Shemesh
lsk wrote:

Hello Shachar... is 2.6.7 the latest version of rsync? I can see the
http download site says rsync-2.6.8.tar.gz. Should I get this
version 2.6.8 + the patch dynamic_hash.diff?
  

Yes. In the month and more that has passed since the email I sent, a new
version of rsync was released :-)

Dynamic_hash.diff is available in that one too.

Also I am planning to install it on only the sending machine... and first
try it out.
  

Should work.

Thanks for your feedback.
lsk.

   Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Encryption

2006-04-18 Thread Shachar Shemesh
Julian Pace Ross wrote:

 Thanks everyone for your feedback.
 Seems to me that Alex explained the issue with this perfectly.

I'm afraid that Alex's explanation does not take into account
rsyncrypto's algorithm. If you encrypt two versions of a file that
differ in the very first bit, using rsyncrypto, the results will
start out totally different. However, some way into the file (between
4KB and 16KB, depending on several factors) the files become identical
again, thus allowing rsync to work on them efficiently.

I downloaded it and spent a few minutes trying to make it work, but I
didn't manage yet (documentation is a bit terse).

The man page for the latest version has examples designed to get you
started as fast as possible. I'll grant you that there is no easy way to
read the manual page if you are on Windows, though.

 Assuming that it works fine, and that it encrypts only changed files
 (thus addressing to some extent the scalability issue mentioned by
 Alex), this would pretty much solve the problem, assuming that one has
 enough harddisk space on the client side for an encrypted copy of the
 data to be backed up.

Yes, you do need a second copy on the client side. The files are
compressed prior to being encrypted, so it is, hopefully, not as big as
the original.

 However I'm worried that rsyncrypto, although a great idea, is very
 much a work in progress and still shaky... I may be wrong...Anyone
 used it?

Well, I do, obviously (I'm the one who wrote it, after all). I think the
technology is fairly sound at this stage. There are still features I'd
like to see implemented, as well as various optimizations.

Let's put it this way. My company (http://lingnu.com) bases a commercial
backup service on this technology.

 I would be tempted to try and merge the rsyncrypto source within rsync
 and add a command line argument... that would be idealoh well just
 a thought...

Others have tried before you. They tried to pipe the rsyncrypto output
to a librsync-based program that does a piped rsync. At the moment,
rsyncrypto cannot write its output file in one pass, which means the
output cannot be piped. This may be solvable, but I have not gotten
around to it just yet. There are more pressing issues I would like
addressed first. Patches are, as always, welcome.

 Cheers
 Julian

   Shachar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Help -- rsync Causing High Load Averages

2006-03-30 Thread Shachar Shemesh
Matt McCutchen wrote:

On Tue, 2006-03-28 at 16:58 -0800, Plugger wrote:
  

We have a server with about 400GB of data that we are trying to backup
with rsync. [...] When it runs,
however, the load averages on the content1 server continue to grow to
the 100s, bringing the server to a practical standstill.



If your individual files are larger than a gigabyte or so, Shachar
Shemesh's dynamic hash patch may improve performance significantly.  I
recommend you try an rsync with that patch.  To build one, extract the
rsync source package, run "patch -p1 <patches/dynamic_hash.diff",
configure, and make.
  

Just a reminder to everyone that we are still looking for feedback on
whether it is, indeed, effective.

If you compiled rsync with the dynamic_hash patch, and it indeed reduced
the load (or if it didn't), please do report it here.
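
For reference, the build Matt describes, spelled out (tarball version
illustrative):

  tar xzf rsync-2.6.7.tar.gz && cd rsync-2.6.7
  patch -p1 <patches/dynamic_hash.diff
  ./configure && make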

   Thanks
Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: AIX 5.1 rsync large file

2006-03-25 Thread Shachar Shemesh
[EMAIL PROTECTED] wrote:



Thank you for your response.

I compiled rsync 2.6.7 and installed it, and that did the trick.  I don't
know if it had the dynamic_hash patch or not.

If you did not manually apply it, it did not.

  But I think that I was too
impatient previously and the 2.6.4 would have worked had I not killed it.
  

A benchmark would be greatly appreciated. Can you please try compiling
another version of rsync 2.6.7 after running the following command from
the source root:
patch -p1 <patches/dynamic_hash.diff

And then tell us how the two versions compared?

  Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsync 4TB datafiles...?

2006-03-22 Thread Shachar Shemesh
lsk wrote:

Also, I use rsync version 2.6.5, protocol version 29. Does this
version include the patch dynamic_hash.diff, or do we need to
install it separately?
  

Sorry. You will need to get the 2.6.7 sources, and then apply the patch
yourself and compile rsync.

Please do report your results back here. This patch is the result of a lot
of theoretical work, but we never got any actual feedback on it.

   Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsync 4TB datafiles...?

2006-03-21 Thread Shachar Shemesh
lsk wrote:

But I have tried various options including --inplace, --no-whole-file, etc.,
for the last few weeks, and all the results show me that removing the
destination server's oracle datafiles and then doing an rsync -vz from the
source is faster than rsyncing over the old files that are present on the
destination.
  

Please do try applying the patch in patches/dynamic_hash.diff to both
sides (well, it's probably only necessary for the sending machine, but
no matter) and making this check again. This patch is meant to address
precisely your predicament.

  Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: So what to do with Unicode filenames?

2006-03-20 Thread Shachar Shemesh
Stuart Halliday wrote:

An alternative would be to zip the offending files first and name the zip
file something safe, use rsync to transport them and unzip them at the
other end?
  

If you want to maintain rsync's network efficiency, don't use Zip.
Rather, use tar plus a gzip that has the --rsyncable patch.

   Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: So what to do with Unicode filenames?

2006-03-19 Thread Shachar Shemesh
Stuart Halliday wrote:

As long as each machine is set to its own correct default language
correctly then there isn't a problem I'm aware of.
  

But that's exactly what Georgy is complaining about. No amount of
default-locale tricks will help you if some of your file names are in
Spanish and others are in Hebrew. If there were a way to get the file
names in UTF-8, you could still use rsync, but it seems that there is no
way to do it.

Pity, really.


  Shachar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Question about rsync and BIG mirror

2006-03-06 Thread Shachar Shemesh
Jamie Lokier wrote:

Hmm.  My home directory, on my laptop (a mere 60GB disk), does contain
millions of files, and it takes about 20 minutes to build the list on
a good day.  100Mbps network, but it's I/O bound not network bound.

It looks a lot like the number of files is more significant than the
amount of data at this scale.
  

In fact, I know of at least one place where they don't use rsync because
they don't have enough RAM+SWAP to hold the list of files in memory.

As far as future directions for rsync, I think this is the major place
where rsync needs to become better.

  Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Question about rsync and BIG mirror

2006-03-06 Thread Shachar Shemesh
Jamie Lokier wrote:

While you're there, one little trick I've found that speeds up
scanning large directory hierarchies is to stat() or open() entries in
inode-number order.  For some filesystems it makes no difference, but
for others it reduces the average disk seek time as on many common
filesystems, inode number is related to position on the disk.  In
unusual cases I've seen a factor of 10 improvement, but usually it's
just 1-2.

  

The way I see it, if you got that far, then you don't have any problem
with the size of the file list.

-- Jamie
  


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Question about rsync and BIG mirror

2006-03-03 Thread Shachar Shemesh
[EMAIL PROTECTED] wrote:

Hello,

  So: each night, from 0:00am to 7:00am at the latest, the server has to
check the 100GB of files to see which files have been modified, then
upload them to the clients. Each file is around 4MB to 40MB on average.
  

Are the clients what you call the "mirror"? Are there several of them?

I would like to know your opinion about this situation:  
 - Should I setup a strong dual CPU computer dedicated to calculate this
whole stuff? 
  

That depends.

 - What about the memory I should install? 
 - Is there any bandwidth used during the checksums computation? Mine is
quite limited.
  

Is that 2 mega BYTE per second or 2 mega BIT per second?

 - I know the client computer will have to check files too; Disk I/O
will be the most used. I think this computer will have NFS mount from a
datacenter computer with a GB LAN card, I wonder it will be enough...
  

Scanning 100GB of data in 7 hours doesn't require all that much disk
bandwidth.

  I'm quite scared of the amount of data to check before synchronise
clients, and how long it will take. To finish shortly, what do YOU
think? Any advices?
  

Here are a few performance characteristics of rsync I think you should
be aware of:
- By default, rsync only checks files that differ between receiver and
sender in timestamp or size. If most files in your archive did not
change at all, you can discard them altogether from your bandwidth
calculations.
- The receiver only does a linear scan of the file, followed by
generating a second file (which MAY require random access to the first
file, if blocks in the file changed order). Its CPU performance
requirements are negligible. This is bad for the case where you have one
mirror source sending out info to many mirrors, as all the CPU load
falls on the single server.
- If your bandwidth is 2 mega BIT per second, you are a bit marginal as
far as transferring 5GB of data in 7 hours goes. This has nothing to do
with rsync, though; a simple calculation shows the same result. Getting
full bandwidth for the entire 7 hours will let you transfer about 6GB
of data.
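
The arithmetic, for reference:

  # 2 Mbit/s = 0.25 MB/s; 7 hours = 25200 seconds
  echo $(( 25200 / 4 ))   # 6300 MB, i.e. roughly 6GB over the window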

Thanks,

Johan
  

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Dynamic hash table size (with static has load)

2006-02-26 Thread Shachar Shemesh
Wayne Davison wrote:

http://rsync.samba.org/ftp/unpacked/rsync/patches/dynamic_hash.diff
  

A line of credit would have been nice :-)

One thing this patch does is to (1) leave the array allocated to its
largest size, (2) use realloc() if we need to make it bigger, (3) make
the minimum hash-table size 65537 (a prime).  Some of these decisions
are debatable:

The 1st item makes us more efficient in our malloc calls when sending
large files, but could waste sender-side memory when transferring a
single large file in the middle of a bunch of normal-sized files.
  

With a minimal table size of 65537, I doubt we would have too many
reallocations. After all, a file needs to be over about 2.5GB for us to
need a larger array. Such files are rare enough, and the time it takes
to allocate the array is so insignificant compared to their total
handling time, that I think we can save the memory when handling smaller
files.

As for point 2 - isn't realloc potentially less efficient than a plain
malloc if we intend to erase the array's contents anyway?

The 3rd item might be a bit over the top, but we used to always allocate
a tag array of 65536 elements, and since I noticed some hash collisions
occurred in small files using a hash-table size of 11 items, I figured
it would be an acceptable overhead to make normal-sized files much more
likely to have no collisions at all.
  

Personally, I agree that a minimum is a good idea.

Comments welcomed.  Thanks again for your patch!
..wayne..
  

Ok. Here are a few comments:

1. I guess it's a matter of taste, but when you want to make sure a type
has enough states to count the elements of an array, I prefer using
size_t to int32. It's more upwards compatible.

2. In sum_buf, "sum1" is defined to be unsigned. It seems dangerous to
me to hash it into a signed index, even if it's almost guaranteed to be OK.

I'm attaching my proposed patch, incorporating all the above comments. I
also made some style changes (use sizeof(sum_table[0]) instead of
sizeof(int32); initialize the chain element to -1 instead of 0, as -1 is
the null value).


Shachar

? .sender.c.swp
? dynamic_hash.patch
? patches/.dynamic_hash.diff.swp
Index: match.c
===================================================================
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -0000	1.78
+++ match.c	26 Feb 2006 08:31:40 -0000
@@ -26,11 +26,6 @@
 
 int updating_basis_file;
 
-typedef unsigned short tag;
-
-#define TABLESIZE (1<<16)
-#define NULL_TAG (-1)
-
 static int false_alarms;
 static int tag_hits;
 static int matches;
@@ -42,47 +37,39 @@
 
 extern struct stats stats;
 
-struct target {
-	tag t;
-	int32 i;
-};
-
-static struct target *targets;
-
-static int32 *tag_table;
-
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
-
-static int compare_targets(struct target *t1,struct target *t2)
-{
-	return (int)t1->t - (int)t2->t;
-}
+static size_t tablesize;
+static int32 *sum_table;
 
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static void build_hash_table(struct sum_struct *s)
 {
 	int32 i;
+	uint32 t;
+	size_t tablealloc=tablesize;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%.  This number must be odd or s2 will not be
+	 * able to span the entire set. */
+
+	tablesize = (s->count/8) * 10 + 11;
+	if (tablesize < 65537)
+		tablesize = 65537; /* a prime number */
+	if (tablesize != tablealloc) {
+		free (sum_table);
+		sum_table = new_array(uint32, tablesize);
+		if (!sum_table)
+			out_of_memory("build_hash_table");
+	}
 
-	targets = new_array(struct target, s->count);
-	if (!tag_table || !targets)
-		out_of_memory("build_hash_table");
+	memset(sum_table, 0xFF, tablesize * sizeof (sum_table[0]));
 
 	for (i = 0; i < s->count; i++) {
-		targets[i].i = i;
-		targets[i].t = gettag(s->sums[i].sum1);
+		t = gettag(s->sums[i].sum1);
+		s->sums[i].chain = sum_table[t];
+		sum_table[t] = i;
 	}
-
-	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
-
-	for (i = 0; i < TABLESIZE; i++)
-		tag_table[i] = NULL_TAG;
-
-	for (i = s->count; i-- > 0; )
-		tag_table[targets[i].t] = i;
 }
 
 
@@ -176,20 +163,17 @@
 	}
 
 	do {
-		tag t = gettag2(s1,s2);
+		int32 i;
+		size_t t = gettag2(s1,s2);
 		int done_csum2 = 0;
-		int32 j = tag_table[t];
 
 		if (verbose > 4)
 			rprintf(FINFO,"offset=%.0f sum=%08x\n",(double)offset,sum);
 
-		if (j == NULL_TAG)
-			goto null_tag;
-
 		sum = (s1 & 0xffff) | (s2 << 16);
 		tag_hits++;
-		do {
-			int32 l, i = targets[j].i;
+		for (i = sum_table[t]; i >= 0; i = s->sums[i].chain) {
+			int32 l;
 
 			if (sum != s->sums[i].sum1)
 				continue;
@@ -205,9 +189,10 @@
 			    && !(s->sums[i].flags & SUMFLG_SAME_OFFSET))
 				continue;
 
-			if (verbose > 3)
-				rprintf(FINFO,"potential match at %.0f 

Dynamic hash table size (with static has load)

2006-02-25 Thread Shachar Shemesh
Hi list, and Wayne in particular,


It has been almost a year since we had the discussion (with
http://lists.samba.org/archive/rsync/2005-March/011875.html as its
conclusion) regarding the chances of hash collisions in large files. As
we now have someone asking about synching 5TB files, I decided to
actually submit a patch.


Attached is a patch that uses a non-predetermined hash table size, so
that the hash cell load (alpha) is never more than 80%. As far as my
understanding of rsync goes, this requires no change to the rsync protocol.


Comments welcome,


Shachar

? .match.c.swp
? dynamic_hash.patch
Index: match.c
===================================================================
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -0000	1.78
+++ match.c	25 Feb 2006 11:22:12 -0000
@@ -28,7 +28,6 @@
 
 typedef unsigned short tag;
 
-#define TABLESIZE (1<<16)
 #define NULL_TAG (-1)
 
 static int false_alarms;
@@ -49,10 +48,11 @@
 
 static struct target *targets;
 
+static size_t tablesize;
 static int32 *tag_table;
 
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static int compare_targets(struct target *t1,struct target *t2)
 {
@@ -64,8 +64,14 @@
 {
 	int32 i;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%.
+	 * See http://lists.samba.org/archive/rsync/2005-March/011875.html
+	 */
+	tablesize=(s->count/8)*10+11;
+	
+	free(tag_table);
+	tag_table = new_array(int32, tablesize);
 
 	targets = new_array(struct target, s->count);
 	if (!tag_table || !targets)
@@ -78,7 +84,7 @@
 
 	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
 
-	for (i = 0; i < TABLESIZE; i++)
+	for (i = 0; i < tablesize; i++)
 		tag_table[i] = NULL_TAG;
 
 	for (i = s->count; i-- > 0; )
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Dynamic hash table size (with static has load)

2006-02-25 Thread Shachar Shemesh
Wayne Davison wrote:

Thanks for the patch!  Here's some comments:

 - You didn't change the size of the tag typedef (an unsigned short),
   and your patch makes the value potentially overflow.
  

Gotcha. I'm sending an amended patch.

 - For smaller hash-table sizes, your algorithm does a lookup in the
   table based only on the s1 value (due to the (s2 << 16) value being
   too large to have any remainder less than the tablesize).

So, I think this probably needs to leave gettag() calling gettag2(), and
change gettag2() to factor both s1 and s2 into some kind of an improved
tag-generating computation.
  

I disagree.

Let's begin with an example. Suppose that we only have 7 hashes
(s->count=7). 7/8=0. 0*10=0. 0+11=11. Our hash table size is 11, which
is the absolute minimum it will ever get.

Now let's suppose all 7 hashes have 1234 as their lower hash value, and
the numbers 1000 through 1006 as their high values. They will be filed
to:
1000:1234 - 4
1001:1234 - 2
1002:1234 - 0
1003:1234 - 9
1004:1234 - 7
1005:1234 - 5
1006:1234 - 3

Obviously, the higher checksum DID get a chance to affect the cell we
land in.

For the more general case, our function is (s1+s2*65536)%ts (where ts is
the table size). Modern algebra dictates that this is the same as saying
(s1%ts + (s2%ts) * (65536%ts))%ts. In other words, you can first mod
each element individually, and only then do the addition and
multiplication. It's easy to see that s2 will never get nullified,
unless 65536%ts is zero. As 65536 is 2^16, and as ts is guaranteed to be
odd, this is impossible.

Venturing deeper into modern algebra, we know it is theoretically
possible that s2 will have some effect on the hash cell chosen, but will
not be able to reach every cell. This can be seen in the case of
(s1+s2*15)%9. If s1 is, say, 3, the different s2 values can select cells
3, 6 and 0. This will happen if and only if the factor (15) and the
modulo (9) have a greatest common divisor (gcd - open office calc
actually has a function of that name) which is larger than 1 (3, in this
case). In jargon, we say that two numbers that have a gcd of 1 are
coprime.

Since ts is always odd (we multiply a number by 10 and then add 11), it
will always be coprime to 65536 (all of whose divisors other than 1 are even).
This means that s2 has as much a chance to select the hash cell we end
up in as s1. I don't think it is necessary to change that aspect of the
code.
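
For the skeptical, here is a tiny standalone check (my own illustration,
not rsync code) that with the smallest table size, 11, varying s2 alone
already reaches every bucket:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint32_t ts = 11;     /* smallest table size: (0/8)*10+11 */
    const uint32_t s1 = 1234;   /* fixed low checksum, as in the example */
    int hit[11] = {0};
    uint32_t s2, i, covered = 0;

    /* the patch's tag formula: (s1 + (s2<<16)) % tablesize */
    for (s2 = 0; s2 < ts; s2++)
        hit[(s1 + (s2 << 16)) % ts] = 1;

    for (i = 0; i < ts; i++)
        covered += hit[i];
    printf("buckets reachable by varying s2: %u of %u\n", covered, ts);
    return 0;
}

It prints 11 of 11, exactly as the gcd argument predicts.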

I did change the comment in the patch to summarize this point.

..wayne..
  

   Shachar

? dynamic_hash.patch
Index: match.c
===
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -	1.78
+++ match.c	25 Feb 2006 18:42:05 -
@@ -26,9 +26,8 @@
 
 int updating_basis_file;
 
-typedef unsigned short tag;
+typedef unsigned int32 tag;
 
-#define TABLESIZE (1<<16)
 #define NULL_TAG (-1)
 
 static int false_alarms;
@@ -49,10 +48,11 @@
 
 static struct target *targets;
 
+static size_t tablesize;
 static int32 *tag_table;
 
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static int compare_targets(struct target *t1,struct target *t2)
 {
@@ -64,8 +64,14 @@
 {
 	int32 i;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%.
+	 * This number must be odd or s2 will not be able to span the entire set
+	 */
+	tablesize=(s->count/8)*10+11;
+	
+	free(tag_table);
+	tag_table = new_array(int32, tablesize);
 
 	targets = new_array(struct target, s->count);
 	if (!tag_table || !targets)
@@ -78,7 +84,7 @@
 
 	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
 
-	for (i = 0; i < TABLESIZE; i++)
+	for (i = 0; i < tablesize; i++)
 		tag_table[i] = NULL_TAG;
 
 	for (i = s->count; i-- > 0; )
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: information on identifying hard links to a file

2006-02-11 Thread Shachar Shemesh
Wayne Davison wrote:

On Thu, Feb 09, 2006 at 03:04:17PM +0100, Paul Slootman wrote:
  

compare inode and device number. When those are the same, the two files
must be hardlinked.



Also, rsync only considers files that have a link count larger than 1
(see stat()'s st_nlink) since this allows it to ignore the vast majority
of files that have only one link into a filesystem.

..wayne..
  

Do we also discard the info once we have found as many file names as the
link count indicates? This should allow us to dramatically reduce the
memory usage for large transfers.


Example:

Found file foo1 with dev 304, link count of 2 and inode 17. Cache it.

Found file bar1 with dev 304, link count of 3 and inode 18. Cache it.

Found file bar2 with dev 304, link count of 3 and inode 18. Mark it as
a link to bar1.

Found file foo2 with dev 304, link count of 2 and inode 17. Mark it as
a link to foo1 and remove the cache entry (we have found two matches to a
file that has two links).
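
For illustration only, a rough C sketch of that eviction rule
(hypothetical code of my own, not rsync's actual hardlink implementation):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

struct link_entry {
    dev_t dev;
    ino_t ino;
    nlink_t remaining;          /* links not yet encountered */
    char *first_name;           /* name later links are mapped to */
    struct link_entry *next;
};

static struct link_entry *cache;

/* Returns the first-seen name if this file hardlinks a cached entry,
 * or NULL when the inode is seen for the first time. */
static char *note_file(const char *name, const struct stat *st)
{
    struct link_entry **p, *e;

    for (p = &cache; (e = *p) != NULL; p = &e->next) {
        if (e->dev == st->st_dev && e->ino == st->st_ino) {
            char *first = e->first_name;
            if (--e->remaining == 0) {  /* all links seen: evict */
                *p = e->next;
                free(e);                /* caller now owns first */
            }
            return first;
        }
    }
    if (st->st_nlink > 1) {             /* cache only multi-link files */
        e = malloc(sizeof(*e));
        e->dev = st->st_dev;
        e->ino = st->st_ino;
        e->remaining = st->st_nlink - 1;
        e->first_name = strdup(name);
        e->next = cache;
        cache = e;
    }
    return NULL;
}

int main(int argc, char **argv)
{
    struct stat st;
    for (int i = 1; i < argc; i++) {
        if (lstat(argv[i], &st) != 0)
            continue;
        char *first = note_file(argv[i], &st);
        if (first)
            printf("%s is a hard link to %s\n", argv[i], first);
    }
    return 0;
}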


   Shachar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Announcing a new project allowing the use of rsync with ssh in password authentication mode

2006-01-05 Thread Shachar Shemesh
Hi all,


I know the question came up once or twice lately, and as I needed
something similar myself, I actually sat down and wrote it. The project
is called sshpass, and it is available from sourceforge at
http://www.sf.net/projects/sshpass. In a nutshell, it allows
non-interactive use of ssh in password authentication mode.


This warning is repeated in the README, as well as at the project's
summary page, but I'll repeat it here never the less. This is NOT an
ideal solution, security wise. Anyone and everyone are encouraged to use
ssh's public key authentication instead of this little utility. It is
only meant for use in cases where public key authentication is out of
the question, for one reason or another.


Usage example (taken from the man page):

Performing a password based authentication for rsync, with the password
given on the command line (sshpass' least secure mode of them all):

rsync --rsh='sshpass -p 12345 ssh -l test' host.example.com:path .


Hoping this proves useful enough for everyone. Sorry about the noise.


  Shachar


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Help me with this Questiions

2005-11-24 Thread Shachar Shemesh
Harish wrote:

I would like to understand the capabilities of GNU rsync software /
utility. This is used for syncing file systems / file level data across
two systems. I specifically would like to know its capabilities in
syncing files – 

 
1)  How does it replicate data changes to  files – entire file or only
the incremental blocks? 
  

As far as I know, actual changes are written by replacing the
entire file. Rsync's incremental nature only extends as far as the
network usage goes.

2) Does it have any block level replication capabilities?
  

Rsync is a file utility. It has no awareness of blocks. If you are using
Linux (you are not, I know), check out LVM for what you want. It has
generic snapshots support, which does what you want.

3) Can it replicate files while the file is in open state (oracle redo
log file)? This is particularly a problem in windows environment,
typically. 
  

Not unless the OS supports it. Rsync doesn't even have a native
Windows version at all. It only runs on Windows through an adaptive layer
called Cygwin, which brings the Unix semantics to rsync.

  Shachar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: request: add TCP buffer options to rsync CLI?

2005-11-08 Thread Shachar Shemesh
Wayne Davison wrote:

On Tue, Nov 01, 2005 at 09:55:06PM -0600, Lawrence D. Dunn wrote:
  

is it likely, or routine, or will-take-some-time, (or all-of-the-above),
for that patch to be vetted and integrated into mainline rsync released 
code?



I'm currently leaning towards including this in the next rsync release
unless someone can come up with a reason why it would be a bad idea.

..wayne..
  

I think it's a good idea, so long as a remote user cannot dictate the
options to a running pserver. For all other invocation methods, an
admin had better use something along the lines of
http://olivier.sessink.nl/jailkit/jk_lsh.8.html to restrict what should
and should not be possible to run on the remote machine.


Also, when I have the time (i.e. - not soon :-( ), I will try to write a
hardening rsync howto, if you'll publish it.


  Shachar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Link typo in web resources

2005-11-08 Thread Shachar Shemesh
To whoever it is that maintains the web site.


The page at http://samba.anu.edu.au/rsync/resources.html has a link to
the GNU project management page. The link has a space between the
"http://" and the host name, which means it cannot be opened.


  Shachar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: request: add TCP buffer options to rsync CLI?

2005-11-05 Thread Shachar Shemesh
Wayne Davison wrote:

The patch also makes the new option accepted by the daemon's command-
line parser, allowing whomever starts the daemon to override the config
file's socket option settings via the command-line.
  

Care to elaborate on the security implications? What is the potential
for a DoS on someone giving out rsync services to basically untrusted
parties?

  Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: meta data stored in separate file?

2005-08-28 Thread Shachar Shemesh
Joe Pruett wrote:

There's something called backuppc (i think backuppc.sourceforge.net)
which uses some sort of db backend and has multiple possible transports,
rsync is one option.  I think it might do what you're looking for.



interesting tool, but it is not what i need.  it doesn't do acls.  it is a
pull system, rather than push.  this is for an isp setting (which i didn't
mention yet) where colocation customers would push their backups to a
central box.

any other tools out there?
  

I'm currently adding metadata extraction and saving to rsyncrypto. I'm
not sure it is within your use scenario, and in any case, it will not
have Windows ACLs for 0.16 (the coming version).

Just thought I'd put the info in.

  Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: encrypted destination

2005-08-28 Thread Shachar Shemesh
George Georgalis wrote:

In the archives I see the question about encrypted destination and it's
mostly answered with the --source-filter / --dest-filter patch by Kyle
Jones. There are also some proposed updates to the patch.

A lot of these posts are 3 years old; are there plans or reasons not to
include them in the main line code?

// George
  

Personally, I solved that problem using a preprocessing program. The
idea was to not share any key data with the destination. If that's
interesting to you, do check out rsyncrypto
(http://sf.net/projects/rsyncrypto).

What it does is to encrypt the files prior to rsyncing them. The twist
is that the files are encrypted in a way that does not obliterate the
wire efficiency of applying rsync to the encrypted files.

  Shachar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Has anyone seen this?

2005-06-11 Thread Shachar Shemesh

http://use.perl.org/~Matts/journal/25138

 Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html

--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: RSYNC doesn't like Unicode?

2005-05-26 Thread Shachar Shemesh

Stuart Halliday wrote:


Paul Slootman said:

 


The common issue seems to be windows systems, as far as I can tell
here.
Perhaps transferring files (or rather, filenames) between windows
systems with differing locales (or language settings) is the problem,
and someone with intimate knowledge of how to manipulate filenames on
windows needs to investigate this. The problem seems to be that there
aren't too many people that fall into that category.
   

As a Wine hacker working on Unicode related tasks, I think I'll pick 
this title up.



I was using Rsync to copy favourites from one english UK XP sp2 machine to a 
Windows 2000 sp4 english UK machine.
No different language settings involved.
 

The important thing here is the codepage used. Rsync is not a Unicode 
application on Windows, and so its interpretation of the file names is 
dependent on the current codepage. The current codepage is called 
"Default locale" on Windows 2000 and "Codepage for non Unicode 
applications" on Windows XP. Either way, it's in the Regional Options 
control panel applet, it's global to the computer, and requires 
administrator privilege and a reboot to change.


Please check what your settings are on both computers, and let us know.


It just so happened that I had placed in my favourites some URLs with a few 
European characters in their name.

Something on Windows is tripping up Rsync that's for certain.
 

Theoretically, turning rsync into a unicode app on Windows could solve 
these issues. I doubt it will actually work, however. It is highly 
likely to create more problems than it solves, but let's try and find 
out what the current problems are before we try to think of a solution.


 Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/

--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Close list to outsider's posts?

2005-05-20 Thread Shachar Shemesh
Hi
I'm assuming that Wayne is the obvious destination for this request. Can 
we make the mailing list reject emails from non-subscribers? This would 
drastically reduce the amount of spam we receive.

Thanks,
  Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Encryption

2005-05-13 Thread Shachar Shemesh
Gary Holzer wrote:
Hi All,
I am using rsync to backup our office server to our Internet server (RHE).
As an association for doctors we are looking at providing  a backup service
for their practices using rsync. As it would be patient data it would need
to be encrypted. I have found a few options, namely
esync
wurt
rsyncrypto
Does anyone have experience with the above and perhaps like to recommend
one? On the client side we are on Windows boxes using cygwin.
Thanks
 

I am (as you know) the maintainer for rsyncrypto. I looked a little into 
esync (a while back, I'm not sure I fully remember the differences, 
though). I have no idea what wurt is, so a link would be greatly 
appreciated.

The main difference between rsyncrypto and esync is in the amount of 
state information stored between operations. With rsyncrypto, this is a 
mere 52 bytes, containing the initial value for the CBC, the symmetric 
encryption key for the file, as well as three parameters used to 
determine CBC resets. This information is enough to make a repeated 
encryption of the same file (modified or not) identical enough to the 
original that rsync will manage to pick up just the differences.  This 
52 byte file is fully recoverable from the encrypted file, if you have 
the asymmetric private key.

Esync, assuming I understood it correctly, actually requires keeping 
around enough information about the properties of the reset points (it 
uses a completely different algorithm). On first reading the esync 
algorithm sounded like one having a cryptographic weakness, but:
1. It was a long time ago, and I don't remember the details.
2. On second reading I remember thinking that the hole was plugged after 
all, at the expense of performance.
3. I cannot be said to be impartial, being as I maintain a competing 
technology.

Also with esync:
- You need a custom version of rsync on both ends.
- May be relevant for you - there is no Debian package :-)
Bear in mind that any manipulation to an encryption system to make it 
rsync friendly means that we are weakening it. This is obviously true 
for rsyncrypto too. Myself, I'm fairly confident that the weakening is 
nothing to be worried about, but do bear that in mind. This is stepping 
off the trodden path, a cryptographic risk, in exchange for better 
network performance.

As for experience, rsyncrypto is part of a commercial backup service my 
company is running, so you can say I have some experience with it, yes :-).

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Spam to this list

2005-04-19 Thread Shachar Shemesh
Alun wrote:
Shachar Shemesh [EMAIL PROTECTED] said, in message
[EMAIL PROTECTED]:
 

Reject codes were very common once. Then they were recommended
against.  They were recommended against for a reason, that reason
being that they  expose the user base to password and other guessing.
   

Who recommended this?!
 

I'm replying off list, to tune down the noise.
  Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Spam to this list

2005-04-18 Thread Shachar Shemesh
John E. Malmberg wrote:
The I.P. address is listed in bl.spamcop.net as hitting spamtraps.
Just so you know, spamcop views bounces as spam. According to them, you 
should never send bounces. I believe the right approach is to convince 
admins to drop spamcop from their RBL list, rather than remove the very 
essential NACK SMTP has from all servers, as per spamcop's request.

-John
[EMAIL PROTECTED]
Personal Opinion Only
Same here.
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Spam to this list

2005-04-18 Thread Shachar Shemesh
John E. Malmberg wrote:
The essential SMTP NACK is not what is the problem as long as it is 
done during the SMTP connection using reject codes.  Issuing a SMTP 
reject code for undeliverable messages will never cause a spamcop.net 
listing.
Reject codes were very common once. Then they were recommended against. 
They were recommended against for a reason, that reason being that they 
expose the user base to password and other guessing.

When Spamcop was confronted with spammers harvesting email using 
rejection codes, Julian responded with the laughable "I don't know of 
spammers who do that". What?

Not to mention the fact that secondary MXes are impossible to reject 
during SMTP, as are virtual domains (for all practical purposes), later 
filters, and many many many other cases.

Julian's solution is either "don't provide NACK" or "hold the original 
SMTP until you know what to reply". I'm sorry, but both answers are 
laughably sad, and effectively mean the end of SMTP.

I know, it's bad to be bombarded with bounces. I've been there myself. 
Destroying the reliability of SMTP for this high cause, however, is 
something I cannot abide by. I have heard of enough cases where 
important emails vanished without leaving a trace to consider this a 
trivial or unimportant problem.

The SMTP bounce is an artifact from the time when third party open 
relays where also in common use.  At that time, it was needed by the 
third party open relay to return the non-delivery message.
No. See above. I won't mention qmail again, because Julian seems to 
not mind the fact that it's the only safe MTA around, but the simple 
fact is that any time you need to perform processing in order to accept 
or reject an email, you need to accept the mail and then decide. Keeping 
a TCP connection open just so you can put in a reject code in the 
protocol opens you up for DoS, as well as threaten the very delivery due 
to timeouts.

And, you have not mentioned secondary MXes and downed networks yet.
Now, almost no mail servers will accept e-mail from known open relays, 
so when they can not deliver an e-mail, if they use an SMTP reject 
code, then the sender's mail server, which should trust the sender 
will generate the bounce message.
It's a great theory. Too bad it doesn't cover all cases.
If these bounces from the sender's mail server are going to forged 
addresses, then there is a security problem on the sending network 
that needs to be fixed.
No, there is a bandwidth problem. I agree that it's a problem, but I 
totally disagree with the solution.

And since medium to large networks pay a metered rate for their 
internet connection, bouncing instead of using SMTP rejects will 
significantly increase their operating costs as it will cause them to 
pay for the bandwidth for 6 spam/virus e-mails for every 1 real e-mail 
that they receive.  Using SMTP rejects and DNSbls eliminates almost 
all of that cost from their operation.
I don't see the difference. One way or the other, SMTP is shot. If 
someone shoots down a protocol I need, I call him the enemy of the 
public. So far it has been spammers. Now it's spammers and Spamcop.

My mail server doesn't bounce viruses. The reason is that I can detect 
viruses with close to 0% false positives, so I feel fairly confident in 
sending them to /dev/null. Unfortunately, spam does not enjoy this rate 
of false positives. What's more, even if it did, the occasional false 
negative would mean that I would still get blacklisted.

Look, I chose a difficult to understand name for my company (Lingnu). As 
a result, many times, if I'm telling someone my email address over the 
phone, they'll get it wrong. It didn't used to be a problem. If one in 
four got it wrong, they would get a bounce and call me. Not any more. 
One of the domains around mine sends all incoming email to /dev/null, 
and people are mad at me for not responding to my emails. Do tell me 
that this is ok with you, or that you don't think that SMTP loses a lot 
from it's functionality (even more so than because of Spam) as a result.

Again, this is without bringing qmail into the picture. Qmail, as a 
direct result of a design that keeps security in mind, cannot send 
rejections inline. The daemon accepting the mail simply doesn't know 
what's behind it. It's an unprivileged drone that takes the emails and 
queues them, not having any idea what will happen to them afterwards. In 
an environment where spammers exploit security holes to infect computers 
with spam sending zombies, telling an MTA admin to switch to something 
less secure because you don't like something defined by the RFC is 
counter-productive and does more to hurt spam fighting than to help it.

Now, this is getting off topic for rsync, so please do feel free to send 
me your reply privately.

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https

Re: Max filesize for rsync?

2005-04-14 Thread Shachar Shemesh
[EMAIL PROTECTED] wrote:
I would be very happy to test any patches. (Assorted RedHat/Fedora i386)
(Assume I am a total newbie, much safer that way)
A few very large files regularly rsync'd in production.
Seems like it sometimes gets somewhat stuck in the middle of something
large.
(The rsync is mostly staging area to staging area.
Plenty of redundancy, so I'm unlikely to get hurt if I'm aware of problems.
The way the targets are used, I will know about problems before any real
damage is done.)
The more important of the transfers are over occasionally very bad internet
connections, so I'm pretty much in the situation of something to gain,
nothing to lose.
-rw-rw1 27   27   1187270120 Apr 13 03:24
/home/rsync-sjs/mysql/sjs/dwf.MYD
-rw-rw1 27   27   1098515060 Apr  8 07:34
/home/rsync-sjs/mysql/srvs/dwf.MYD
-rw-rw1 mysqlmysql840374964 Apr 12 12:59
/home/rsync-2bb/mysql/srv/dwf.MYD
-rw-rw1 mysqlmysql520216980 Apr 12 20:54
/home/rsync-2bb/mysql/map/dwf.MYD
-rw-rw1 mysqlmysql221876208 Apr 11 14:21
/home/rsync-2bb/mysql/ecp/dwf.MYD
 

Actually, rsync's implementation only starts to give suboptimal results 
when we pass the 2.5GB area. Up until there one should see no noticeable 
difference in performance (well, my patch can allow using somewhat less 
memory for those cases, but no one would really miss 256KB of RAM these 
days, and I'm not sure how much impact the CPU cache effect is going 
to have in rsync's case. I guess that some effect will be seen anyways, 
but not as noticeable).

The performance bottleneck is due to hash table bucket load. Optimal 
load, as taught in computer science, is 80% (or 0.8). Your smallest file 
puts a load of 23%, while your largest file has a load of 53%. You 
shouldn't see any problems when using rsync (at least, not the type of 
problem I'm talking about).
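
(For the record, my arithmetic: with the default block count being
roughly the square root of the file size, sqrt(1187270120) is about
34,500 blocks against 65,536 buckets, i.e. 53% load, and sqrt(221876208)
is about 14,900 blocks, i.e. 23%.)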

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Max filesize for rsync?

2005-04-14 Thread Shachar Shemesh
Jeff Schoby wrote:
Well, I got -further- by changing the fsize= to -1 in
/etc/security/limits on my AIX boxes, 
but rsync ultimately still did not like my 15GB file I wanted to
transfer. 
 

What does "doesn't like" mean? Does it freeze with too much CPU usage?
Had to resort to good ol' plain vanilla ftp.  
 

How long does transferring the file via ftp take?
A 15GB file puts a load of 194% on the hash table. This means that, *on 
average*, each lookup will have to scan two areas. I'm not sure whether 
that should have created a huge slowdown or not.
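
(Rough arithmetic: the default block count is about sqrt(15*2^30), i.e.
roughly 127,000 blocks against 65,536 buckets, which is where the 194%
figure comes from.)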

Try rsyncing that file with the --block-size=524288 (i.e. - a 0.5MB 
block size), and please report whether rsync's behavior had improved, 
and in particular, how does it rate against vanilla ftp.

Thanks,
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Max filesize for rsync?

2005-04-13 Thread Shachar Shemesh
Jeff Schoby wrote:
What the maximum filesize rsync can transfer?
I'm trying to rsync one of my servers to another but the rsync is
croaking on a file that's barely 1GB.  

Tips, hints, suggestions?
rsync server is AIX 4.3.3 ML11 - rsync 2.6.3
rsync client is AIX 5.3 ML1 - rsync 2.6.4
Thanks
-Jeff
 

Please note that, all file size OS limitations aside, rsync has 
suboptimal performance for very large files. When I get around to it I'll 
try to create a patch, but in the meanwhile very large files will have too 
many spurious hash table collisions, and may become extremely slow.

If you run across this problem, please post on list, as we need someone 
to experience this problem in order to try and fix it.

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Always exitcode 256 under Cygwin with rsync 2.6.4

2005-04-04 Thread Shachar Shemesh
Wayne Davison wrote:
On Mon, Apr 04, 2005 at 07:28:02AM +0200, Joost van den Broek wrote:
 

When you just give an empty rsync command, it should also exit with an
exit code (1). But the errorlevel gets set to no. 256 instead.
   

As mentioned in the other message that brought this up, I assume that
this is something wrong with the cygwin version (perhaps in how it was
compiled?).  Rsync is exiting with all the right codes under Linux.
..wayne..
 

It is curious to note that, under POSIX, it is impossible to exit with 
a return code of 256, as the return code is an 8 bit value.
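
A minimal demonstration of that truncation (my own example; one
plausible source of the 256 is a wrapper reporting the raw wait()
status, which stores the 8-bit exit code in bits 8-15):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int status;
    pid_t pid = fork();

    if (pid == 0)
        exit(1);                /* child exits with 8-bit code 1 */

    waitpid(pid, &status, 0);
    /* raw status is 256 here; WEXITSTATUS extracts the real code */
    printf("raw status %d, exit code %d\n", status, WEXITSTATUS(status));
    return 0;
}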

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Last plug - rsyncrypto independant devel mailing list

2005-03-11 Thread Shachar Shemesh
Hi all,
A while back I announced Rsyncrypto, a rsync friendly encryption 
system. With version 0.11 just out, and proving reasonably usable in 
its own right, we now have an independent mailing list discussing 
just rsyncrypto. I have therefore allowed myself this one last notice to 
this list. All further rsyncrypto related announcements will go to 
[EMAIL PROTECTED] 
(http://lists.sourceforge.net/lists/listinfo/rsyncrypto-devel for the 
subscription page).

That is, unless Wayne tells me that it's ok by him to announce major 
milestones here as well :-)

Thanks, and sorry about the noise.
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-03-09 Thread Shachar Shemesh
Kevin Day wrote:
Shachar-
 
I think Wayne is mostly pointing you to the correct location here.  If 
you look at the code where the checksum is computed, you'll find that 
the rolling checksum actually consists of two parts - one is the sum 
of all of the byte values in the window, the other is the offset 
weighted sum of all of the byte values in the window.  The first of 
these is then left shifted 16 bits and added to the other to come up 
with the official 32 bit rolling checksum.  This works fine as long 
as you aren't counting on a random distribution of bits among the 32 - 
if you mod the value, you are giving much greater importance to the 
lower XX bits, effectively dropping the distribution of the high order 
bits...
 
Anyway, the two 16 bit values may be random enough that my concern is 
not founded, but it should be tested before assuming that the rolling 
checksum is really a 32 bit value that can easily be divided up into 
buckets.
 
PS - none of the above has anything to do with the strong signature of 
the window - just the rolling check sum.
 
Cheers!
 
- K
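
As a reference point, a simplified sketch of the two-part weak checksum
Kevin describes (my paraphrase of rsync's get_checksum1(), ignoring the
CHAR_OFFSET and signedness details):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* s1 is the plain sum of the window's bytes; s2 is the offset-weighted
 * sum; the weak checksum packs s1 into the low 16 bits and s2 above. */
static uint32_t weak_checksum(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;

    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;       /* equals the sum of (len - i) * buf[i] */
    }
    return (s1 & 0xffff) + (s2 << 16);
}

int main(void)
{
    const unsigned char data[] = "hello world";
    printf("0x%08x\n", (unsigned)weak_checksum(data, sizeof data - 1));
    return 0;
}
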
Some modern algebra for you, then.
We have two numbers. One is always multiplied by 2^16. We want both 
numbers to be able to totally affect the bucket into which the eventual 
checksum arrives. If we will choose a hash table size that is co-prime 
to the bits we want to remain significant, then we achieve that.

Well, guess what? The factor we multiply by (65536, i.e. 2^16) has 
nothing but twos in its factorization. In other words, any hash table 
size that is odd (i.e. - two is not among its prime factors) will be 
coprime to our shifted checksum, thus promising that both halves get an 
equal chance of affecting what bucket our checksum actually falls into.

It therefore follows that I have to amend my previously proposed hash 
table size choosing formula. The new formula is:
(numblocks/8+1)*10+1
And you're done. Of course, this can also be written as:
(numblocks/8)*10+11
Which is slightly more efficient.
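
Spelled out, the equivalence is simple distribution:
(numblocks/8 + 1)*10 + 1 = (numblocks/8)*10 + 10 + 1 = (numblocks/8)*10 + 11.
Both forms stay odd, since the first term is a multiple of 10.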

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-03-05 Thread Shachar Shemesh
Lars Karlslund wrote:
And I'm suggesting making it static, by adjusting the hash table's size 
according to the number of blocks. Just do 
hashtablesize=(numblocks/8+1)*10;, and you should be set.
   

Or maybe it should really be dynamic.
I'm talking about the hash table load. I.e. - the ratio between the 
number of buckets the table has, and the number of blocks that go in it. 
This is almost unrelated to your problem.

I adjusted the block-size as an experiment, as I read somewhere about 
the default blocksize of 700 bytes. Now I'm told the blocksize is 
calculated automatically. Which is it?
According to my extremely non-official reading of the source code - 
dynamic. Do try to lose the parameter and see how things are doing. 
Also, try setting it really high, say, 50MB, and tell us how things go 
then. This is just so we find out where the bottleneck is in your case.

But hey, I can run all the tests you want. Just tell me what to do.
See previous paragraph. Comparative numbers for block sizes of:
- "always transfer"
- 64k blocks (as you have been doing so far)
- default block sizes (about 700K, according to my calculations)
- 50MB block sizes
Also, knowing the CPU and network load of each solution would be very 
beneficial.

Okay, okay, my mistake. Should I just remove the parameter altogether?
Would probably be better, yes.
Kevin, I'm trying to make rsync better. Lars' problem is an opportunity 
to find a potential bottleneck. Trying to solve his use of possibly 
   

Well, it's probably a non-standard situation for rsync anyway.
The fact that your setup is highly likely non-optimal does not mean that 
rsync cannot be made even better.

non-optimal values won't help rsync, though it will help him. Let's keep 
   

Well, me either, as the rsync job processes both this gigantic file 
and other smaller ones.
If you don't specify block sizes, this should not be a problem.
Whoa, is that the subject? I thought the subject was solving my 
problem <big smile>
Not for four or five messages, no :-)
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-03-05 Thread Shachar Shemesh
Kevin Day wrote:
As a quick FYI, the block size absolutely has an impact on the 
effectiveness of the checksum table - larger blocks means fewer 
blocks, which means fewer hash collisions.
Since you wouldn't expect that many blocks in the file, a 32bit weak 
checksum would only produce about 4 or 5 real collisions. Mind you, this 
is me doing educated guesses. I haven't worked out the actual math yet. 
I don't think we need to worry about this particular problem just yet. 
Hash table collisions, however, are much more likely, which is what I'm 
trying to solve here.  

That said, however, I completely agree that for very large files, the 
number of buckets in the current implementation is not optimal.  
Perhaps having a mode for really large files would be appropriate.
I don't see why such a mode would be necessary.
One caution on increasing the size of the hash:  The current 
implementation gives 16 bits of spread, so modding that value with the 
desired bucket count would work fine.
That's not what I read into it. It seems to me that the checksum 
function gives a 32bit result, and we are squashing that into a 16bit 
hash table. Can you point me to the code? Wayne?

  However, if you choose to start with the 32 bit rolling hash and mod 
that, you will have problems.  The rolling checksum has two distinct 
parts, and modding will only pull info from the low order bits,
Why? This may be something I missed within the code.
which will likely get you considerably less than even the 16 bits that 
the current implementation gives.
If the source is 16 bit, doing any hash table size bigger than 65536 
buckets would make no sense, true. Is it 16bit?

I'd recommend using a real 32 bit hashing function to mix the two 
rolling checksum components,
What two parts? If I understand rsync correctly, we have a rolling 
checksum, and a real checksum. The rolling checksum is used to single 
out potential matches, and the real checksum makes sure these are, 
indeed, real matches. We only need to put the first one into the hash, 
as we are never doing mass-lookups on the second.

Am I missing something basic here?
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-03-05 Thread Shachar Shemesh
Wayne Davison wrote:
On Thu, Mar 03, 2005 at 10:18:01AM +0200, Shachar Shemesh wrote:
 

And I'm suggesting making it static, by adjusting the hash table's
size according to the number of blocks.
   

The block-size? 
 

Definitely not! I was talking about the hash table load. I.e. - the 
ratio between the number of blocks and the number of hash table buckets.

I.e. - after determining the number of blocks, only then decide on a 
hash table size, and work accordingly. This means you use little memory 
for small files, and more memory for big files - should be an acceptable 
trade off.

Since it only needs to note a
found/not-found state, the table can be a single bit per node, and a
19-bit lookup only needs 64k of memory.
But that only works if the checksum function and the hash table are 
exactly the same size. Also, you still need to store the verify value 
somewhere, and efficiently find it. I'm not sure that's optimal.

If we take a 500GB file, as is Lars' case, and assuming we don't touch 
the block size (i.e. - we use the default of about 740K blocks, of 740KB 
each), we will need about 900 thousand buckets in the hash table at an 
alpha ratio of 80%, which means 4MB in pointers. I hardly think this 
memory consumption (for efficiently transferring a 500GB file) is high 
enough to justify further complicated bit operations.

(on the flip side, 64KB fits into the CPU's data cache, while 4MB usually 
will not. I'm not sure how crucial that is going to turn out to be).

 This allows a rapid yes/no
pre-check for the weak value before we look-up the actual strong
checksum value in the hash table and should result in less searching
for values that aren't there.
But how will you find it there? If you are going to have 740K blocks 
(i.e. - 740,000 strong hashes) in a 16bit hash table, you are going to 
have lots of collisions there (190 per bucket, on average), and you 
gained nothing.

..wayne..
 

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-02-28 Thread Shachar Shemesh
Lars Karlslund wrote:
Also as far as I could read, the default block size is 700 bytes? What 
kind of application would default to moving data around 700 bytes at a 
time internally in a file? I'm not criticizing rsync, merely 
questioning the functionality of this feature.
I believe you may have missed the point there. 700 bytes is not the 
amount the application is expected to have changed. 700 bytes is merely 
the unit of data examined as one block. This means that if you take a 
file and change it by one byte, 700 consecutive bytes (sometimes), or 
two bytes 698 bytes away from one another, rsync will treat it as a 
single changed block, and will resynchronize the data there. This number 
is a trade off.

The larger the number, the more bytes need to be synched if a single 
byte changes (more network traffic). Also, the larger the number, the 
higher the cost if a small change crosses a block boundary, but the 
lower the chances of that happening.

The smaller the number, the more checksums have to be calculated and 
transferred (more network traffic). Also, the smaller the number, the 
more blocks in a file, and the higher the chances of checksum collisions 
that do not stem from a truly identical block, resulting in the need to 
calculate a stronger hash for the block and transfer it (more IO, cpu, 
network load and latency).

I'm too new to this project to know what benchmarks were done to bring 
the block size to default to 700, but it seems like a nice number. If 
your characteristics vary, you may wish to play around with it.

As for granularity - supposed you added one byte to the file. This means 
that the sender has a file which has a one byte offset from the 
receiver. The sender will have one block for which there is no 
counterpart at the receiver, and all other blocks will have a one byte 
offset (which rsync will detect, and save the traffic). In short, we see 
that the 700 number has almost nothing to do with the application that 
the file belongs to.

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-02-28 Thread Shachar Shemesh
Lars Karlslund wrote:
Maybe I didn't express myself thoroughly enough  :-)
Or me.
Yes, a block is a minimum storage unit, which is considered for transfer.
In size, yes. Not in position.
But it's a fact that the rsync algorithm as it is now checks to see if 
a block should have moved. And in that case, the 700 bytes default is 
very much worth considering.
No, because the rsync algorithm can detect single-byte moves of this 
700-byte block.

If no blocks at all move in a 700 byte increment (i.e. 700 bytes gets 
inserted somewhere - optimally at a 700-byte boundary in the file), 
then all you get is larger memory and CPU usage and
all the bandwidth reduction you need.
The point I think you are missing is that the 700-byte blocks need not 
be on 700-byte boundaries. They can be on one-byte boundaries.

It may very well be that, for your specific application, increasing the 
block size considerably will be better. If your files are huge, and the 
changed areas are very small in comparison to the file size, that can 
yield significant improvement. However, this is due to the trade offs I 
talked about in my previous email. It has nothing to do with 700 bytes 
being unrealistic or incorrect.

True, and in that scenario it makes no difference what the block size 
you choose: if the one byte is inserted at the beginning, the entire 
file will be transferred.
No, just the first block.
Rsync is not diff, and does not patch the file dynamically if the 
file has random insertions/removals.
Well, in a way, it does. It's really quite ingenious. As I have no 
relation to its implementation, I can say that wholeheartedly. I 
encourage you to read about the algorithm on the site.

You make no comment on my calculations on the block-moving algorithm 
in my real-world scenario, which was the basis for this discussion anyway.
I'm sorry. You just stated as facts things I knew to be incorrect, so I 
allowed myself to skip your calculations. I don't think there is any 
argument that you are getting sub-optimal results from rsync. The 
question is why.

How much memory is on the machines? Try to bring the block size up to 
1MB. This will mean you will have only 524 thousand blocks, which may 
prove more manageable.

Best regards,
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-02-28 Thread Shachar Shemesh
Shachar Shemesh wrote:
No, because the rsync algorithm can detect single byte moves of this 
700 bytes block.
I will just mention that I opened the ultimate documentation for rsync 
(the source), and it says that the default block size is the rounded 
square root of the file's size. This means that your 64KB blocks are 
considerably smaller than what rsync would use if you didn't force it 
(which is about 740KB, much closer to my 1MB suggestion than to your 
64KB actual use).

If I were you, I'd try to remove the --block-size option, and see what 
happens.
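
The heuristic reads roughly like this (a minimal sketch of the
square-root rule; the 700-byte floor is my assumption from the old
default, and the real code rounds differently):

/* compile with -lm */
#include <math.h>
#include <stdio.h>
#include <stdint.h>

/* Sketch: default block size is about sqrt(file size), floored at the
 * old 700-byte default (assumed; the generator code differs in detail). */
static int32_t default_block_size(int64_t file_size)
{
    double root = sqrt((double)file_size);
    return root < 700 ? 700 : (int32_t)root;
}

int main(void)
{
    /* a 500GB file gets ~733KB blocks (and about as many of them) */
    printf("%d\n", (int)default_block_size(500LL << 30));
    return 0;
}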

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-02-28 Thread Shachar Shemesh
Wayne Davison wrote:
However, you should be sure to have measured what is causing the
slowdown first to know how much that will help.  If it is not memory
that is swapping on the sender, it may be that the computing of the
checksums is maxing out your CPU, and removing the caching of the
remote checksums won't buy you as much as you think.  You could use some
of the librsync tools (e.g. rdiff) to calculate how long various actions
take on each system (i.e. try running rdiff on each system outputting to
/dev/null to see how long the computing of the checksums takes).
..wayne..
 

Hi Wayne,
Excuse me if I'm talking utter nonsense here. I have only just now 
opened the code up and looked at it. It does seem, however, that there 
is a considerable optimization that can be performed here.

Correct me if I'm wrong, but it seems to me that the checksum matching 
code is at match.c, inside hash_search. Particularly, the do...while 
loop. It seems that the loop is there to scan the entire checksums list 
for each byte. Is that really the case? If so, we can probably make it 
much much (much much much) more efficient by using a hash table instead. 
We wouldn't even have to change the line protocol in any way.

Am I misreading the code?
Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsyncing really large files

2005-02-28 Thread Shachar Shemesh
Kevin Day wrote:
I would *strongly* recommend that you dig into the thesis a bit (just 
the section that describes the rsync algorithm itself).
I tried a few weeks ago. I started to print it, and my printer ran out 
of ink :-). I will read it electronically eventually (I hope).

Now, if you have huge files, then the 16 bit checksum may not be 
sufficient to keep the hit rate down.  At that point, you may want to 
consider a few algorithmic alternatives:

1.  Break up your file into logical blocks and rsync each block 
individually.  If the file is an append only file, then this is 
fine.  However, if the contents of the file get re-ordered across 
block boundaries, then the efficiency of the rsync algorithm would be 
seriously degraded.

2.  Use a larger hash table.  Instead of 16 bits, expand it to 20 bits 
- it will require 16 times as much memory for the hash table, but that 
may not be an issue for you - you are probably workring with some 
relatively beefy hardware anyway, so what the heck.
Now, here's the part where I don't get it. We have X blocks checksummed, 
covering Y bytes each (actually, we have X blocks of checksum covering X 
bytes each, but that's not important). This means we actually know, 
before we get the list of checksums, how many we will have.

As for the hash table size - that's standard engineering. Alpha is 
defined as the ratio between the number of used buckets in the table to 
the number of total buckets. 0.8 is considered a good value.

What I can propose is to make the hash table size a function of X. If we 
take Lars' case, he has a 500GB file, which means you ideally need about 1 
million buckets in the hash to have reasonable performance. We only have 
65 thousand. His Alpha is 0.008. No wonder he is getting abysmal 
performance.

On the other hand, if I'm syncing a 100K file, I'm only going to have 
320 blocks. A good hash table size for me will be 400 buckets. Having 
65536 buckets instead means I'm less likely to have memory cache hits, 
and performance suffers again. My Alpha value is 204 (instead of 0.8).

If my proposal is accepted, we will be adaptive in CPU-memory trade off.
 I'll leave the exercise of converting the full rsync 32 bit rolling 
checksum into a 20 bit value to you.
A simple modulo ought to be good enough. If the checksum is 42891 and I 
have 320 buckets, it should go into bucket 11. Assuming the checksums 
are sufficiently random-like, this algorithm is good enough.

Cheers,
Kevin
  Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Commercial Liscense for rsync

2005-02-20 Thread Shachar Shemesh
fred wu wrote:
Dear All,
 I am very new in open source. Please give me some ideas on the
following questions.
 What is the liscense of rsync for commerical use? Do I need to pay
for commercial use? Is it still GPL?
 A similar case is MySQL. It has commercial liscense for business use.
 Any help will be appreciate!
 Thanks!
Regards,
Fred
 

Hi Fred,
My answer is neither official (I am not a copyright holder for rsync) 
nor authoritative (I am not a lawyer). I did study these things, however.

The question of whether you can or cannot use a GPL product for 
commercial use depends on what the use is. If all you want is to USE the 
product, go right ahead. It matters not whether it is for money or not. Use 
rsync (or MySQL, or Linux, or anything else under the GPL).

If what you want to do is to resell one of those technologies, it 
depends on what you want to happen to it in the interim. In the case of 
a pure standalone program, such as rsync, if you didn't change the 
program itself, you are fairly free to sell it as part of your product. 
If, however, you want to make changes to it, you must read and abide by 
the terms of the GPL in order to do that. In essence, you can do 
whatever you like, so long as you:
1. Don't change the license. I.e. - it must still be GPL (along with all 
of your changes).
2. Distribute the complete sources, in a form that allows your clients 
to recreate the program.
3. Make sure your clients know, either through the documentation or 
through something the program itself prints, that they have these rights.

I hope this answers your question.
Also, don't even think about relying on this small email alone as the 
basis for your business. If in doubt, get a lawyer.

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsync run on TCP/IP?

2005-02-20 Thread Shachar Shemesh
fred wu wrote:
Dear All,
 I have searched the email archive but I didn't find a clear
answer (maybe I missed it) ^^
1. Does rsync run on TCP/IP?
 

Rsync has a server mode. Read the manual, and particularly the 
--daemon option.
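
A minimal sketch, with a made-up module name and paths:

# /etc/rsyncd.conf on the server
[pub]
    path = /srv/pub
    read only = yes

# start the daemon; it listens on TCP port 873 by default
rsync --daemon

# on a client, pull the module over plain TCP
rsync -av rsync://server.example.com/pub/ /local/copy/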

2. Without ssh, then rsync will be transferred in plain text? 
 

If you don't use SSH (or another encryption encapsulation, such as 
sslproxy), the data will be transferred in plain text.

3. Does rsync support Samba? Am I right to say that rsync, which relies
mainly on user authentication, is transparent to the storage, like
ftp?
 

rsync works on the file system. It doesn't care what the file system is. 
If it's a samba mount, then that's ok. I am not aware of an FTP 
filesystem, and therefore can't say that it will work over ftp.

Any help to any questions will be appreciated! Thank You!
Regards,
Fred
 

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Rsync friendly zlib/gzip compression - revisited

2005-02-13 Thread Shachar Shemesh
 block size to 
1000 and leave it there.
If we look at RSync performance as a function of window size, we find, as 
expected, that the rsync performance decreases.  The curve isn't smooth (the 
compression algorithm causes all sorts of havoc here), but the general trend is 
clear:  Increasing the compressed window size and block mask size = decreased 
rsync performance.  What isn't clear (because I varied the window size and 
block mask size together), is which of the two actually is the driving force 
(window size or block mask size).  My initial results with fiddling with the 
compression performance indicate that it should be the block mask size that 
matters, and the window size shouldn't matter.
Here are the test results:
[Window AND block mask Size] [Speed Up]
100 335.6063
600 335.56348
1100 264.708
1600 273.0129
2100 220.42537
2600 226.33165
3000 219.47977
3500 225.90854
4000 179.82353
4500 195.95692
5000 230.86983

My initial thoughts here are:
1.  If the block mask size is mostly responsible for determining the 
performance of the zlib compression algorithm and if the window size is mostly 
responsible for determining the performance of the rsync algorithm, then we may 
have an opportunity to optimize the performance of the current --rsyncable 
patch.  The test results above imply that we could be looking at a 25% 
improvement in rsync performance without impacting compression or significantly 
adjusting Rusty's algorithm.  I think that the test results so far tend to 
support this as a possibility.
2.  Further, if we can get really small window sizes by switching to the original rsync rolling checksum (instead of using the simple checksum in the current patch), then we *might* be able to achieve even better optimization, without adding a lot of computation overhead (the overhead of the rsync algorithm compared to the simple computation used now should be pretty low compared to the rest of what's going on in the zlib computation).  

Anyway - that's where I'm headed with this.  I'll have results to indicate 
whether #1 is worth pursuing (i.e. testing with many other files) tomorrow.  #2 
will require a bit more coding to the --rsyncable patch, but it should be 
relatively simple to do.

That's all I have for now - I'm running tests this evening to independently 
adjust the window and block mask size, and to test a different data file just 
to make sure I'm not way off-base (i.e. make sure the trends indicated by my 
results are not dependent on the input file).
I'd love to hear any thoughts on the matter!
- Kevin
 


--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync huge tar files

2005-02-05 Thread Shachar Shemesh
Martin Schröder wrote:
On 2005-02-04 11:51:20 +0200, Shachar Shemesh wrote:
 

What distro is this? If it's Debian, gzip has an option called 
--rsyncable. This makes changes to the uncompressed file local in the 
   

This is a debian-only patch which doesn't change the gzip
version. :-(
Best regards
   Martin
 

Which kinda reminds me. Does anyone know where I can find the rsyncable 
patch in an isolated form?

 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync huge tar files

2005-02-04 Thread Shachar Shemesh
Harald Dunkel wrote:
Hi folks,
Are there any tricks known to let rsync operate on huge tar
files?
I've got a local tar file (e.g. 2GByte uncompressed) that is
rebuilt each night (with just some tiny changes, of course),
and I would like to update the remote copies of this file
without extracting the tar files into temporary directories.
Any ideas?
Regards
Harri
What distro is this? If it's Debian, gzip has an option called 
--rsyncable. This makes changes to the uncompressed file local in the 
compressed file.

If this is not a Debian system, check whether the rsyncable patch was 
integrated there too. If not, compile your own version of gzip. In order 
to apply it to the tar file, you will have to not use the "z" option 
while creating the tar, but instead pipe it to gzip. Instead of doing 
"tar czf file.tgz dirs..." you do "tar cf - dirs... | gzip --rsyncable > 
file.tgz"

Enjoy
 Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Transforming file names contents

2005-01-22 Thread Shachar Shemesh
[EMAIL PROTECTED] wrote:
--- Wayne Davison wrote:
 

Another option would be to use some kind of a compressed filesystem.
   

Do you know of one that works on Linux? I searched for this a few months 
back
but came up empty.
Thanks,
Joe
 

I'm currently working on a tool that would do such a transformation as a 
preprocessing step. I.e. - the files are preprocessed, and then the other 
directory is rsynced. In my case this is for encrypting (but I compress 
as part of the process, using the gzip rsyncable option).

If you're interested in this extremely preliminary, pre-alpha, not 
tested, still-being-developed, use-at-your-own-risk project, check out 
http://sourceforge.net/projects/rsyncrypto

 Shachar
P.s.
Currently, it's just the encryption part that I've worked on. The code 
in CVS (there is no other code at the moment) does not handle the actual 
tree scanning and copying, but it will soon. It goes without saying that 
any help would be appreciated.

--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html