On Nov 3, 2008, at 5:36 PM, Tom Metro wrote:

John Orthoefer wrote:
Tony Rudie wrote:
Rsync should be fine. And searching for a specific entry in a directory should be way faster than looking at every entry to see if it needs copying. Right?
If your filesystem stores directory entries in something other than an unsorted list, yes it should be faster.
...
But if your filesystem still keeps your directories in something unhashed, then you might as well just let rsync do its job; you're only saving a stat call at that point...

I'm not sure I see the relevance of hashed directories with respect to the OP's question.


The relevance is that if you pass rsync a list of files (we don't know whether the OP's plan is to sync each file as inotify fires, or to sync the files every n minutes based on a list generated by inotify), as far as I know rsync takes the first file in the list and grabs it. That means it needs to scan each directory down the path until it hits the file, then rinse and repeat for each file. If some of those directories are huge and the scan is linear, it can take a long time to scan each directory. But if rsync does its own walk, it handles each file as it encounters it, so it doesn't end up rescanning for each file.

Really I can't tell whether it's relevant to the OP's question, because he didn't give any of those details. Furthermore, I was responding not to the OP's question but to the statement Tony made, that searching for a file is faster than checking each file. I was trying to point out that you need to profile what is going on, and know where the bottleneck is and what problem you are actually trying to solve (not the problem you want to solve; that is called research, and no, they aren't always the same thing).

If you let rsync operate in its usual fashion, then it needs to scan the directory hierarchy, look at the file system metadata for each file, compare it with the remote file, and, if a difference is found, perform a more detailed block comparison.

It depends on what other flags you give rsync; you can tell it to blindly sync the whole file based on the metadata check, with --whole-file (as I recall).

The OP was seeking to replace that scan with an event driven model using inotify or an equivalent service hooked into the OS's kernel that would fire events when a change occurred in the area of interest.


When I first saw this message, my answer was to use rsync with
--files-from...

So where does the list of files come from that you put into the file pointed at by --files-from?

Sure, you can use something like:

inotifywait -q -r -m ... /path | perl -pe ... | rsync --files-from=- ...

but it requires more than just rsync.


Yes, I didn't provide a whole script; some glue is required. It depends on how fast you expect files to change, how fast you can sync them, and how many are changing per unit of time (the right answer for his problem might be something even more complex, like inotify feeding a program that forks off 20 or so rsyncs and queues files to be synced to them). So it requires more knowledge about what is going on to find the best solution. That said, based on what he asked, I would look at using rsync --files-from=. Also, you'll have to do something like a full rsync to resync after a restart, because you can't guarantee that nothing changed while your notify wasn't in place.

Then again, you might not be able to do what he needs with inotify, and you might have to bring out the really heavy guns, like a pair of NetApps with SnapSync/SnapMirror (I think those are the products that keep two NetApps in sync).

Johno

_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
