Re: Huge directory tree: Get files to sync via tools like sysdig
> On 10 Feb 2017, at 01:21, Karl O. Pinc wrote:
>
> On Fri, 10 Feb 2017 12:38:32 +1300
> Henri Shustak wrote:
>
>> As Ben mentioned, ZFS snapshots are one possible approach. Another
>> approach is to have a faster storage system. I have seen considerable
>> speed improvements with rsync on similar data sets by, say, upgrading
>> the storage subsystem.
>
> Another possibility could be to use lvm and lvmcache to throw an SSD in
> front of the spinning disks. This would only improve things if
> you didn't otherwise fill up the cache with data -- you want
> the cache to contain inodes. So this might work only if your
> SSD cache was larger than whatever amount of data you typically
> write between rsync runs, plus enough to hold all the inodes
> in your rsync-ed fs.
>
> I've not tried this. I'm not even certain it's a good idea. It's
> just a thought.

It's also possible to have an SSD cache with ZFS (called the L2ARC).
You can even ask this cache to store only your metadata.

Some (the same?) changes may also be needed on the receiver/server side
(depending on its current settings) to see a performance improvement.

Ben

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
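The sizing rule quoted above (cache larger than the data written between rsync runs, plus room for all inodes) can be turned into rough arithmetic. The sketch below is only an illustration: the file count, data size and change rate come from the original post, while the 256-byte inode size is an assumption (a common ext4 default), and it assumes the 0.1% daily change applies to bytes with one rsync run per day. Directory blocks are not counted, so treat the result as a lower bound.

```python
# Rough lower bound for the SSD cache size suggested by the rule of thumb above.
FILES = 17_000_000              # from the original post
DATA_BYTES = 2_200_000_000_000  # 2.2 TB, from the original post
CHANGE_PER_MILLE = 1            # 0.1% per day, from the original post
INODE_BYTES = 256               # ASSUMPTION: common ext4 default inode size


def cache_estimate_bytes(files, data_bytes, change_per_mille, inode_bytes):
    """Inodes for every file plus the data written between two rsync runs."""
    inode_total = files * inode_bytes
    written_between_runs = data_bytes * change_per_mille // 1000
    return inode_total + written_between_runs


estimate = cache_estimate_bytes(FILES, DATA_BYTES, CHANGE_PER_MILLE, INODE_BYTES)
print(f"{estimate / 1e9:.1f} GB")  # prints "6.6 GB"
```

Under these assumptions even a small SSD would be large enough; whether the cache actually keeps the inodes resident is a separate question that this arithmetic does not answer.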
Re: Huge directory tree: Get files to sync via tools like sysdig
That sounds like it certainly would not hurt!

This email is protected by LBackup, an open source backup solution
http://www.lbackup.org
Re: Huge directory tree: Get files to sync via tools like sysdig
As Ben mentioned, ZFS snapshots are one possible approach. Another
approach is to have a faster storage system. I have seen considerable
speed improvements with rsync on similar data sets by, say, upgrading
the storage subsystem.
Re: Huge directory tree: Get files to sync via tools like sysdig
> On 09 Feb 2017, at 16:10, Thomas Güttler wrote:
>
> On 09.02.2017 at 11:05, Ben RUBSON wrote:
>>> On 09 Feb 2017, at 10:05, Thomas Güttler wrote:
>>>
>>> Hi,
>>>
>>> we have a huge directory tree.
>>>
>>> * 17M files (number of files)
>>> * 2.2TBytes of data.
>>> * Only 0.1% changes per day
>>>
>>> Current pain: rsync's directory tree traversal takes too long to
>>> discover the changed files.
>>
>> Hi,
>>
>> On which type of FS is this directory?
>
> ext4

Is there any way to favor snapshots in your backup strategy?
Or to use a ZFS-ready OS to benefit from an SSD cache (which would
store your metadata)?
Re: Huge directory tree: Get files to sync via tools like sysdig
Directory creation is not a race condition when done properly. The
application (like Lsyncd) gets a directory creation event, creates a
watch for the directory, and then scans the new directory for files or
subdirectories already in it; subdirectories are handled recursively.
This way nothing can be missed.

The general warning that "bugs may be possible" is a no-brainer. Yes,
they are always possible, everywhere.

As said, there are some issues with the "move" (aka rename) event being
detected as such; sometimes it may be detected as a create/delete pair
without properly recognizing the move within the watched tree. And
events may not arrive in the same order as they happened, due to the
multi-core nature of modern systems. But other than that, I'm convinced
it is fine. And none of this is a real issue for event-based filter
list creation to minimize rsync's work.

The only other issue I know of is hard links. Create a hard link
outside the watched directory to a file within the watched directory
tree, and altering the file through that link will not create an event.
In that case you simply must not do that. This has hardly been an issue
in most use cases, though.
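The "watch first, then scan" order described above can be sketched as follows. This is a minimal illustration, not Lsyncd's actual code: `handle_dir_created` and the `add_watch` callback are hypothetical names, and `add_watch` stands in for whatever call registers an inotify watch in a real program.

```python
import os
from typing import Callable, List


def handle_dir_created(path: str, add_watch: Callable[[str], None]) -> List[str]:
    """Watch a newly created directory, then scan it for existing entries.

    add_watch is a hypothetical callback standing in for the real
    "register an inotify watch" call of the monitoring application.
    Returns the files that already existed when the scan ran.
    """
    found: List[str] = []
    # 1. Register the watch first: anything created from now on raises an event.
    add_watch(path)
    # 2. Then list what is already there: entries created before the watch existed.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Subdirectories get the same treatment, recursively.
                found.extend(handle_dir_created(entry.path, add_watch))
            else:
                found.append(entry.path)
    return found
```

A file created between the two steps may show up twice (once in the scan, once as an event), which is harmless when the result is only a list of paths for rsync to check.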
Re: Huge directory tree: Get files to sync via tools like sysdig
On Thu, 9 Feb 2017 14:43:57 +0100
Axel Kittenberger wrote:

> > Not only that, but inotify is not guaranteed. (At least not on
> > 3.16.0. Can't say regarding later versions.) So you might miss some
> > changes.
>
> Got any info on that?
>
> I noted that MOVE_FROM and MOVE_TO events are not guaranteed to arrive
> in order, or the file descriptor might even briefly close with "no
> more events" in between them, but I have never heard of anybody
> encountering an event in a watched directory not being correctly
> reported without being told of the overflow by an OVERFLOW event,
> which in the case of Lsyncd results in a full rescan of everything.

Not much. inotify(7) on my system says:

    With careful programming, an application can use inotify to
    efficiently monitor and cache the state of a set of filesystem
    objects. However, robust applications should allow for the fact
    that bugs in the monitoring logic or races of the kind described
    below may leave the cache inconsistent with the filesystem state.
    It is probably wise to do some consistency checking, and rebuild
    the cache when inconsistencies are detected.

I think one of the pretty much unavoidable race conditions is
sub-directory creation; the sub-directory can have files added to it
before the monitoring process is able to set a watch on it. Of course
this is an application-level race.

I've had incron (which uses inotify) regularly fail to catch all
monitored fs changes on a busy system. And the monitored system does
not involve creating sub-directories -- and I don't think I'm exceeding
the system's inotify event limit either. But I could be wrong about
either of these.

So perhaps the take-away is that inotify is "hard", or even
"impossible", to rely on as the sole method for change monitoring. It
may not be right to say it's "unreliable" as I did above. I'm not the
expert here. But I can say that my limited experience with it makes me
want to look very closely before relying on it.

Regards,

Karl
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
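The consistency checking that the man page recommends could look roughly like the sketch below: keep a cached snapshot of the tree and occasionally compare it with a fresh walk. The function names are hypothetical, and comparing size and mtime is an assumption about what counts as "changed" (it is similar in spirit to rsync's default quick check, but is not taken from any tool mentioned in this thread).

```python
import os


def snapshot(root):
    """Map each file's path relative to root to (size, mtime in ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def find_inconsistencies(cached, actual):
    """Return (changed_or_new, removed) paths between two snapshots."""
    changed = sorted(p for p in actual if cached.get(p) != actual[p])
    removed = sorted(p for p in cached if p not in actual)
    return changed, removed
```

The fresh walk is exactly the expensive traversal the original poster wants to avoid, so a check like this would only make sense at a much lower frequency than the event-driven syncs, for example as a weekly safety net.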
Re: Huge directory tree: Get files to sync via tools like sysdig
On Thu, 9 Feb 2017 10:55:51 +0100
Axel Kittenberger wrote:

> > Has someone experience with collecting the changed files
> > with a third party tool which detects which files were changed?
>
> I don't know of sysdig but am the developer of Lsyncd, which does
> exactly that: it collects file changes via the inotify event
> mechanism and then calls rsync with a matching filter mask.
>
> However, since you say your directory tree is huge, the main issue
> is that for every directory an inotify watch must be created, taking
> about 1KB of kernel memory per watch.

Not only that, but inotify is not guaranteed. (At least not on
3.16.0. Can't say regarding later versions.) So you might miss some
changes.

Karl
Re: Huge directory tree: Get files to sync via tools like sysdig
> Not only that, but inotify is not guaranteed. (At least not on
> 3.16.0. Can't say regarding later versions.) So you might miss some
> changes.

Got any info on that?

I noted that MOVE_FROM and MOVE_TO events are not guaranteed to arrive
in order, or the file descriptor might even briefly close with "no more
events" in between them, but I have never heard of anybody encountering
an event in a watched directory not being correctly reported without
being told of the overflow by an OVERFLOW event, which in the case of
Lsyncd results in a full rescan of everything.
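The overflow fallback described above can be sketched as follows. This illustrates the idea from the original question (feeding rsync a list of changed files), not how Lsyncd is implemented, since Lsyncd is described in this thread as using a filter mask. The function name and the `(kind, relative_path)` event tuples are hypothetical simplifications; `--files-from` and `--from0` are real rsync options.

```python
import os


def plan_rsync(events, src, dest):
    """Return (argv, stdin_bytes) for one rsync run, or (None, None) if idle.

    events: iterable of (kind, relative_path) tuples; the kind strings are
    simplified stand-ins for real inotify event flags.
    """
    changed = set()
    for kind, rel_path in events:
        if kind == "OVERFLOW":
            # Events were lost, so the list cannot be trusted:
            # fall back to a normal run with full tree traversal.
            return ["rsync", "-a", src, dest], None
        changed.add(rel_path)
    if not changed:
        return None, None
    # NUL-terminated names on stdin, so unusual characters in paths are safe.
    payload = b"".join(os.fsencode(p) + b"\0" for p in sorted(changed))
    argv = ["rsync", "-a", "--from0", "--files-from=-", src, dest]
    return argv, payload
```

The result could be run with `subprocess.run(argv, input=payload, check=True)`. Deleted files are deliberately left out of this sketch; a list-driven run would need extra handling for them.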
Re: Huge directory tree: Get files to sync via tools like sysdig
> On 09 Feb 2017, at 10:05, Thomas Güttler wrote:
>
> Hi,
>
> we have a huge directory tree.
>
> * 17M files (number of files)
> * 2.2TBytes of data.
> * Only 0.1% changes per day
>
> Current pain: rsync's directory tree traversal takes too long to
> discover the changed files.

Hi,

On which type of FS is this directory?

Ben
Re: Huge directory tree: Get files to sync via tools like sysdig
> Has someone experience with collecting the changed files
> with a third party tool which detects which files were changed?

I don't know of sysdig but am the developer of Lsyncd, which does
exactly that: it collects file changes via the inotify event mechanism
and then calls rsync with a matching filter mask.

However, since you say your directory tree is huge, the main issue is
that for every directory an inotify watch must be created, taking about
1KB of kernel memory per watch. If you have a million directories, that
is a GB of unswappable memory. Unfortunately the Linux kernel doesn't
provide a better way yet, and I suppose other tools like sysdig suffer
from the same issue. There is fanotify, but that doesn't report move
events and thus is not usable for this task.

Kind regards,
Axel

On Thu, Feb 9, 2017 at 10:05 AM, Thomas Güttler <guettl...@thomas-guettler.de> wrote:

> Hi,
>
> we have a huge directory tree.
>
> * 17M files (number of files)
> * 2.2TBytes of data.
> * Only 0.1% changes per day
>
> Current pain: rsync's directory tree traversal takes too long to
> discover the changed files. Only a few files change.
>
> I discovered the tool sysdig, which could be used to monitor the
> files which were changed.
>
> Then we could feed the list of changed files to rsync and avoid the
> long directory traversal of rsync.
>
> Has someone experience with collecting the changed files with a third
> party tool which detects which files were changed?
>
> Regards,
> Thomas Güttler
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
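The memory estimate above (one watch per directory at about 1KB each) is easy to check against a real tree. The sketch below counts directories and multiplies by the per-watch figure; the function names are hypothetical, and 1024 bytes is simply the "about 1KB" from the message, not a measured kernel value.

```python
import os

BYTES_PER_WATCH = 1024  # "about 1KB" per watch, the figure given in the message


def count_directories(root):
    """Count root plus every directory below it (one inotify watch each)."""
    # os.walk yields exactly one tuple per directory and does not
    # follow symlinks by default.
    return sum(1 for _ in os.walk(root))


def watch_memory_bytes(directories, bytes_per_watch=BYTES_PER_WATCH):
    """Estimated kernel memory needed to watch that many directories."""
    return directories * bytes_per_watch


print(f"{watch_memory_bytes(1_000_000) / 1e9:.2f} GB")  # prints "1.02 GB"
```

Note that the count is per directory, not per file, so the relevant number for the 17M-file tree is how many directories those files are spread over.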