Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Thomas Güttler
Hi,

we have a huge directory tree.

* 17M files (number of files)
* 2.2TBytes of data.
* Only 0.1% changes per day

Current pain: rsync's directory tree traversal takes too long to discover the changed files. Only a few files change. I discovered the tool sysdig, which could be used to monitor

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Ben RUBSON
> On 09 Feb 2017, at 10:05, Thomas Güttler wrote:
>
> Hi,
>
> we have a huge directory tree.
>
>
> * 17M files (number of files)
> * 2.2TBytes of data.
> * Only 0.1% changes per day
>
> Current pain: rsync's directory tree traversal takes too long to discover the

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Axel Kittenberger
> Has someone experience with collecting the changed files
> with a third party tool which detects which files were changed?

I don't know sysdig, but I am the developer of Lsyncd, which does exactly that: it collects file changes via the inotify event mechanism and then calls rsync with a matching filter
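To make the "collect changes, then hand them to rsync" idea concrete, here is a minimal sketch in Python. It is not how Lsyncd works internally: instead of a generated filter it uses rsync's `--files-from=-` option, which reads the list of paths to transfer from standard input. The function name `build_rsync_command` and the example paths are hypothetical, and deleted files are not handled.

```python
import os


def build_rsync_command(changed_paths, source_root, destination):
    """Turn a list of changed files into an rsync invocation.

    Returns (command, file_list). The file list is meant to be written
    to rsync's stdin, because the command uses --files-from=-.
    """
    root = os.path.abspath(source_root)
    relative = set()
    for path in changed_paths:
        rel = os.path.relpath(os.path.abspath(path), root)
        # Skip anything that is not inside the synced tree.
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            continue
        relative.add(rel)
    # One path per line, deduplicated and sorted for reproducibility.
    file_list = "".join(rel + "\n" for rel in sorted(relative))
    # Paths in the list are interpreted relative to the source directory.
    command = ["rsync", "-a", "--files-from=-", root + "/", destination]
    return command, file_list
```

The result could be run with something like `subprocess.run(command, input=file_list, text=True)`. Because rsync only looks at the listed paths, the full traversal of the 17M-file tree is avoided, at the cost of trusting the change collector to be complete.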

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Henri Shustak
That sounds like it certainly would not hurt!

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Henri Shustak
As Ben mentioned, ZFS snapshots are one possible approach. Another approach is to have a faster storage system. I have seen considerable speed improvements with rsync on similar data sets by, say, upgrading the storage subsystem.

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Ben RUBSON
> On 10 Feb 2017, at 01:21, Karl O. Pinc wrote:
>
> On Fri, 10 Feb 2017 12:38:32 +1300
> Henri Shustak wrote:
>
>> As Ben mentioned, ZFS snapshots are one possible approach. Another
>> approach is to have a faster storage system. I have seen considerable

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Karl O. Pinc
On Thu, 9 Feb 2017 14:43:57 +0100 Axel Kittenberger wrote: > > > > Not only that, but inotify is not guaranteed. (At least not on > > 3.16.0. Can't say regards later versions.) So you might miss some > > changes. > > > > Got any info on that? > > I noted that MOVE_FROM

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Axel Kittenberger
> > Not only that, but inotify is not guaranteed. (At least not on > 3.16.0. Can't say regards later versions.) So you might miss some > changes. > Got any info on that? I noted that MOVE_FROM and MOVE_TO events are not guaranteed to arrive in order, or even the file descriptor might briefly

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Karl O. Pinc
On Thu, 9 Feb 2017 10:55:51 +0100
Axel Kittenberger wrote:

> > Has someone experience with collecting the changed files
> > with a third party tool which detects which files were changed?
>
> I don't know of sysdig but am the developer of Lsyncd which does
> exactly that,

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Axel Kittenberger
Directory creation is not a race condition when done properly. The application (like Lsyncd) gets a directory creation event, creates a watch for the directory, and then scans the new directory for files or subdirectories; subdirectories are handled recursively. This way nothing can be missed.
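The ordering described above (watch first, scan second, recurse into subdirectories) can be sketched in Python as follows. This is only an illustration and not Lsyncd's actual code: `watch_new_directory`, `add_watch`, and `on_found` are hypothetical names, and `add_watch` stands in for whatever call registers the real inotify watch.

```python
import os


def watch_new_directory(path, add_watch, on_found):
    """Handle a directory-creation event for `path`.

    The watch is registered BEFORE the scan, so anything created after
    the watch exists raises an event, and anything created earlier is
    picked up by the scan. A file may be reported twice, never zero times.
    """
    add_watch(path)
    # Snapshot the entries so the directory handle is closed before recursing.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Subdirectories get the same treatment, recursively.
            watch_new_directory(entry.path, add_watch, on_found)
        else:
            on_found(entry.path)
```

The consumer has to tolerate duplicates, since a file created between the watch and the scan shows up both as an event and as a scan result. For a sync tool that is harmless: syncing the same path twice is just redundant work.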

Re: Huge directory tree: Get files to sync via tools like sysdig

2017-02-09 Thread Ben RUBSON
> On 09 Feb 2017, at 16:10, Thomas Güttler wrote:
>
> On 09.02.2017 at 11:05, Ben RUBSON wrote:
>>> On 09 Feb 2017, at 10:05, Thomas Güttler
>>> wrote:
>>>
>>> Hi,
>>>
>>> we have a huge directory tree.
>>>
>>>
>>> * 17M files