Hello,

Very interesting results, not too surprising.  See below ...

On Wednesday 31 January 2007 15:52, Alan Davis wrote:
> I've completed my initial tests with very large include lists with a
> successful outcome.
>
> The final test list was intended to simulate a backup file list
> generated by a db query to backup new files or files modified by a part
> of our production process. The filesystems are in the 30TB range
> containing millions of files in hundreds of directories.
>
> The file list was generated to include only regular files, no links,
> devices files, sockets, etc. and excluded bare directory names. All
> files were specified using their full path.
>
> The file list had 292296 entries.
>
> It took longer than the default FD-SD timeout of 30 minutes to ingest
> the file list from the Director. I've modified the SD to change the
> timeout from 30 minutes to 90 minutes to allow the SD to wait long
> enough for the FD to begin sending data.
>
> While the FD was reading the file list from the Director the FD was
> using nearly 100% of the CPU. 

Yes, the list structure of included and excluded files is really intended for 
very small numbers of entries.  I think you could get a *very* significant 
reduction in the time needed to "ingest" the file list from the Director by 
modifying line 699 of <bacula-source>/src/filed/job.c from:

      fileset->incexe->name_list.init(1, true);

to

      fileset->incexe->name_list.init(10000, true);

The 10,000 may be a bit of overkill, and you could very likely get almost 
equally good performance by setting it to 1000.  Still, it seems to me that 
setting it to 10,000 is worth a try.  It is not something I would do in the 
mainstream code, but it would probably solve your performance problem.  I 
would have no problem setting it to, say, 100 in the mainstream code.
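
To give a feel for why the grow size matters, here is a rough back-of-the-envelope 
sketch (not the actual alist code; it just assumes the worst case where every time 
the table fills up it is reallocated and all existing pointers are copied):

      /* Hypothetical illustration only -- not Bacula source.
       * A fixed-grow array list reallocates once every 'grow' appends and,
       * in the worst case, copies the whole pointer table each time, so the
       * total copy work is roughly N*N / (2*grow) for N appends.
       */
      #include <stdio.h>

      int main()
      {
         const double n = 292296;            /* entries in your file list */
         const long grows[] = {1, 100, 1000, 10000};

         for (long g : grows) {
            printf("grow=%-6ld reallocs=%-7.0f worst-case pointer copies=%.0f\n",
                   g, n / g, n * n / (2.0 * g));
         }
         return 0;
      }

With grow=1 that works out to something like 4 x 10^10 pointer copies for your 
292,296 entries, versus roughly 4 x 10^6 with grow=10000, which is why I expect 
the one-line change above to make such a difference.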

A better solution (probably), but one requiring more work, would be to change 
the list from an alist (allocated, indexed list) to a dlist (doubly linked 
list).  For an alist, adding a new member is very expensive whenever the 
growth size of the list (the first argument above) is exceeded, because the 
whole pointer table has to be reallocated.  For a dlist, on the other hand, 
adding a new member is no more expensive than adding any other member.
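
For comparison, here is a minimal sketch of a doubly-linked-list append 
(again only an illustration, not Bacula's actual dlist API): linking in a new 
node only touches the tail pointers, so the cost is the same whether the list 
already holds 10 entries or 300,000.

      /* Hypothetical sketch -- not Bacula's dlist implementation. */
      struct node {
         node *prev;
         node *next;
         const char *fname;              /* file name carried by this entry */
      };

      struct file_dlist {
         node *head = nullptr;
         node *tail = nullptr;

         /* O(1) append: no reallocation, no copying of existing entries */
         void append(node *n) {
            n->prev = tail;
            n->next = nullptr;
            if (tail) {
               tail->next = n;
            } else {
               head = n;
            }
            tail = n;
         }
      };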

I'll be interested to see how much the change I suggest above improves things ...

Best regards, 

Kern

> This usage then dropped back to a more 
> normal 10%.
>
> Memory use on the FD was larger than normal, with max_bytes of
> approximately 24MB.
>
> Data transfer rates were normal for that server. Time to complete the
> backup was no longer than expected for that server and amount of data.
>
> Conclusion : Using a file list consisting of only the files to be backed
> up and excluding bare directory entries (which would cause the full
> content of the directory to be backed up) is possible and scales
> reasonably into the 200,000+ file range. Larger file lists would need to
> be tested to determine what, if any, the practical limit of the FD is.
>
>
> ----
> Alan Davis
> Senior Architect
> Ruckus Network, Inc.
> 703.464.6578 (o)
> 410.365.7175 (m)
> [EMAIL PROTECTED]
> alancdavis AIM
>
> > -----Original Message-----
> > From: Alan Davis [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 30, 2007 1:29 PM
> > To: 'Kern Sibbald'
> > Cc: 'bacula-users@lists.sourceforge.net'
> > Subject: RE: [Bacula-users] Experience with extremely large fileset
> > include lists?
> >
> >
> > During the 38 minutes that it takes the FD to do its setup the CPU is
> > running at nearly 100% for the FD process. After the FD begins sending
> > data to the SD the CPU use drops to around 10%, which is normal for a
> > backup on that server. The transfer rate is also about normal. That server
> > has a mix of very large db files and many small files - I expect the
> > backup to take at least 12 hours based on prior experience when using a
> > more "normal" fileset specification based on directory names rather than
> > individual files.
> >
> > ----
> > Alan Davis
> > Senior Architect
> > Ruckus Network, Inc.
> > 703.464.6578 (o)
> > 410.365.7175 (m)
> > [EMAIL PROTECTED]
> > alancdavis AIM
> >
> > > -----Original Message-----
> > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 30, 2007 12:51 PM
> > > To: Alan Davis
> > > Cc: bacula-users@lists.sourceforge.net
> > > Subject: Re: [Bacula-users] Experience with extremely large fileset
> > > include lists?
> > >
> > > On Tuesday 30 January 2007 17:39, Alan Davis wrote:
> > > > I've modified the timeout in stored/job.c to allow the SD to wait 90
> > > > minutes instead of 30, recompiled and installed the modified SD.
> > > >
> > > > The test job takes about 38 minutes for the FD to process the fileset,
> > > > with the FD memory used :
> > > >
> > > > Heap: bytes=24,593,473 max_bytes=25,570,460 bufs=296,047
> > > > max_bufs=298,836
> > >
> > > Yes, 24MB is really large memory utilization.
> > >
> > > > The SD waited for the FD to connect and is running the job as expected.
> > >
> > > I'll be interested to hear the results of running the job.  I suspect that
> > > it will be catastrophically slow.
> > >
> > > > ----
> > > > Alan Davis
> > > > Senior Architect
> > > > Ruckus Network, Inc.
> > > > 703.464.6578 (o)
> > > > 410.365.7175 (m)
> > > > [EMAIL PROTECTED]
> > > > alancdavis AIM
> > > >
> > > > > -----Original Message-----
> > > > > From: Alan Davis [mailto:[EMAIL PROTECTED]
> > > > > Sent: Tuesday, January 30, 2007 10:44 AM
> > > > > To: 'Kern Sibbald'; 'bacula-users@lists.sourceforge.net'
> > > > > Subject: RE: [Bacula-users] Experience with extremely large fileset
> > > > > include lists?
> > > > >
> > > > > Returning to the original thread...
> > > > >
> > > > > Just to make sure I'm being clear - my FileSet specification is:
> > > > >
> > > > > FileSet {
> > > > >   Name = "u2LgFileList"
> > > > >   Include {
> > > > >     Options {
> > > > >       signature = MD5
> > > > >     }
> > > > >     File = </local/etc/u2LgFileList.list
> > > > >
> > > > >   }
> > > > > }
> > > > >
> > > > > The file /local/etc/u2LgFileList.list has 29K+ entries in it.
> > > > > Note that this is /not/ an exclude list - it's explicitly listing the
> > > > > files to be backed up.
> > > > >
> > > > > The FD takes about 40 minutes to read in the file list.
> > > > > The SD times out in 30 minutes waiting for the FD.
> > > > >
> > > > > From my reading of the manual there are directives that set the time
> > > > > that the Director will wait for an FD to respond ("FD Connect Timeout")
> > > > > and the time that the FD will wait for the SD to respond ("SD Connect
> > > > > Timeout"), as well as the "Heartbeat Interval" that will keep the
> > > > > connection open during long backups.
> > > > >
> > > > > I've not found a directive to modify the length of time that the SD
> > > > > will wait for the FD to begin transferring data.
> > > > >
> > > > > This is the error message from the failed backup. Note that the
> > > > > authorization rejection is /not/ the problem - a test backup that
> > > > > succeeded was used to verify proper authorization and communication
> > > > > between FD and SD.
> > > > >
> > > > > 29-Jan 16:33 athos-dir: Start Backup JobId 112,
> > > > >     Job=u2FullBackupJob.2007-01-29_16.33.15
> > > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15 Fatal error:
> > > > >     Authorization key rejected by Storage daemon.
> > > > >     Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors
> > > > >     for help.
> > > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15 Fatal error:
> > > > >     Failed to authenticate Storage daemon.
> > > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15 Fatal error:
> > > > >     Socket error on Storage command: ERR=No data available
> > > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15 Error:
> > > > >     Bacula 1.39.22 (09Sep06): 29-Jan-2007 17:21:22
> > > > >
> > > > > I think that the timeout is being specified in this code from
> > > > > stored/job.c:
> > > >
> > > > >    gettimeofday(&tv, &tz);
> > > > >    timeout.tv_nsec = tv.tv_usec * 1000;
> > > > >    timeout.tv_sec = tv.tv_sec + 30 * 60;        /* wait 30 minutes */
> > > > >    Dmsg1(100, "%s waiting on FD to contact SD\n", jcr->Job);
> > > > >    /*
> > > > >     * Wait for the File daemon to contact us to start the Job,
> > > > >     *  when he does, we will be released, unless the 30 minutes
> > > > >     *  expires.
> > > > >     */
> > > > >    P(mutex);
> > > > >    for ( ;!job_canceled(jcr); ) {
> > > > >       errstat = pthread_cond_timedwait(&jcr->job_start_wait, &mutex,
> > > > >                                        &timeout);
> > > > >       if (errstat == 0 || errstat == ETIMEDOUT) {
> > > > >          break;
> > > > >       }
> > > > >    }
> > > > >    V(mutex);
> > > > >
> > > > >
> > > > >
> > > > > ----
> > > > > Alan Davis
> > > > > Senior Architect
> > > > > Ruckus Network, Inc.
> > > > > 703.464.6578 (o)
> > > > > 410.365.7175 (m)
> > > > > [EMAIL PROTECTED]
> > > > > alancdavis AIM
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > > Sent: Monday, January 29, 2007 4:06 PM
> > > > > > To: bacula-users@lists.sourceforge.net
> > > > > > Cc: Alan Davis
> > > > > > Subject: Re: [Bacula-users] Experience with extremely large fileset
> > > > > > include lists?
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > On Monday 29 January 2007 21:19, Alan Davis wrote:
> > > > > > > Kern,
> > > > > > >
> > > > > > > Thanks for the fast response. To clarify a bit - the file list
> > > > > > > that I would be using would be individual files, not directories.
> > > > > > > There would be no exclude list as only the files that I need
> > > > > > > backed up would be listed.
> > > > > >
> > > > > > Yes, my answer was based on that assumption.
> > > > > >
> > > > > > > I have about 30TB of data files spread over several hundred
> > > > > > > directories.
> > > > > > >
> > > > > > > A true incremental backup will spend large amounts of time
> > > > > > > determining what files have been changed or added. The information
> > > > > > > about the modified or new files is stored in a db as a side-effect
> > > > > > > of processing the files for release to production, so building a
> > > > > > > file list is trivial.
> > > > > > >
> > > > > > > The only problem would be the FD's capability of handling a file
> > > > > > > list of 10K+ entries.
> > > > > >
> > > > > > All I can say is to try it, but I won't be surprised if it chews up
> > > > > > a lot of CPU.
> > > > > >
> > > > > > However, doing an equivalent of an incremental backup by means of an
> > > > > > exclusion list doesn't seem possible to me.
> > > > > >
> > > > > > Bacula is really quite fast in traversing a very large filesystem
> > > > > > during an incremental backup.
> > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > ----
> > > > > > > Alan Davis
> > > > > > > Senior Architect
> > > > > > > Ruckus Network, Inc.
> > > > > > > 703.464.6578 (o)
> > > > > > > 410.365.7175 (m)
> > > > > > > [EMAIL PROTECTED]
> > > > > > > alancdavis AIM
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > > > > Sent: Monday, January 29, 2007 2:47 PM
> > > > > > > > To: bacula-users@lists.sourceforge.net
> > > > > > > > Cc: Alan Davis
> > > > > > > > Subject: Re: [Bacula-users] Experience with extremely large
> > > > > > > > fileset include lists?
> > > > > > > >
> > > > > > > > On Monday 29 January 2007 18:17, Alan Davis wrote:
> > > > > > > > > I understand that one of the projects is to incorporate
> > > > > > > > > features that will make very large exclude lists feasible, but
> > > > > > > > > does anyone have experience, good or bad, with very large
> > > > > > > > > include lists in a fileset?
> > > > > > > > >
> > > > > > > > > I'm looking at the possibility of building a backup list from
> > > > > > > > > a db query that has the potential to return tens of thousands
> > > > > > > > > of files stored in hundreds of directories.
> > > > > > > >
> > > > > > > > For each file in the directories you specify (normally your
> > > > > > > > whole filesystem), Bacula will do a linear search through the
> > > > > > > > exclude list.  Thus it could be extremely CPU intensive.  For a
> > > > > > > > large list (more than 1000 files) I believe it (the list) needs
> > > > > > > > to be put into a hash tree, which is code that does not exist.
> > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ----
> > > > > > > > > Alan Davis
> > > > > > > > > Senior Architect
> > > > > > > > > Ruckus Network, Inc.
> > > > > > > > > 703.464.6578 (o)
> > > > > > > > > 410.365.7175 (m)
> > > > > > > > > [EMAIL PROTECTED]
> > > > > > > > > alancdavis AIM
