I've completed my initial tests with very large include lists with a
successful outcome.

The final test list was intended to simulate a backup file list
generated by a db query to back up new files, or files modified by a
part of our production process. The filesystems are in the 30TB range
and contain millions of files in hundreds of directories.

The file list was generated to include only regular files - no links,
device files, sockets, etc. - and excluded bare directory names. All
files were specified by their full path.

The file list had 292,296 entries.

It took longer than the default FD-SD timeout of 30 minutes for the FD
to ingest the file list from the Director. I modified the SD to raise
the timeout from 30 minutes to 90 minutes so that the SD would wait
long enough for the FD to begin sending data.

While the FD was reading the file list from the Director it was using
nearly 100% of the CPU. Once it began sending data to the SD, usage
dropped back to a more normal 10%.

Memory use on the FD was larger than normal, with max_bytes of
approximately 24MB.

Data transfer rates were normal for that server. Time to complete the
backup was no longer than expected for that server and amount of data.

Conclusion: Using a file list consisting of only the files to be
backed up, and excluding bare directory entries (which would cause the
full contents of each directory to be backed up), is possible and
scales reasonably into the 200,000+ file range. Larger file lists
would need to be tested to determine what, if any, practical limit the
FD has.


----
Alan Davis
Senior Architect
Ruckus Network, Inc.
703.464.6578 (o)
410.365.7175 (m)
[EMAIL PROTECTED]
alancdavis AIM
 
> -----Original Message-----
> From: Alan Davis [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 30, 2007 1:29 PM
> To: 'Kern Sibbald'
> Cc: 'bacula-users@lists.sourceforge.net'
> Subject: RE: [Bacula-users] Experience with extremely large fileset
> include lists?
> 
> 
> During the 38 minutes that it takes the FD to do its setup, the CPU
> is running at nearly 100% for the FD process. After the FD begins
> sending data to the SD the CPU use drops to around 10%, which is
> normal for a backup on that server. The transfer rate is also about
> normal. That server has a mix of very large db files and many small
> files - I expect the backup to take at least 12 hours based on prior
> experience when using a more "normal" fileset specification based on
> directory names rather than individual files.
> 
> 
> 
> > -----Original Message-----
> > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 30, 2007 12:51 PM
> > To: Alan Davis
> > Cc: bacula-users@lists.sourceforge.net
> > Subject: Re: [Bacula-users] Experience with extremely large fileset
> > include lists?
> >
> > On Tuesday 30 January 2007 17:39, Alan Davis wrote:
> > > I've modified the timeout in stored/job.c to allow the SD to wait
> > > 90 minutes instead of 30, recompiled and installed the modified
> > > SD.
> > >
> > > The test job takes about 38 minutes for the FD to process the
> > > fileset, with the FD memory used:
> > >
> > > Heap: bytes=24,593,473 max_bytes=25,570,460 bufs=296,047
> > > max_bufs=298,836
> >
> > Yes, 24MB is really large memory utilization.
> >
> > >
> > > The SD waited for the FD to connect and is running the job as
> > > expected.
> >
> > I'll be interested to hear the results of running the job.  I
> > suspect that it will be catastrophically slow.
> >
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Alan Davis [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, January 30, 2007 10:44 AM
> > > > To: 'Kern Sibbald'; 'bacula-users@lists.sourceforge.net'
> > > > Subject: RE: [Bacula-users] Experience with extremely large
> > > > fileset include lists?
> > > >
> > > > Returning to the original thread...
> > > >
> > > > Just to make sure I'm being clear - my FileSet specification is:
> > > >
> > > > FileSet {
> > > >   Name = "u2LgFileList"
> > > >   Include {
> > > >     Options {
> > > >       signature = MD5
> > > >     }
> > > >     File = </local/etc/u2LgFileList.list
> > > >
> > > >   }
> > > > }
> > > >
> > > > The file /local/etc/u2LgFileList.list has 29K+ entries in it.
> > > > Note that this is /not/ an exclude list - it's explicitly
> > > > listing the files to be backed up.
> > > >
> > > > The FD takes about 40 minutes to read in the file list.
> > > > The SD times out in 30 minutes waiting for the FD.
> > > >
> > > > From my reading of the manual there are directives that set the
> > > > time that the Director will wait for an FD to respond ("FD
> > > > Connect Timeout") and the time that the FD will wait for the SD
> > > > to respond ("SD Connect Timeout"), as well as the "Heartbeat
> > > > Interval" that will keep the connection open during long
> > > > backups.
> > > >
> > > > I've not found a directive to modify the length of time that
> > > > the SD will wait for the FD to begin transferring data.
> > > >
> > > > This is the error message from the failed backup. Note that
> > > > the authorization rejection is /not/ the problem - a test
> > > > backup that succeeded was used to verify proper authorization
> > > > and communication between FD and SD.
> > > >
> > > > 29-Jan 16:33 athos-dir: Start Backup JobId 112,
> > > > Job=u2FullBackupJob.2007-01-29_16.33.15
> > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15 Fatal
> > > > error: Authorization key rejected by Storage daemon.
> > > > Please see
> > > > http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors
> > > > for help.
> > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15 Fatal
> > > > error: Failed to authenticate Storage daemon.
> > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15
> > > > Fatal error: Socket error on Storage command: ERR=No data
> > > > available
> > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15
> > > > Error: Bacula 1.39.22 (09Sep06): 29-Jan-2007 17:21:22
> > > >
> > > > I think that the timeout is being specified in this code from
> > > > stored/job.c:
> > >
> > > >    gettimeofday(&tv, &tz);
> > > >    timeout.tv_nsec = tv.tv_usec * 1000;
> > > >    timeout.tv_sec = tv.tv_sec + 30 * 60;   /* wait 30 minutes */
> > > >
> > > >    Dmsg1(100, "%s waiting on FD to contact SD\n", jcr->Job);
> > > >    /*
> > > >     * Wait for the File daemon to contact us to start the Job,
> > > >     *  when he does, we will be released, unless the 30 minutes
> > > >     *  expires.
> > > >     */
> > > >    P(mutex);
> > > >    for ( ;!job_canceled(jcr); ) {
> > > >       errstat = pthread_cond_timedwait(&jcr->job_start_wait,
> > > >                                        &mutex, &timeout);
> > > >       if (errstat == 0 || errstat == ETIMEDOUT) {
> > > >          break;
> > > >       }
> > > >    }
> > > >    V(mutex);
> > > >
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > Sent: Monday, January 29, 2007 4:06 PM
> > > > > To: bacula-users@lists.sourceforge.net
> > > > > Cc: Alan Davis
> > > > > Subject: Re: [Bacula-users] Experience with extremely large
> > > > > fileset include lists?
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Monday 29 January 2007 21:19, Alan Davis wrote:
> > > > > > Kern,
> > > > > >
> > > > > >  Thanks for the fast response. To clarify a bit - the file
> > > > > > list that I would be using would be individual files, not
> > > > > > directories. There would be no exclude list as only the
> > > > > > files that I need backed up would be listed.
> > > > >
> > > > > Yes, my answer was based on that assumption.
> > > > >
> > > > > > I have about 30TB of data files spread over several hundred
> > > > > > directories. A true incremental backup will spend large
> > > > > > amounts of time determining what files have been changed or
> > > > > > added. The information about the modified or new files is
> > > > > > stored in a db as a side-effect of processing the files for
> > > > > > release to production, so building a file list is trivial.
> > > > > > The only problem would be the FD's capability of handling a
> > > > > > file list of 10K+ entries.
> > > > >
> > > > > All I can say is to try it, but I won't be surprised if it
> > > > > chews up a lot of CPU.
> > > > >
> > > > > However, doing an equivalent of an incremental backup by
> > > > > means of an exclusion list doesn't seem possible to me.
> > > > >
> > > > > Bacula is really quite fast in traversing a very large
> > > > > filesystem during an incremental backup.
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > > > Sent: Monday, January 29, 2007 2:47 PM
> > > > > > > To: bacula-users@lists.sourceforge.net
> > > > > > > Cc: Alan Davis
> > > > > > > Subject: Re: [Bacula-users] Experience with extremely
> > > > > > > large fileset include lists?
> > > > > > >
> > > > > > > On Monday 29 January 2007 18:17, Alan Davis wrote:
> > > > > > > > I understand that one of the projects is to incorporate
> > > > > > > > features that will make very large exclude lists
> > > > > > > > feasible, but does anyone have experience, good or bad,
> > > > > > > > with very large include lists in a fileset?
> > > > > > > > I'm looking at the possibility of building a backup
> > > > > > > > list from a db query that has the potential to return
> > > > > > > > tens of thousands of files stored in hundreds of
> > > > > > > > directories.
> > > > > > >
> > > > > > > For each file in the directories you specify (normally
> > > > > > > your whole filesystem), Bacula will do a linear search
> > > > > > > through the exclude list. Thus it could be extremely CPU
> > > > > > > intensive.  For a large list (more than 1000 files) I
> > > > > > > believe it (the list) needs to be put into a hash tree,
> > > > > > > which is code that does not exist.
> > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > >
> > >




-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
