I've completed my initial tests with very large include lists, and the outcome was successful.
The final test list was intended to simulate a backup file list generated by a db query, covering new files or files modified by part of our production process. The filesystems are in the 30TB range and contain millions of files in hundreds of directories. The file list was generated to include only regular files - no links, device files, sockets, etc. - and excluded bare directory names. All files were specified by their full path.

The file list had 292,296 entries. Ingesting the file list from the Director took longer than the default FD-SD timeout of 30 minutes, so I modified the SD to change the timeout from 30 minutes to 90 minutes, allowing the SD to wait long enough for the FD to begin sending data. While the FD was reading the file list from the Director it used nearly 100% of the CPU; usage then dropped back to a more normal 10%. Memory use on the FD was larger than normal, with max_bytes of approximately 24MB. Data transfer rates were normal for that server, and the time to complete the backup was no longer than expected for that server and amount of data.

Conclusion: using a file list consisting of only the files to be backed up, and excluding bare directory entries (which would cause the full contents of the directory to be backed up), is possible and scales reasonably into the 200,000+ file range. Larger file lists would need to be tested to determine what, if any, practical limit the FD has.

----
Alan Davis
Senior Architect
Ruckus Network, Inc.
703.464.6578 (o)
410.365.7175 (m)
[EMAIL PROTECTED]
alancdavis AIM

> -----Original Message-----
> From: Alan Davis [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 30, 2007 1:29 PM
> To: 'Kern Sibbald'
> Cc: 'bacula-users@lists.sourceforge.net'
> Subject: RE: [Bacula-users] Experience with extremely large fileset
> include lists?
>
> During the 38 minutes that it takes the FD to do its setup, the CPU is
> running at nearly 100% for the FD process. After the FD begins sending
> data to the SD, CPU use drops to around 10%, which is normal for a
> backup on that server. The transfer rate is also about normal. That
> server has a mix of very large db files and many small files - I expect
> the backup to take at least 12 hours based on prior experience when
> using a more "normal" fileset specification based on directory names
> rather than individual files.
>
> ----
> Alan Davis
> Senior Architect
> Ruckus Network, Inc.
> 703.464.6578 (o)
> 410.365.7175 (m)
> [EMAIL PROTECTED]
> alancdavis AIM
>
> > -----Original Message-----
> > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 30, 2007 12:51 PM
> > To: Alan Davis
> > Cc: bacula-users@lists.sourceforge.net
> > Subject: Re: [Bacula-users] Experience with extremely large fileset
> > include lists?
> >
> > On Tuesday 30 January 2007 17:39, Alan Davis wrote:
> > > I've modified the timeout in stored/job.c to allow the SD to wait 90
> > > minutes instead of 30, recompiled and installed the modified SD.
> > >
> > > The test job takes about 38 minutes for the FD to process the fileset,
> > > with the FD memory used:
> > >
> > >    Heap: bytes=24,593,473 max_bytes=25,570,460 bufs=296,047 max_bufs=298,836
> >
> > Yes, 24MB is really large memory utilization.
> >
> > > The SD waited for the FD to connect and is running the job as expected.
> >
> > I'll be interested to hear the results of running the job. I suspect
> > that it will be catastrophically slow.
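For reference, the modification mentioned above is a one-line change to the SD wait code quoted later in this thread - roughly the following, though the exact surrounding code may differ between Bacula versions:

   gettimeofday(&tv, &tz);
   timeout.tv_nsec = tv.tv_usec * 1000;
   timeout.tv_sec = tv.tv_sec + 90 * 60;   /* wait 90 minutes instead of the default 30 */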
> > >
> > > ----
> > > Alan Davis
> > > Senior Architect
> > > Ruckus Network, Inc.
> > > 703.464.6578 (o)
> > > 410.365.7175 (m)
> > > [EMAIL PROTECTED]
> > > alancdavis AIM
> > >
> > > > -----Original Message-----
> > > > From: Alan Davis [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, January 30, 2007 10:44 AM
> > > > To: 'Kern Sibbald'; 'bacula-users@lists.sourceforge.net'
> > > > Subject: RE: [Bacula-users] Experience with extremely large fileset
> > > > include lists?
> > > >
> > > > Returning to the original thread...
> > > >
> > > > Just to make sure I'm being clear - my FileSet specification is:
> > > >
> > > > FileSet {
> > > >   Name = "u2LgFileList"
> > > >   Include {
> > > >     Options {
> > > >       signature = MD5
> > > >     }
> > > >     File = </local/etc/u2LgFileList.list
> > > >   }
> > > > }
> > > >
> > > > The file /local/etc/u2LgFileList.list has 29K+ entries in it.
> > > > Note that this is /not/ an exclude list - it's explicitly listing the
> > > > files to be backed up.
> > > >
> > > > The FD takes about 40 minutes to read in the file list.
> > > > The SD times out in 30 minutes waiting for the FD.
> > > >
> > > > From my reading of the manual there are directives that set the time
> > > > that the Director will wait for an FD to respond ("FD Connect Timeout")
> > > > and the time that the FD will wait for the SD to respond ("SD Connect
> > > > Timeout"), as well as the "Heartbeat Interval" that will keep the
> > > > connection open during long backups.
> > > >
> > > > I've not found a directive to modify the length of time that the SD
> > > > will wait for the FD to begin transferring data.
> > > >
> > > > This is the error message from the failed backup. Note that the
> > > > authorization rejection is /not/ the problem - a test backup that
> > > > succeeded was used to verify proper authorization and communication
> > > > between FD and SD.
> > > >
> > > > 29-Jan 16:33 athos-dir: Start Backup JobId 112,
> > > >   Job=u2FullBackupJob.2007-01-29_16.33.15
> > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15 Fatal error:
> > > >   Authorization key rejected by Storage daemon.
> > > >   Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors
> > > >   for help.
> > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15 Fatal error:
> > > >   Failed to authenticate Storage daemon.
> > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15 Fatal error:
> > > >   Socket error on Storage command: ERR=No data available
> > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15 Error:
> > > >   Bacula 1.39.22 (09Sep06): 29-Jan-2007 17:21:22
> > > >
> > > > I think that the timeout is being specified in this code from
> > > > stored/job.c:
> > > >
> > > >    gettimeofday(&tv, &tz);
> > > >    timeout.tv_nsec = tv.tv_usec * 1000;
> > > >    timeout.tv_sec = tv.tv_sec + 30 * 60;   /* wait 30 minutes */
> > > >
> > > >    Dmsg1(100, "%s waiting on FD to contact SD\n", jcr->Job);
> > > >    /*
> > > >     * Wait for the File daemon to contact us to start the Job,
> > > >     * when he does, we will be released, unless the 30 minutes
> > > >     * expires.
> > > >     */
> > > >    P(mutex);
> > > >    for ( ;!job_canceled(jcr); ) {
> > > >       errstat = pthread_cond_timedwait(&jcr->job_start_wait, &mutex, &timeout);
> > > >       if (errstat == 0 || errstat == ETIMEDOUT) {
> > > >          break;
> > > >       }
> > > >    }
> > > >    V(mutex);
> > > >
> > > > ----
> > > > Alan Davis
> > > > Senior Architect
> > > > Ruckus Network, Inc.
> > > > 703.464.6578 (o)
> > > > 410.365.7175 (m)
> > > > [EMAIL PROTECTED]
> > > > alancdavis AIM
> > > >
> > > > > -----Original Message-----
> > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > Sent: Monday, January 29, 2007 4:06 PM
> > > > > To: bacula-users@lists.sourceforge.net
> > > > > Cc: Alan Davis
> > > > > Subject: Re: [Bacula-users] Experience with extremely large fileset
> > > > > include lists?
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Monday 29 January 2007 21:19, Alan Davis wrote:
> > > > > > Kern,
> > > > > >
> > > > > > Thanks for the fast response. To clarify a bit - the file list that
> > > > > > I would be using would be individual files, not directories. There
> > > > > > would be no exclude list as only the files that I need backed up
> > > > > > would be listed.
> > > > >
> > > > > Yes, my answer was based on that assumption.
> > > > >
> > > > > > I have about 30TB of data files spread over several hundred
> > > > > > directories.
> > > > > >
> > > > > > A true incremental backup will spend large amounts of time
> > > > > > determining what files have been changed or added. The information
> > > > > > about the modified or new files is stored in a db as a side-effect
> > > > > > of processing the files for release to production, so building a
> > > > > > file list is trivial.
> > > > > >
> > > > > > The only problem would be the FD's capability of handling a file
> > > > > > list of 10K+ entries.
> > > > >
> > > > > All I can say is to try it, but I won't be surprised if it chews up a
> > > > > lot of CPU.
> > > > >
> > > > > However, doing an equivalent of an incremental backup by means of an
> > > > > exclusion list doesn't seem possible to me.
> > > > >
> > > > > Bacula is really quite fast in traversing a very large filesystem
> > > > > during an incremental backup.
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > ----
> > > > > > Alan Davis
> > > > > > Senior Architect
> > > > > > Ruckus Network, Inc.
> > > > > > 703.464.6578 (o)
> > > > > > 410.365.7175 (m)
> > > > > > [EMAIL PROTECTED]
> > > > > > alancdavis AIM
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > > > Sent: Monday, January 29, 2007 2:47 PM
> > > > > > > To: bacula-users@lists.sourceforge.net
> > > > > > > Cc: Alan Davis
> > > > > > > Subject: Re: [Bacula-users] Experience with extremely large
> > > > > > > fileset include lists?
> > > > > > >
> > > > > > > On Monday 29 January 2007 18:17, Alan Davis wrote:
> > > > > > > > I understand that one of the projects is to incorporate
> > > > > > > > features that will make very large exclude lists feasible, but
> > > > > > > > does anyone have experience, good or bad, with very large
> > > > > > > > include lists in a fileset?
> > > > > > > >
> > > > > > > > I'm looking at the possibility of building a backup list from a
> > > > > > > > db query that has the potential to return tens of thousands of
> > > > > > > > files stored in hundreds of directories.
> > > > > > >
> > > > > > > For each file in the directories you specify (normally your whole
> > > > > > > filesystem), Bacula will do a linear search through the exclude
> > > > > > > list. Thus it could be extremely CPU intensive. For a large list
> > > > > > > (more than 1000 files) I believe it (the list) needs to be put
> > > > > > > into a hash tree, which is code that does not exist.
> > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > ----
> > > > > > > > Alan Davis
> > > > > > > > Senior Architect
> > > > > > > > Ruckus Network, Inc.
> > > > > > > > 703.464.6578 (o)
> > > > > > > > 410.365.7175 (m)
> > > > > > > > [EMAIL PROTECTED]
> > > > > > > > alancdavis AIM
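Kern's point above about the linear search through the exclude list is the underlying scaling issue for any large list. Purely as an illustration of the hash-keyed lookup he says would be needed (this is not Bacula code; the structure, bucket count, and the sample path are invented for the sketch), something along these lines keeps each membership test near constant time regardless of list size:

   /* Illustrative sketch only, not Bacula code: load a large file list into a
    * simple chained hash table so that membership tests are O(1) on average
    * instead of a linear scan over every entry. */
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   #define NBUCKETS 65536                /* power of two, so we can mask */

   struct entry {
      char *path;
      struct entry *next;
   };

   static struct entry *buckets[NBUCKETS];

   /* djb2 string hash */
   static unsigned long hash_path(const char *s)
   {
      unsigned long h = 5381;
      while (*s) {
         h = ((h << 5) + h) + (unsigned char)*s++;
      }
      return h;
   }

   static void add_path(const char *path)
   {
      unsigned long b = hash_path(path) & (NBUCKETS - 1);
      struct entry *e = malloc(sizeof(*e));
      e->path = strdup(path);
      e->next = buckets[b];
      buckets[b] = e;
   }

   /* Average O(1) lookup instead of scanning 290,000+ entries per file */
   static int in_list(const char *path)
   {
      unsigned long b = hash_path(path) & (NBUCKETS - 1);
      for (struct entry *e = buckets[b]; e; e = e->next) {
         if (strcmp(e->path, path) == 0) {
            return 1;
         }
      }
      return 0;
   }

   int main(void)
   {
      char line[4096];
      FILE *fp = fopen("/local/etc/u2LgFileList.list", "r");  /* list file from the FileSet above */
      if (!fp) {
         return 1;
      }
      while (fgets(line, sizeof(line), fp)) {
         line[strcspn(line, "\n")] = '\0';                    /* strip trailing newline */
         add_path(line);
      }
      fclose(fp);
      /* "/some/file" is just a made-up example path */
      printf("/some/file is %sin the list\n", in_list("/some/file") ? "" : "not ");
      return 0;
   }

Memory cost for a table like this is roughly the path strings plus a couple of pointers per entry, which is in the same ballpark as the heap the FD already uses to hold the list.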