Hello,

Very interesting results, not too surprising. See below ...
On Wednesday 31 January 2007 15:52, Alan Davis wrote:
> I've completed my initial tests with very large include lists, with a
> successful outcome.
>
> The final test list was intended to simulate a backup file list
> generated by a db query to back up new files, or files modified by a
> part of our production process. The filesystems are in the 30TB range,
> containing millions of files in hundreds of directories.
>
> The file list was generated to include only regular files (no links,
> device files, sockets, etc.) and excluded bare directory names. All
> files were specified using their full path.
>
> The file list had 292,296 entries.
>
> It took longer than the default FD-SD timeout of 30 minutes to ingest
> the file list from the Director. I've modified the SD to change the
> timeout from 30 minutes to 90 minutes, to allow the SD to wait long
> enough for the FD to begin sending data.
>
> While the FD was reading the file list from the Director, the FD was
> using nearly 100% of the CPU.

Yes, the list structure for included and excluded files is really
intended for very small numbers of entries. I think you could get a
*very* significant reduction in the time needed to "ingest" the file
list from the Director by modifying line 699 of
<bacula-source>/src/filed/job.c from:

   fileset->incexe->name_list.init(1, true);

to:

   fileset->incexe->name_list.init(10000, true);

The 10,000 may be a bit of overkill, and you could very likely get
almost equally good performance by setting it to 1,000, but it seems to
me that setting it to 10,000 is worth a try. It is not something I would
do in the mainstream code, though it would probably solve your
performance problem. I would have no problem setting it to, say, 100 in
the mainstream code.
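To illustrate why that first argument matters: it is the list's growth
size, i.e. how many new slots get added each time the array fills up, so
init(1, ...) forces a reallocation (and possibly a copy of the whole
pointer table) on nearly every append. A minimal sketch of the idea --
made-up names, not Bacula's actual alist code, error handling omitted:

   #include <stdlib.h>

   /* Illustrative growable pointer array in the spirit of an alist.
    * 'grow' plays the role of init()'s first argument. */
   typedef struct {
      void **items;
      int num_items;
      int max_items;
      int grow;            /* slots added each time the array fills */
   } ilist;

   static void ilist_init(ilist *l, int grow)
   {
      l->items = NULL;
      l->num_items = l->max_items = 0;
      l->grow = grow > 0 ? grow : 1;
   }

   static void ilist_append(ilist *l, void *item)
   {
      if (l->num_items == l->max_items) {
         /* With grow == 1 this runs on every append; with
          * grow == 10000 it runs once per 10,000 appends. */
         l->max_items += l->grow;
         l->items = realloc(l->items, l->max_items * sizeof(void *));
      }
      l->items[l->num_items++] = item;
   }

Appending your 292,296 entries with a growth size of 1 therefore means
roughly 292,296 reallocations instead of about 30, which would be
consistent with the FD sitting at nearly 100% CPU during ingestion.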
A better solution, though probably one requiring more work, would be to
change the list from an alist (an allocated, indexed list) to a dlist (a
doubly linked list). For an alist, adding a new member is very expensive
whenever the growth size of the list (the first argument above) is
exceeded. For a dlist, on the other hand, adding a new member is no more
expensive than adding any other member.
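By contrast with the array case, a doubly linked list never has to move
existing storage, so every append costs the same. A sketch of that shape
(again made-up names, not Bacula's dlist implementation):

   #include <stdlib.h>

   typedef struct dnode {
      struct dnode *prev, *next;
      void *item;
   } dnode;

   typedef struct {
      dnode *head, *tail;
      int num_items;
   } dl;

   static void dl_init(dl *l)
   {
      l->head = l->tail = NULL;
      l->num_items = 0;
   }

   /* O(1) append: link the new node after the current tail. */
   static void dl_append(dl *l, void *item)
   {
      dnode *n = malloc(sizeof(dnode));
      n->item = item;
      n->next = NULL;
      n->prev = l->tail;
      if (l->tail) {
         l->tail->next = n;
      } else {
         l->head = n;
      }
      l->tail = n;
      l->num_items++;
   }

The trade-off is that a linked list gives up cheap indexed access, which
is part of why this is more work than just raising the growth size.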
I'll be interested to see how much the change I suggest above improves
things ...

Best regards,

Kern

> This usage then dropped back to a more normal 10%.
>
> Memory use on the FD was larger than normal, with max_bytes of
> approximately 24MB.
>
> Data transfer rates were normal for that server. Time to complete the
> backup was no longer than expected for that server and amount of data.
>
> Conclusion: Using a file list consisting of only the files to be
> backed up, and excluding bare directory entries (which would cause the
> full contents of the directory to be backed up), is possible and
> scales reasonably into the 200,000+ file range. Larger file lists
> would need to be tested to determine what, if any, the practical limit
> of the FD is.
>
> ----
> Alan Davis
> Senior Architect
> Ruckus Network, Inc.
> 703.464.6578 (o)
> 410.365.7175 (m)
> [EMAIL PROTECTED]
> alancdavis AIM
>
> > -----Original Message-----
> > From: Alan Davis [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 30, 2007 1:29 PM
> > To: 'Kern Sibbald'
> > Cc: 'bacula-users@lists.sourceforge.net'
> > Subject: RE: [Bacula-users] Experience with extremely large fileset
> > include lists?
> >
> > During the 38 minutes that it takes the FD to do its setup, the CPU
> > is running at nearly 100% for the FD process. After the FD begins
> > sending data to the SD, the CPU use drops to around 10%, which is
> > normal for a backup on that server. The transfer rate is also about
> > normal. That server has a mix of very large db files and many small
> > files - I expect the backup to take at least 12 hours, based on
> > prior experience with a more "normal" fileset specification that
> > uses directory names rather than individual files.
> >
> > ----
> > Alan Davis
> > Senior Architect
> > Ruckus Network, Inc.
> > 703.464.6578 (o)
> > 410.365.7175 (m)
> > [EMAIL PROTECTED]
> > alancdavis AIM
> >
> > > -----Original Message-----
> > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, January 30, 2007 12:51 PM
> > > To: Alan Davis
> > > Cc: bacula-users@lists.sourceforge.net
> > > Subject: Re: [Bacula-users] Experience with extremely large
> > > fileset include lists?
> > >
> > > On Tuesday 30 January 2007 17:39, Alan Davis wrote:
> > > > I've modified the timeout in stored/job.c to allow the SD to
> > > > wait 90 minutes instead of 30, recompiled, and installed the
> > > > modified SD.
> > > >
> > > > The test job takes about 38 minutes for the FD to process the
> > > > fileset, with the FD memory used:
> > > >
> > > > Heap: bytes=24,593,473 max_bytes=25,570,460 bufs=296,047
> > > > max_bufs=298,836
> > >
> > > Yes, 24MB is really large memory utilization.
> > >
> > > > The SD waited for the FD to connect and is running the job as
> > > > expected.
> > >
> > > I'll be interested to hear the results of running the job. I
> > > suspect that it will be catastrophically slow.
> > >
> > > > ----
> > > > Alan Davis
> > > > Senior Architect
> > > > Ruckus Network, Inc.
> > > > 703.464.6578 (o)
> > > > 410.365.7175 (m)
> > > > [EMAIL PROTECTED]
> > > > alancdavis AIM
> > > >
> > > > > -----Original Message-----
> > > > > From: Alan Davis [mailto:[EMAIL PROTECTED]
> > > > > Sent: Tuesday, January 30, 2007 10:44 AM
> > > > > To: 'Kern Sibbald'; 'bacula-users@lists.sourceforge.net'
> > > > > Subject: RE: [Bacula-users] Experience with extremely large
> > > > > fileset include lists?
> > > > >
> > > > > Returning to the original thread...
> > > > >
> > > > > Just to make sure I'm being clear - my FileSet specification
> > > > > is:
> > > > >
> > > > > FileSet {
> > > > >   Name = "u2LgFileList"
> > > > >   Include {
> > > > >     Options {
> > > > >       signature = MD5
> > > > >     }
> > > > >     File = </local/etc/u2LgFileList.list
> > > > >   }
> > > > > }
> > > > >
> > > > > The file /local/etc/u2LgFileList.list has 29K+ entries in it.
> > > > > Note that this is /not/ an exclude list - it explicitly lists
> > > > > the files to be backed up.
> > > > >
> > > > > The FD takes about 40 minutes to read in the file list.
> > > > > The SD times out in 30 minutes waiting for the FD.
> > > > >
> > > > > From my reading of the manual, there are directives that set
> > > > > the time the Director will wait for an FD to respond ("FD
> > > > > Connect Timeout") and the time the FD will wait for the SD to
> > > > > respond ("SD Connect Timeout"), as well as the "Heartbeat
> > > > > Interval" that will keep the connection open during long
> > > > > backups.
> > > > >
> > > > > I've not found a directive to modify the length of time that
> > > > > the SD will wait for the FD to begin transferring data.
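For reference, those directives are set in the Bacula configuration
files, roughly as sketched below. The values here are illustrative, not
recommendations; per the manual, FD Connect Timeout and SD Connect
Timeout belong in the Director resource of bacula-dir.conf, and a
Heartbeat Interval can be set in the File daemon's own resource. As
noted above, none of them changes how long the SD waits for the FD to
start sending data:

   # bacula-dir.conf -- Director resource (sketch, values illustrative)
   Director {
     Name = athos-dir
     ...
     FD Connect Timeout = 30 minutes
     SD Connect Timeout = 30 minutes
   }

   # bacula-fd.conf -- FileDaemon resource (sketch)
   FileDaemon {
     Name = u2-fd
     ...
     Heartbeat Interval = 60
   }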
> > > > > This is the error message from the failed backup. Note that
> > > > > the authorization rejection is /not/ the problem - a test
> > > > > backup that succeeded was used to verify proper authorization
> > > > > and communication between the FD and SD.
> > > > >
> > > > > 29-Jan 16:33 athos-dir: Start Backup JobId 112,
> > > > >   Job=u2FullBackupJob.2007-01-29_16.33.15
> > > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15
> > > > >   Fatal error: Authorization key rejected by Storage daemon.
> > > > >   Please see
> > > > >   http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors
> > > > >   for help.
> > > > > 29-Jan 17:21 u2-fd: u2FullBackupJob.2007-01-29_16.33.15
> > > > >   Fatal error: Failed to authenticate Storage daemon.
> > > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15
> > > > >   Fatal error: Socket error on Storage command: ERR=No data
> > > > >   available
> > > > > 29-Jan 17:21 athos-dir: u2FullBackupJob.2007-01-29_16.33.15
> > > > >   Error: Bacula 1.39.22 (09Sep06): 29-Jan-2007 17:21:22
> > > > >
> > > > > I think that the timeout is being specified in this code from
> > > > > stored/job.c:
> > > > >
> > > > >    gettimeofday(&tv, &tz);
> > > > >    timeout.tv_nsec = tv.tv_usec * 1000;
> > > > >    timeout.tv_sec = tv.tv_sec + 30 * 60;  /* wait 30 minutes */
> > > > >    Dmsg1(100, "%s waiting on FD to contact SD\n", jcr->Job);
> > > > >    /*
> > > > >     * Wait for the File daemon to contact us to start the Job,
> > > > >     * when he does, we will be released, unless the 30 minutes
> > > > >     * expires.
> > > > >     */
> > > > >    P(mutex);
> > > > >    for ( ; !job_canceled(jcr); ) {
> > > > >       errstat = pthread_cond_timedwait(&jcr->job_start_wait,
> > > > >                                        &mutex, &timeout);
> > > > >       if (errstat == 0 || errstat == ETIMEDOUT) {
> > > > >          break;
> > > > >       }
> > > > >    }
> > > > >    V(mutex);
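That is indeed the wait in question, and the 90-minute modification
described earlier in this thread amounts to a one-line change of that
timeout calculation, roughly:

   timeout.tv_sec = tv.tv_sec + 90 * 60;  /* wait 90 minutes */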
> > > > > ----
> > > > > Alan Davis
> > > > > Senior Architect
> > > > > Ruckus Network, Inc.
> > > > > 703.464.6578 (o)
> > > > > 410.365.7175 (m)
> > > > > [EMAIL PROTECTED]
> > > > > alancdavis AIM
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > > Sent: Monday, January 29, 2007 4:06 PM
> > > > > > To: bacula-users@lists.sourceforge.net
> > > > > > Cc: Alan Davis
> > > > > > Subject: Re: [Bacula-users] Experience with extremely large
> > > > > > fileset include lists?
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > On Monday 29 January 2007 21:19, Alan Davis wrote:
> > > > > > > Kern,
> > > > > > >
> > > > > > > Thanks for the fast response. To clarify a bit - the file
> > > > > > > list that I would be using would be individual files, not
> > > > > > > directories. There would be no exclude list, as only the
> > > > > > > files that I need backed up would be listed.
> > > > > >
> > > > > > Yes, my answer was based on that assumption.
> > > > > >
> > > > > > > I have about 30TB of data files spread over several
> > > > > > > hundred directories. A true incremental backup will spend
> > > > > > > large amounts of time determining what files have been
> > > > > > > changed or added. The information about the modified or
> > > > > > > new files is stored in a db as a side-effect of processing
> > > > > > > the files for release to production, so building a file
> > > > > > > list is trivial.
> > > > > > >
> > > > > > > The only problem would be the FD's capability of handling
> > > > > > > a file list of 10K+ entries.
> > > > > >
> > > > > > All I can say is to try it, but I won't be surprised if it
> > > > > > chews up a lot of CPU.
> > > > > >
> > > > > > However, doing the equivalent of an incremental backup by
> > > > > > means of an exclusion list doesn't seem possible to me.
> > > > > >
> > > > > > Bacula is really quite fast in traversing a very large
> > > > > > filesystem during an incremental backup.
> > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > ----
> > > > > > > Alan Davis
> > > > > > > Senior Architect
> > > > > > > Ruckus Network, Inc.
> > > > > > > 703.464.6578 (o)
> > > > > > > 410.365.7175 (m)
> > > > > > > [EMAIL PROTECTED]
> > > > > > > alancdavis AIM
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> > > > > > > > Sent: Monday, January 29, 2007 2:47 PM
> > > > > > > > To: bacula-users@lists.sourceforge.net
> > > > > > > > Cc: Alan Davis
> > > > > > > > Subject: Re: [Bacula-users] Experience with extremely
> > > > > > > > large fileset include lists?
> > > > > > > >
> > > > > > > > On Monday 29 January 2007 18:17, Alan Davis wrote:
> > > > > > > > > I understand that one of the projects is to
> > > > > > > > > incorporate features that will make very large
> > > > > > > > > exclude lists feasible, but does anyone have
> > > > > > > > > experience, good or bad, with very large include
> > > > > > > > > lists in a fileset?
> > > > > > > > >
> > > > > > > > > I'm looking at the possibility of building a backup
> > > > > > > > > list from a db query that has the potential to return
> > > > > > > > > tens of thousands of files stored in hundreds of
> > > > > > > > > directories.
> > > > > > > >
> > > > > > > > For each file in the directories you specify (normally
> > > > > > > > your whole filesystem), Bacula will do a linear search
> > > > > > > > through the exclude list. Thus it could be extremely
> > > > > > > > CPU intensive. For a large list (more than 1000 files),
> > > > > > > > I believe it (the list) needs to be put into a hash
> > > > > > > > tree, which is code that does not exist.
> > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > ----
> > > > > > > > > Alan Davis
> > > > > > > > > Senior Architect
> > > > > > > > > Ruckus Network, Inc.
> > > > > > > > > 703.464.6578 (o)
> > > > > > > > > 410.365.7175 (m)
> > > > > > > > > [EMAIL PROTECTED]
> > > > > > > > > alancdavis AIM
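P.S. To make the "hash tree" remark above concrete: the point is to
replace the per-file linear scan of the list with an average O(1)
lookup. A minimal sketch of one way to do that - a chained hash table
keyed on file name rather than a tree, but to the same effect; this is
illustrative C, not Bacula code, with error handling omitted:

   #include <stdlib.h>
   #include <string.h>

   #define NBUCKETS 4096           /* power of two for cheap masking */

   typedef struct entry {
      struct entry *next;
      char *name;
   } entry;

   static entry *buckets[NBUCKETS];

   /* Simple string hash (djb2-style). */
   static unsigned hash(const char *s)
   {
      unsigned h = 5381;
      while (*s) {
         h = h * 33 + (unsigned char)*s++;
      }
      return h & (NBUCKETS - 1);
   }

   /* Insert a file name once, at list-build time. */
   static void add_name(const char *name)
   {
      unsigned h = hash(name);
      entry *e = malloc(sizeof(entry));
      e->name = strdup(name);
      e->next = buckets[h];
      buckets[h] = e;
   }

   /* O(1) average lookup instead of a linear scan per file. */
   static int name_listed(const char *name)
   {
      for (entry *e = buckets[hash(name)]; e; e = e->next) {
         if (strcmp(e->name, name) == 0) {
            return 1;
         }
      }
      return 0;
   }

Building the table costs one insertion per list entry, after which each
file examined during the backup costs one hash plus a short chain walk,
instead of a scan over the entire list.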