Hello,

On Sun, 10 Feb 2019 at 20:54, <jes...@krogh.cc> wrote:
> Hi
>
> This has been a like-to-have for years. We end up breaking up
> volumes artificially to support Bacula because of the single-threaded
> nature of the filedaemon. We have LTO6, spooling and a 10Gbit network. When
> a full backup ends up spanning 3 weeks of run time, it gets very, very
> painful. (Example below)
>
> 10-Feb 07:22 bacula-dir JobId 201646: Error: Bacula bacula-dir 7.0.5 (28Jul14):
>   Build OS:               x86_64-pc-linux-gnu ubuntu 16.04
>   JobId:                  201646
>   Job:                    Abe_Daily_RTP.2019-02-01_21.03.30_01
>   Backup Level:           Full (upgraded from Incremental)
>   Client:                 "abe-fd" 7.0.5 (28Jul14) x86_64-pc-linux-gnu,ubuntu,16.04
>   FileSet:                "Abe Set RTP" 2019-01-16 21:03:01
>   Pool:                   "Full-Pool" (From Job FullPool override)
>   Catalog:                "MyCatalog" (From Client resource)
>   Storage:                "LTO-5" (From Job resource)
>   Scheduled time:         01-Feb-2019 21:03:30
>   Start time:             02-Feb-2019 05:38:30
>   End time:               10-Feb-2019 07:22:30
>   Elapsed time:           8 days 1 hour 44 mins
>   Priority:               10
>   FD Files Written:       3,096,049
>   SD Files Written:       0
>   FD Bytes Written:       3,222,203,306,821 (3.222 TB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   4620.0 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Accurate:               no
>   Volume name(s):
>   005641L5|005746L5|006211L5|006143L5|006125L5|006217L5|006221L5|005100L5|006158L5|006135L5|006175L5|006240L5|005291L5|006297L5|007543L6|007125L6|007180L6|007105L6|005538L5|005050L5|006254L5
>   Volume Session Id:      3874
>   Volume Session Time:    1544207587
>   Last Volume Bytes:      1,964,015,354,880 (1.964 TB)
>   Non-fatal FD errors:    1
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Running
>   Termination:            *** Backup Error ***
>
> Average filesize is 1MB here...
>
> Underlying disk/filesystems are typically composed of 12/24/36 or more
> spinning disks. Disk systems today really need parallelism to perform.
>
> When dealing with large files, kernel readahead makes things work nicely, but
> when someone dumps 100.000 2KB files it slows down to single-disk IOPS
> speed.
>
> True single-job parallelism would of course be awesome: multiple
> spools, multiple drives, multiple streams over a single fileset.
> But that is also very complex.
>
> I have two “suggestions” for less intrusive benefits.
>
> 1) When reading a catalog, loop over all files and issue a
> posix_fadvise WILLNEED on the first 1MB of the file.
>
> I have prototyped this outside Bacula and it seems to work very
> nicely and should be a small non-intrusive patch. It will allow the IO
> stack to issue reads concurrently for the smaller files, caching them in
> memory. I have inspected the source code and cannot find traces that this
> should be in place already.
>

The code in question should be available in src/findlib/bfile.c:

    #if defined(HAVE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
       /* If not RDWR or WRONLY must be Read Only */
       if (bfd->fid != -1 && !(flags & (O_RDWR|O_WRONLY))) {
          int stat = posix_fadvise(bfd->fid, 0, 0, POSIX_FADV_WILLNEED);
          Dmsg3(400, "Did posix_fadvise WILLNEED on %s fid=%d stat=%d\n", fname, bfd->fid, stat);
       }
    #endif

It calls posix_fadvise with the POSIX_FADV_WILLNEED flag, but with offset 0
and length 0, which covers the file from its beginning onward rather than
only the first 1MB as you requested.

> 2) Thread out the filedaemon
> Implement an X MB buffer in the filedaemon. It could be 16 slots of
> max 5MB; for files smaller than 5MB this serves as a staging area
> for the thread, handing it over to the master process.
> Yes, this can be tuned in a lot of ways, but most of us with large
> filesystems would easily sacrifice 5-10GB memory on the server, just for
> speeding up this stuff.
>

I do not fully understand your idea. Why do you need a buffer, and what is a
"master process"? All I understand is that you want to add multiple threads
to a single job, right?

>
> This is more intrusive but can be isolated fully to the filedaemon.
>
> If someone is willing to help some of this along the way please let me
> know and let's see if we can make ends meet.
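
Regarding suggestion 1 above, a minimal standalone sketch of such a prefetch
pass could look like the following. It is not Bacula code: the function name
prefetch_dir_heads and its parameters are hypothetical, it handles only the
regular files directly inside one directory, and the 1MB figure is simply the
value from the quoted suggestion.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of Bacula: ask the kernel to start reading
 * the first head_bytes of every regular file directly inside dirpath.
 * Returns the number of files advised, or -1 if the directory cannot be
 * opened. */
long prefetch_dir_heads(const char *dirpath, off_t head_bytes)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL) {
        return -1;
    }

    int dfd = dirfd(dir);
    long advised = 0;
    struct dirent *ent;

    while ((ent = readdir(dir)) != NULL) {
        struct stat st;

        /* Skip everything that is not a regular file
         * (".", "..", subdirectories, symlinks, devices). */
        if (fstatat(dfd, ent->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0 ||
            !S_ISREG(st.st_mode)) {
            continue;
        }

        int fd = openat(dfd, ent->d_name, O_RDONLY);
        if (fd < 0) {
            continue; /* unreadable: leave it to the real backup pass */
        }

        /* posix_fadvise returns 0 on success or an error number;
         * it does not set errno. */
        if (posix_fadvise(fd, 0, head_bytes, POSIX_FADV_WILLNEED) == 0) {
            advised++;
        }
        close(fd);
    }

    closedir(dir);
    return advised;
}
```

Called as prefetch_dir_heads(path, 1024 * 1024) before the per-file read loop,
this only hints to the kernel. Whether it actually speeds up a backup depends
on the storage and on how much page cache is available, so it would need
measuring on a real fileset.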
>
> Potentially others would like to co-fund here? I feel it is unlikely
> that we are the only ones with the need.
>
> Basics of our installation: ~10PB on tape, 0,5 PB live data under
> backup, Quantum Scalar i6000 library with 6xLTO6 and 1100 slots.
>
> Our current Bacula catalog has survived since 2006’ish and 5 LTO
> generations - pretty impressive by itself.
>

I think you should contact Bacula Systems to arrange this kind of sponsored
development. I have had a lot of discussions with Kern and Eric about "single
job multithreaded processing" over the last 5 or 6 years, and we have come up
with an almost unlimited number of ideas for this project.

--
Radosław Korzeniewski
rados...@korzeniewski.net
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel