Repost from - devel.

Hi,
We are ending up breaking up volumes artificially to support Bacula, because of the single-threaded nature of the file daemon. We have LTO6, spooling and a 10Gbit network. When a full backup ends up spanning 3 weeks of run time, it gets very, very painful. (Example below.)

10-Feb 07:22 bacula-dir JobId 201646: Error: Bacula bacula-dir 7.0.5 (28Jul14):
  Build OS:               x86_64-pc-linux-gnu ubuntu 16.04
  JobId:                  201646
  Job:                    Abe_Daily_RTP.2019-02-01_21.03.30_01
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "abe-fd" 7.0.5 (28Jul14) x86_64-pc-linux-gnu,ubuntu,16.04
  FileSet:                "Abe Set RTP" 2019-01-16 21:03:01
  Pool:                   "Full-Pool" (From Job FullPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "LTO-5" (From Job resource)
  Scheduled time:         01-Feb-2019 21:03:30
  Start time:             02-Feb-2019 05:38:30
  End time:               10-Feb-2019 07:22:30
  Elapsed time:           8 days 1 hour 44 mins
  Priority:               10
  FD Files Written:       3,096,049
  SD Files Written:       0
  FD Bytes Written:       3,222,203,306,821 (3.222 TB)
  SD Bytes Written:       0 (0 B)
  Rate:                   4620.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         005641L5|005746L5|006211L5|006143L5|006125L5|006217L5|006221L5|005100L5|006158L5|006135L5|006175L5|006240L5|005291L5|006297L5|007543L6|007125L6|007180L6|007105L6|005538L5|005050L5|006254L5
  Volume Session Id:      3874
  Volume Session Time:    1544207587
  Last Volume Bytes:      1,964,015,354,880 (1.964 TB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Running
  Termination:            *** Backup Error ***

Average file size is 1MB here...

The underlying disks/filesystems are typically composed of 12/24/36 or more spinning disks. Disk systems today really need parallelism to perform. When dealing with large files, kernel readahead makes things work nicely, but when someone dumps 100,000 2KB files it slows down to single-disk IOPS speed.

I have a few suggestions for how to get parallelism into this.

1) When reading a catalog, loop over all files and issue a posix_fadvise WILLNEED on the first 1MB of each file. I have prototyped this outside Bacula; it seems to work very nicely and should be a small, non-intrusive patch. It allows the I/O stack to issue reads for the smaller files concurrently, caching them in memory.

2) Thread out the file daemon. Implement an X MB buffer in the file daemon. It could be 16 slots of max 5MB each; for files smaller than 5MB this serves as a staging area for the thread, which hands the file over to the master process. Yes, this can be tuned in a lot of ways, but most of us with large filesystems would easily sacrifice 5-10GB of memory on the server just to speed this up. This is more intrusive, but can be isolated fully to the file daemon.

3) Allow a single job to spool and despool concurrently. Currently spooling slows down individual job execution, but it would be relatively simple to assign 2 spool buffers to the job: one that can be spooled into while the other despools, and so on.

If someone is willing to help some of this along, please let me know and let's see if we can make ends meet. Potentially others would like to co-fund this? I find it unlikely that we are the only ones with the need.

Thanks.

-- 
Jesper
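
For reference, suggestion 1 can be sketched roughly as follows. This is only a minimal illustration in Python. It is not the prototype mentioned above and not Bacula code. It assumes Linux, it assumes that "catalog" means a directory on disk, and the function name prefetch_directory is made up for the example.

```python
import os

PREFETCH_BYTES = 1024 * 1024  # "the first 1MB of the file", as in suggestion 1

# os.posix_fadvise is only available on some Unix platforms (e.g. Linux).
_HAVE_FADVISE = hasattr(os, "posix_fadvise")


def prefetch_directory(path, prefetch_bytes=PREFETCH_BYTES):
    """Ask the kernel to start reading the head of each regular file in `path`.

    Hypothetical helper, not part of Bacula. Returns the number of regular
    files for which a hint was attempted.
    """
    attempted = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Only plain files; skip subdirectories, symlinks, devices.
            if not entry.is_file(follow_symlinks=False):
                continue
            try:
                fd = os.open(entry.path, os.O_RDONLY)
            except OSError:
                continue  # unreadable file: leave it to the real backup pass
            try:
                attempted += 1
                if _HAVE_FADVISE:
                    # Advisory only: the call returns without waiting for the
                    # data, so hints for many small files can be outstanding
                    # at the same time.
                    os.posix_fadvise(fd, 0, prefetch_bytes,
                                     os.POSIX_FADV_WILLNEED)
            except OSError:
                pass  # the hint is best effort; ignore filesystems that refuse
            finally:
                os.close(fd)
    return attempted
```

The idea would be to call this once per directory before the files are read sequentially, so that the later reads are served from the page cache where possible. Whether the kernel actually acts on the hint depends on the filesystem and on memory pressure.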