On 08/03/18 09:22, Kern Sibbald wrote:
> Hello Phil,
>
> What is the writeset limit code in attribute spooling and why doesn't it
> work?
I dug through all the code last year sometime. Within the MySQL database code there is code that is *supposed* to spool attribute writes into a temporary table, then commit them to the actual catalog in chunks every so often. That latter functionality doesn't actually work; instead, it saves up all of the attribute writes until the end of the job, then dumps the entire temporary table into the catalog at once. On any but small jobs, this exceeds Galera's hard maximum writeset limit (128K rows, if memory serves), and as a result any job that backs up more than 128K files using a Galera cluster for the Catalog will fail at the end of the job.

Simply setting AttributeSpooling = no solves this problem. (Though having the attribute spooling size limit work properly, so that it *does* commit periodic batches of attribute records, would probably perform better.)

If I knew C++ and the code well enough to offer a rewrite, my general approach would probably be to partition that temporary table into, say, 10000-row partitions on a primary ID (settable by a configuration directive, say AttributeSpoolChunkSize). Then, each time I start writing rows to a new partition, I would flush the oldest partition into the Catalog and drop it. Dropping a partition is an atomic operation that is much faster than deleting the individual rows it contains. Ideally, of course, one would want a separate flush thread to do this while the foreground thread continues spooling attribute data into the next partition.

Another approach would be to have the attribute spooler thread simply create new temporary tables as it goes, writing AttributeSpoolChunkSize rows into each table, then handing them off to a catalog writer thread that despools and drops them.

> Turning off attribute spooling apparently creates other problems
> (strange error messages).

Interesting. I have not seen any odd errors resulting from disabling it.
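For illustration only, the partition-rotation idea above might look roughly like this in MySQL DDL. All table and column names here are hypothetical, not Bacula's actual spool schema, and note one caveat: MySQL does not support partitioning TEMPORARY tables, so the spool table would have to become a regular table.

```sql
-- Hypothetical attribute spool table, partitioned into 10000-row chunks
-- on an AUTO_INCREMENT id (the chunk size would come from a directive
-- such as the proposed AttributeSpoolChunkSize).
CREATE TABLE attr_spool (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    jobid     INTEGER NOT NULL,
    fileindex INTEGER NOT NULL,
    lstat     TINYBLOB,
    PRIMARY KEY (id)
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (10000),
    PARTITION p1 VALUES LESS THAN (20000),
    PARTITION p2 VALUES LESS THAN (30000)
);

-- When the writer crosses into a new partition, despool the oldest one
-- into the Catalog, then drop it. DROP PARTITION is a fast metadata
-- operation, unlike a row-by-row DELETE.
INSERT INTO File (JobId, FileIndex, LStat)
    SELECT jobid, fileindex, lstat FROM attr_spool PARTITION (p0);
ALTER TABLE attr_spool DROP PARTITION p0;

-- Add the next partition ahead of the writer.
ALTER TABLE attr_spool ADD PARTITION (PARTITION p3 VALUES LESS THAN (40000));
```

Each INSERT..SELECT here is bounded at AttributeSpoolChunkSize rows, so it stays well under the Galera writeset limit regardless of total job size.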
What I *have* seen is that there is no long hang at the end of the job while the job's attributes are flushed into the Catalog. I haven't been able to benchmark overall job performance with attribute spooling enabled vs. disabled, because with attribute spooling enabled I can only run it against standalone MySQL. What I do recall is that small jobs under the 128K-file barrier ran with no problems whatsoever.

--
Phil Stracchino
Babylon Communications
ph...@caerllewys.net
p...@co.ordinate.org
Landline: +1.603.293.8485
Mobile:   +1.603.998.6958

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel