Hello Phil,

I do not remember any code that does a partial spool.  It could be the
vestige of some old attempt to break up the commit.  The current code
spools everything and then despools it at the end of the job.  During
insertion of the attribute records it limits the size of each commit to
something like 25,000 items (if I remember right).  This is sort of
like despooling, but not quite.  Note that it is possible this is only
done for PostgreSQL, in which case, if you can confirm that it is
really submitting 128K records or more, perhaps someone should submit a
feature request to implement the equivalent of what we do in the
PostgreSQL catalog driver.
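
To illustrate the pattern I mean, here is a small sketch -- it is not
the actual driver code, and the CatalogDb/run names are invented:

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct CatalogDb {
    std::function<void(const std::string &)> run;  // execute one SQL statement
};

// Insert attribute records, committing every commit_every rows so that
// no single transaction grows without bound.
void insert_attributes(CatalogDb &db,
                       const std::vector<std::string> &insert_stmts,
                       std::size_t commit_every = 25000)
{
    std::size_t in_txn = 0;
    db.run("BEGIN");
    for (const std::string &stmt : insert_stmts) {
        db.run(stmt);
        if (++in_txn >= commit_every) {   // cap reached: commit and start over
            db.run("COMMIT");
            db.run("BEGIN");
            in_txn = 0;
        }
    }
    db.run("COMMIT");                     // flush the final partial batch
}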

On 08/03/2018 03:51 PM, Phil Stracchino wrote:
On 08/03/18 09:22, Kern Sibbald wrote:
> Hello Phil,

> What is the writeset limit code in attribute spooling and why doesn't it
> work?
I dug through all the code last year sometime and within the MySQL
database code, there is some code that is *supposed* to spool attribute
writes into a temporary table, then commit them to the actual catalog in
chunks every so often.  However, that latter functionality doesn't
actually work; instead, it saves up all of the attribute writes until
the end of the job, then dumps the entire temporary table into the
catalog at once.  On any but small jobs, this will exceed Galera's
hard maximum writeset limit (128K rows, if memory serves), and as a
result any job that backs up more than 128K files will fail at the end
of the job when a Galera cluster is used for the Catalog.
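
Very roughly -- I am paraphrasing from memory, and the table and column
names below are illustrative rather than the exact schema -- the
end-of-job despool boils down to something like this: one statement,
one transaction, and therefore one Galera writeset covering every file
in the job.

#include <functional>
#include <string>

// run() is just a stand-in for whatever executes SQL on the catalog connection.
void despool_all_at_once(const std::function<void(const std::string &)> &run)
{
    run("BEGIN");
    run("INSERT INTO File (FileIndex, JobId, PathId, Filename, LStat, MD5) "
        "SELECT FileIndex, JobId, PathId, Name, LStat, MD5 FROM batch");
    run("COMMIT");               // one commit for the whole job's attributes
    run("DROP TABLE batch");     // spool table is thrown away afterwards
}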

Simply setting AttributeSpooling = no solves this problem.  (Though
having the attribute spooling size limit work properly so that it *does*
commit periodic batches of attribute records would probably perform better.)
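
For reference, that's a one-line change in the Job resource; if memory
serves the stock spelling of the directive is Spool Attributes, which
is what I'm calling AttributeSpooling here:

Job {
  Name = "SomeClient-Full"     # whichever job is tripping the limit
  # ... other Job directives ...
  Spool Attributes = no
}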


If I knew C++ and the code well enough to offer a rewrite, my general
approach would probably be to partition that temporary table into, say,
10,000-row partitions on a primary ID (settable by a configuration
directive, say AttributeSpoolChunkSize), then each time I start writing
rows to a new partition, flush the oldest partition into the Catalog and
drop it.  Dropping a partition is an atomic operation that is much
faster than deleting the individual rows it contains.  Ideally, of
course, one would want a separate flush thread to do this while the foreground
thread continues spooling attribute data into the next partition.
Another approach would be to have the attribute spooler thread simply
create new temporary tables as it goes, writing AttributeSpoolChunkSize
rows into each table, then handing them off to a catalog writer thread
that despools them and drops them.
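
To make the second idea concrete, here is a rough sketch of the handoff
I have in mind.  None of this is real Bacula code: the schema, the
table names, the 10,000-row chunk size, and the CatalogDb stand-in are
invented for illustration, and in real code each thread would of course
need its own catalog connection.

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct CatalogDb {
    std::function<void(const std::string &)> run;  // execute one SQL statement
};

int main()
{
    CatalogDb db{[](const std::string &sql) { (void)sql; /* stub: no real DB */ }};

    const std::size_t chunk_size = 10000;   // i.e. AttributeSpoolChunkSize
    std::queue<std::string> full_tables;    // spool tables ready to despool
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Catalog writer thread: despool each finished chunk table, then drop it.
    std::thread writer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !full_tables.empty() || done; });
            if (full_tables.empty() && done) break;
            std::string t = full_tables.front();
            full_tables.pop();
            lk.unlock();
            db.run("INSERT INTO File SELECT * FROM " + t);  // flush the chunk
            db.run("DROP TABLE " + t);                      // cheap, atomic cleanup
        }
    });

    // Attribute spooler (foreground) thread: write rows, rotate tables
    // every chunk_size rows and hand the full table to the writer.
    std::vector<std::string> rows(25000, "(0,0,0,'','','')");  // dummy attribute rows
    std::size_t in_table = 0, table_no = 0;
    std::string table = "attr_spool_0";
    const std::string cols =
        " (FileIndex INT, JobId INT, PathId INT, Name TEXT, LStat TEXT, MD5 TEXT)";
    db.run("CREATE TABLE " + table + cols);
    for (const std::string &row : rows) {
        db.run("INSERT INTO " + table + " VALUES " + row);
        if (++in_table == chunk_size) {
            { std::lock_guard<std::mutex> lk(m); full_tables.push(table); }
            cv.notify_one();
            table = "attr_spool_" + std::to_string(++table_no);  // rotate
            db.run("CREATE TABLE " + table + cols);
            in_table = 0;
        }
    }
    { std::lock_guard<std::mutex> lk(m); full_tables.push(table); done = true; }
    cv.notify_one();   // flush the final, possibly partial, table
    writer.join();
    return 0;
}

The same skeleton would work with partitions of a single table instead
of separate tables; only the flush and drop statements change.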

> Turning off attribute spooling apparently creates other problems
> (strange error messages).
Interesting.  I have not seen any odd errors resulting from disabling
it.  What I *have* seen is that there is no long hang at the end of the
job while the job's attributes are flushed into the Catalog.

I haven't been able to benchmark overall job performance with attribute
spooling enabled vs. disabled, because with attribute spooling enabled I
can only run it against standalone MySQL.  What I do recall is that
small jobs under the 128K-file barrier ran with no problems whatsoever.



