On 07/24/17 11:26, Kern Sibbald wrote:
> Hello Phil,
>
> Well, Bacula does not send millions of rows at a time. It does already
> batch data together and submits up to a maximum of 500,000 records at
> one time. It has never been necessary to change that number because any
> respectable database should be able to handle a batch of 500,000 records
> at a time (about 50-100MB of data) -- even a cluster, in my opinion. I
> suspect that the Galera guys are very sadly mistaken to suggest an
> optimum of only 1,000 at a time for the size of the datasets we see some
> Bacula customers using, and I will be *extremely* surprised if this
> problem shows up in Oracle MySQL.
1000 rows is not a hard limit; it is the recommended value for good
performance in a Galera cluster. You can use larger transactions, but the
cost is higher memory usage and a greater chance of commit-time write
conflicts. My employer had one client who ignored that recommendation and
ran ten threads simultaneously writing blocks of 100,000 records each, and
then couldn't understand why they were getting write conflicts on two out
of three transactions. The default hard limits for Galera are 128K rows
and 2GB total writeset size, but it turns out that transactions that large
seriously hurt both performance and memory use.

> That said, you can experiment if you wish and try changing the maximum
> number of rows (changes). It is in <bacula>/src/cats/sql_create.c at
> line 870. If this is something that comes up frequently, we certainly
> could put the maximum on a directive in the Catalog resource or perhaps
> even elsewhere.

I tested lowering the limit from 500,000 to 1,000 (it's actually at line
866, not 870). It had no visible performance impact, but it didn't solve
the problem either. The point of failure is a little earlier, starting at
line 824 in bdb_write_batch_file_records():

   if (!db_sql_query(jcr->db_batch,
       "INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) "
          "SELECT batch.FileIndex, batch.JobId, Path.PathId, "
                 "Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq "
            "FROM batch "
            "JOIN Path ON (batch.Path = Path.Path) "
            "JOIN Filename ON (batch.Name = Filename.Name)",
       NULL, NULL)) {
      Jmsg1(jcr, M_FATAL, 0, "Fill File table %s\n", jcr->db_batch->errmsg);
      goto bail_out;
   }

This query triggers a 'wsrep_max_ws_rows exceeded' failure on several of
my differential and even incremental jobs, even with the batch size limit
set to 1,000 at line 866.
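For anyone who wants to reproduce this or buy themselves time while it's
fixed, the relevant Galera server variables can be inspected and, as a
stopgap, raised at runtime. This is just a sketch -- the variable names
are the standard Galera ones, but defaults and the exact semantics vary by
Galera/MariaDB version:

```sql
-- Inspect the current Galera writeset limits:
SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_rows';
SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_size';

-- Stopgap only: raise the row limit for the running server. This trades
-- away the protection the limit provides (memory use during certification,
-- and a wider window for write conflicts between cluster nodes).
SET GLOBAL wsrep_max_ws_rows = 1048576;
```

Raising the limit hides the symptom but keeps the oversized single-transaction
INSERT, so it should be treated as a workaround, not a fix.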
This puzzled me, so I devised a way to trace the number of rows in the
batch table as the job progresses. With the batch limit set to 1,000 rows
at sql_create.c:866, on a differential backup of my workstation the row
count of the batch table (and thus of the JOIN in that query) peaked at
115,989 rows. I believe the intent is that the batch table should be
flushed into the main tables whenever its row count exceeds the value set
at line 866, but that isn't happening; it appears to be flushed only once,
at the end of each job. I hate to say this, but your batch size limit
isn't working.

Also, looking at the DB code in general, I have to ask: why are you
explicitly locking tables instead of trusting the DB engine to handle any
necessary locking? In any modern SQL DB engine, explicit table locks taken
by the application are nearly always a bad idea. You very seldom *need*
to explicitly lock tables, and 99% of the time you shouldn't. Even MySQL's
ancient legacy (and all-but-officially deprecated) MyISAM storage engine
will manage all necessary locks for you.

As an experiment, I removed the table locks and unlocks from
bdb_write_batch_file_records() and ran several small incremental jobs that
I knew would not exceed wsrep_max_ws_rows (they ran successfully last
night) -- and it worked just fine. I'm fairly certain you'll find, at
least on MySQL with InnoDB and almost certainly on PostgreSQL as well,
that none of those table locks are actually needed, and it is highly
likely they actively harm performance. The only place they might
plausibly be required is SQLite -- and I believe SQLite support in Bacula
is deprecated and scheduled for removal...?
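To illustrate what a working per-batch flush might look like on the SQL
side, here is a purely hypothetical sketch (not Bacula's actual code): the
final INSERT ... SELECT is broken into bounded chunks, committing between
iterations, so no single Galera writeset exceeds the row limit. It assumes
FileIndex increases monotonically within a job, which I have not verified:

```sql
-- Hypothetical chunked flush: repeat this block until the batch table is
-- empty. Each iteration moves at most 1,000 rows, keeping every
-- transaction's writeset well under wsrep_max_ws_rows.
START TRANSACTION;
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
  SELECT b.FileIndex, b.JobId, Path.PathId,
         Filename.FilenameId, b.LStat, b.MD5, b.DeltaSeq
    FROM (SELECT * FROM batch ORDER BY FileIndex LIMIT 1000) AS b
    JOIN Path ON (b.Path = Path.Path)
    JOIN Filename ON (b.Name = Filename.Name);
-- Remove the rows just copied (same ordering, same limit):
DELETE FROM batch ORDER BY FileIndex LIMIT 1000;
COMMIT;
```

In Bacula itself the equivalent would presumably be a loop in
bdb_write_batch_file_records() that actually honors the limit set at line
866, rather than a single job-end INSERT over the whole batch table.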
-- 
Phil Stracchino
Babylon Communications
ph...@caerllewys.net
p...@co.ordinate.org
Landline: +1.603.293.8485
Mobile:   +1.603.998.6958

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel