On 07/24/17 11:26, Kern Sibbald wrote:
> Hello Phil,
> 
> Well, Bacula does not send millions of rows at a time.  It does already 
> batch data together and submits up to a maximum of 500,000 records at 
> one time.  It has never been necessary to change that number because any 
> respectable database should be able to handle a batch of 500,000 records 
> at a time (about 50-100MB of data)-- even a cluster, in my opinion.   I 
> suspect that the Galera guys are very sadly mistaken to suggest an 
> optimum of only 1,000 at a time for the size of the datasets we see some 
> Bacula customers using, and I will be *extremely* surprised if this 
> problem shows up in Oracle MySQL.

1000 rows is not a hard limit; it is a recommended value for performance
in Galera clusters.  You can use larger transactions, but the cost is
higher memory usage and a higher chance of write commit conflicts.  My
employer had one client who was ignoring that recommendation and had ten
threads simultaneously writing blocks of 100,000 records at a time, and
then couldn't understand why they were getting write conflicts on 2 out
of 3 transactions.

The default hard limits for Galera are 128K rows and a 2GB total writeset.
But it turns out that transactions anywhere near that size seriously
impact both performance and memory use.
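
For reference, these are the knobs and counters involved.  They are all
standard wsrep variables on the Galera node itself, nothing
Bacula-specific, and as far as I recall both limits are dynamic, so they
can be raised on a running node for testing (a workaround, not a fix):

   -- the writeset limits this thread keeps running into
   SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_rows';
   SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_size';
   -- how often transactions are losing certification conflicts
   SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';
   SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';
   -- temporarily allow larger writesets while experimenting
   SET GLOBAL wsrep_max_ws_rows = 1048576;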


> That said, you can experiment if you wish and try changing the maximum 
> number of rows (changes).  It is in <bacula>/src/cats/sql_create.c at 
> line 870.  If this is something that comes up frequently, we certainly 
> could put the maximum on a directive in the Catalog resource or perhaps 
> even elsewhere.


I tested setting the limit down from 500,000 to 1000 (the constant is
actually at line 866, not 870), and it had no visible performance impact.
It didn't solve the problem, either.  The point of failure is a little
earlier, starting at line 824 in bdb_write_batch_file_records( ):

   if (!db_sql_query(jcr->db_batch,
      "INSERT INTO File (FileIndex, JobId, PathId, FilenameId, "
                        "LStat, MD5, DeltaSeq) "
         "SELECT batch.FileIndex, batch.JobId, Path.PathId, "
                "Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq "
           "FROM batch "
           "JOIN Path ON (batch.Path = Path.Path) "
           "JOIN Filename ON (batch.Name = Filename.Name)",
                     NULL, NULL))
   {
      Jmsg1(jcr, M_FATAL, 0, "Fill File table %s\n", jcr->db_batch->errmsg);
      goto bail_out;
   }

This query causes a 'wsrep_max_ws_rows exceeded' failure on several of
my differential or even incremental jobs, even if the batch size limit
is set to 1000 at line 866.

This puzzled me, so I devised a way to trace the number of rows in the
batch table as the job progresses.  With the batch limit set to 1000 rows
at sql_create.c:866, on a differential backup of my workstation the row
count of the batch table (and thus of the JOIN in that query) peaked at
115,989 rows.  I believe the intention is that the batch table should be
flushed into the main tables whenever its row count exceeds the limit set
at line 866, but that isn't happening: it appears to be flushed only once,
at the end of each job.
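
For what it's worth, at the SQL level I would expect a mid-job flush to be
nothing more than the statement quoted above followed by emptying the
batch table, repeated every time the counter passes the limit instead of
once per job.  A sketch (the Path and Filename fills omitted, and this is
obviously not the actual Bacula code):

   -- hypothetical mid-job flush: reuse the INSERT ... SELECT from
   -- bdb_write_batch_file_records(), then start the next batch empty
   -- so each writeset stays under the Galera row limit
   INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
                     LStat, MD5, DeltaSeq)
      SELECT batch.FileIndex, batch.JobId, Path.PathId,
             Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq
        FROM batch
        JOIN Path ON (batch.Path = Path.Path)
        JOIN Filename ON (batch.Name = Filename.Name);
   DELETE FROM batch;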

I hate to say this, but your batch size limit isn't working.




Also, looking at the DB code in general, I have to ask:  Why are you
explicitly locking tables instead of trusting the DB engine to handle any
necessary locking?  In any modern SQL DB engine, explicit table locks
taken by the application are nearly always a bad idea.  You very seldom
need them, and 99% of the time you shouldn't use them even where you
could.  Even MySQL's ancient legacy (and all-but-officially deprecated)
MyISAM storage engine will manage all necessary locks for you.

As an experiment, I removed the table locks and unlocks from
bdb_write_batch_file_records( ) and ran several small incremental jobs
that I knew would not exceed wsrep_max_ws_rows (they had run successfully
last night) - and it worked just fine.  I'm pretty certain you'll find, at
least on MySQL with InnoDB and almost certainly on PostgreSQL as well,
that none of those table locks is actually needed, and it is highly likely
they actively harm performance.  The only place I can see them plausibly
being required is SQLite, and I believe SQLite support is deprecated for
Bacula and scheduled for removal...?
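
To illustrate what I mean (these statements are illustrative placeholders,
not a quote of what Bacula actually issues): with explicit locks, every
other writer queues behind the lock holder, whereas with InnoDB or
PostgreSQL an ordinary transaction takes locks only on the rows it
touches and concurrent jobs carry on:

   -- application-managed locking: while this lock is held, every
   -- other writer to File is blocked outright
   LOCK TABLES File WRITE, Path READ, Filename READ;
   INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
                     LStat, MD5, DeltaSeq)
   VALUES (1, 9999, 1, 1, 'demo', 'demo', 0);
   UNLOCK TABLES;

   -- engine-managed locking: the same insert in a plain transaction;
   -- InnoDB locks only the rows it inserts, so concurrent jobs'
   -- inserts don't wait on each other
   START TRANSACTION;
   INSERT INTO File (FileIndex, JobId, PathId, FilenameId,
                     LStat, MD5, DeltaSeq)
   VALUES (2, 9999, 1, 1, 'demo', 'demo', 0);
   COMMIT;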




-- 
  Phil Stracchino
  Babylon Communications
  ph...@caerllewys.net
  p...@co.ordinate.org
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958
