Hello Phil,


On 07/24/2017 08:33 PM, Phil Stracchino wrote:
On 07/24/17 11:26, Kern Sibbald wrote:
Hello Phil,

Well, Bacula does not send millions of rows at a time.  It does already
batch data together and submits up to a maximum of 500,000 records at
one time.  It has never been necessary to change that number because any
respectable database should be able to handle a batch of 500,000 records
at a time (about 50-100MB of data)-- even a cluster, in my opinion.   I
suspect that the Galera guys are very sadly mistaken to suggest an
optimum of only 1,000 at a time for the size of the datasets we see some
Bacula customers using, and I will be *extremely* surprised if this
problem shows up in Oracle MySQL.
1000 rows is not a hard limit; it is a recommended value for performance
in Galera clusters.
Yes, I understood that.  I maintain what I wrote.

You can use larger transactions, but the cost is
higher memory usage and a higher chance of write commit conflicts.  My
employer had one client who was ignoring that recommendation and had ten
threads simultaneously writing blocks of 100,000 records at a time, and
then couldn't understand why they were getting write conflicts on 2 out
of 3 transactions.

The default hard limit for Galera is 128K rows and 2GB total writeset
size.  But it turns out that transactions that large seriously impact
both performance and memory use.
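
For reference, the limits a given cluster actually enforces can be read straight from the server.  A minimal sketch using the MySQL C API (the connection parameters below are placeholders, not anything from Bacula):

#include <stdio.h>
#include <mysql.h>

int main(void)
{
   MYSQL *conn = mysql_init(NULL);

   /* Placeholder connection parameters; substitute your own. */
   if (!mysql_real_connect(conn, "localhost", "bacula", "secret",
                           "bacula", 0, NULL, 0)) {
      fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
      return 1;
   }

   /* Ask the server for its Galera writeset limits
    * (wsrep_max_ws_rows and wsrep_max_ws_size). */
   if (mysql_query(conn, "SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws%'") == 0) {
      MYSQL_RES *res = mysql_store_result(conn);
      MYSQL_ROW row;
      while ((row = mysql_fetch_row(res)) != NULL) {
         printf("%s = %s\n", row[0], row[1]);
      }
      mysql_free_result(res);
   }

   mysql_close(conn);
   return 0;
}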


That said, you can experiment if you wish and try changing the maximum
number of rows (changes).  It is in <bacula>/src/cats/sql_create.c at
line 870.  If this is something that comes up frequently, we certainly
could make the maximum settable via a directive in the Catalog resource, or
perhaps even elsewhere.

I tested setting the limit down from 500,000 to 1000 (actually at line
866, not 870), and it had no visible impact.
You seem to be using an older Bacula.  I am referring to version 9.0.

However, it didn't solve
the problem, either.  The point of failure is a little before that,
starting at line 824 in bdb_write_batch_file_records( ):

    if (!db_sql_query(jcr->db_batch,
         "INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) "
            "SELECT batch.FileIndex, batch.JobId, Path.PathId, "
                   "Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq "
              "FROM batch "
              "JOIN Path ON (batch.Path = Path.Path) "
              "JOIN Filename ON (batch.Name = Filename.Name)",
         NULL, NULL))
    {
       Jmsg1(jcr, M_FATAL, 0, "Fill File table %s\n", jcr->db_batch->errmsg);
       goto bail_out;
    }

This query causes a 'wsrep_max_ws_rows exceeded' failure on several of
my differential or even incremental jobs, even if the batch size limit
is set to 1000 at line 866.
Yes, of course, that is where the actual error occurs.  I gave you the location at which bdb_write_batch_file_records() is called, because that is where the check on the number of inserts occurs.

This puzzled me, so I devised a way to trace the number of rows in the
batch table as the job progresses, and with the batch limit set to 1000
rows at sql_create.c:866, on a differential backup of my workstation the
row count of the batch table (and thus of the JOIN in that query) peaked
at 115,989 rows.  I believe the intention is that the batch table should
be flushed into the main tables whenever its row count exceeds the
value set at line 866, but that isn't happening.  It appears to be
flushed only at the end of every job.

I hate to say this, but your batch size limit isn't working.
It is possible that the person who implemented the code forgot to update the counter at all the appropriate spots.
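
For reference, the intended pattern is roughly the following (a sketch with illustrative names only, not the actual sql_create.c code):

#include <stdbool.h>

#define FLUSH_THRESHOLD 500000   /* the limit discussed above */

/* Illustrative context; not the actual Bacula structures. */
struct batch_ctx {
   int changes;                  /* rows accumulated in the batch table */
};

static bool insert_batch_record(struct batch_ctx *ctx /* , attribute record */)
{
   /* ... INSERT the row into the batch table here ... */

   ctx->changes++;               /* must be incremented on every insert path */

   if (ctx->changes >= FLUSH_THRESHOLD) {
      /* Flush batch into File/Path/Filename now, i.e. what
       * bdb_write_batch_file_records() otherwise does at the end of the
       * job, then empty the batch table. */
      ctx->changes = 0;          /* reset so the next flush triggers correctly */
   }
   return true;
}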




Also, looking at the DB code in general, I have to ask:  Why are you
explicitly locking tables instead of trusting the DB engine to handle
any necessary locking?  In any modern SQL DB engine, explicit table
locks by the application are nearly always a bad idea.  Not only do you
very seldom need to explicitly lock tables, but 99% of the time, you
shouldn't.  Even MySQL's ancient legacy (and all-but-officially
deprecated) MyISAM storage engine will manage all necessary locks for you.

As an experiment, I removed the table locks and unlocks from
bdb_write_batch_file_records( ), and ran several small incremental jobs
that I know will not exceed wsrep_max_ws_rows because they ran
successfully last night - and it worked just fine.  I'm pretty certain
that you'll find, at least on MySQL using InnoDB, and almost certainly
in PostgreSQL as well, that none of those table locks are actually
needed, and in fact it is highly likely they actively harm performance.
The only place where I can see them plausibly being required is in
SQLite, and I believe SQLite support is deprecated for Bacula and
scheduled for removal...?
If you want to try to remove the table locks, you are completely on your own, and I would put the odds at 99% that you will encounter conflicts.  If the "table locks" you mention are calls to bdb_lock(), those are not table locks but a global database lock that is important to the proper functioning of the batch insert code as implemented in Bacula.  That said, it is remotely possible that they are now less critical due to some recent changes in how we use the database, but I am not prepared to try to remove them, especially since we are not seeing performance problems even with monster databases.
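
To make the distinction concrete, here is a rough sketch of what such a global lock amounts to (illustrative names only, not the actual cats code):

#include <pthread.h>

/* Sketch only: what bdb_lock()/bdb_unlock() conceptually provide is an
 * application-level global database lock that serializes Bacula's own
 * threads, not a SQL "LOCK TABLES" statement executed on the server.
 * Removing it changes thread behaviour inside the daemon, not just what
 * the database engine sees. */
static pthread_mutex_t db_global_lock = PTHREAD_MUTEX_INITIALIZER;

static void example_bdb_lock(void)   { pthread_mutex_lock(&db_global_lock); }
static void example_bdb_unlock(void) { pthread_mutex_unlock(&db_global_lock); }

/* Usage around the batch flush:
 *    example_bdb_lock();
 *    ... run the INSERT ... SELECT ... FROM batch query ...
 *    example_bdb_unlock();
 */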

Best regards,
Kern

