> On Jan 11, 2016, at 3:03 PM, Dan Langille <[email protected]> wrote:
>
>> On Jan 11, 2016, at 8:35 AM, Eric Bollengier <[email protected]> wrote:
>>
>> Hello Dan,
>>
>>> I have a test job with 15,764,45 rows in the Batch table.
>>>
>>> With a change in SQL, I went from 45 minutes to 15 minutes.
>>>
>>> Old SQL:
>>>
>>> SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
>>>        Filename.FilenameId, batch_testing.LStat,
>>>        batch_testing.MD5, batch_testing.DeltaSeq
>>> FROM batch_testing JOIN Path ON (batch_testing.Path = Path.Path)
>>>                    JOIN Filename ON (batch_testing.Name = Filename.Name);
>>>
>>> New SQL:
>>>
>>> explain analyze
>>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
>>> SELECT BT.FileIndex,
>>>        BT.JobId,
>>>        (SELECT Path.PathId FROM Path WHERE Path.path = BT.Path),
>>>        (SELECT Filename.FilenameId FROM Filename WHERE Filename.name = BT.Name),
>>>        BT.LStat,
>>>        BT.MD5,
>>>        BT.DeltaSeq
>>> FROM batch_testing BT;
>>>
>>> NOTE: I have used the table Batch_Testing... because I was testing. :)
>>
>> Thanks for your feedback.
>>
>> The difference makes me think that it might be a cache issue. Can you
>> give us the detailed execution plan for both queries (EXPLAIN (ANALYZE,
>> BUFFERS)) and your server settings? That could probably be done with
>> auto_explain, I think.
>
> The explain analyze output is at:
> https://gist.github.com/dlangille/58a414a260f4c6706d7e
>
> I also just added postgresql.conf
I suggest we ignore what I was trying above.
I am still looking at this, mostly because of this issue:
07-Feb 18:01 crey-sd JobId 230447: Sending spooled attrs to the Director.
Despooling 422,031,119 bytes ...
08-Feb 01:31 bacula-dir JobId 230447: Bacula bacula-dir 7.4.0 (16Jan16):
FD Files Written: 1,365,074
FD Bytes Written: 332,924,576,236 (332.9 GB)
The above is the original job. The following is the copy to tape:
09-Feb 00:24 crey-sd JobId 230519: Sending spooled attrs to the Director.
Despooling 422,031,119 bytes ...
09-Feb 06:57 bacula-dir JobId 230516: Bacula bacula-dir 7.4.0 (16Jan16):
SD Files Written: 1,365,074
SD Bytes Written: 333,216,925,325 (333.2 GB)
The first job spends about 5.5 hours getting attributes into the database. The
second job spends about 6.5 hours.
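Back-of-the-envelope (my arithmetic, assuming the 5.5-hour window is mostly
database work):

    1,365,074 rows / (5.5 h x 3600 s/h) = about 69 rows/second

which is very slow for what should be a batch-style load.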
I did some testing, first by creating and populating a batch_testing table:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
INSERT INTO batch_testing (fileindex, jobid, path, name, lstat, md5, deltaseq)
SELECT F.fileindex,
       666,
       (SELECT P.path FROM path P WHERE P.pathid = F.pathid),
       (SELECT FN.name FROM filename FN WHERE FN.filenameid = F.filenameid),
       F.lstat,
       F.md5,
       F.deltaseq
FROM file F
WHERE F.jobid = 230519;
That took about 45 seconds.
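For anyone reproducing this, a definition along these lines should work (my
sketch, inferred from the column list above; the types are guesses matched to
the File, Path, and Filename tables, not Bacula's actual batch table
definition):

CREATE TABLE batch_testing (
    fileindex INTEGER,
    jobid     INTEGER,
    path      TEXT,      -- full path string, later resolved against Path.Path
    name      TEXT,      -- file name string, later resolved against Filename.Name
    lstat     TEXT,
    md5       TEXT,
    deltaseq  SMALLINT
);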
Then I did the insert into the File table:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT BT.FileIndex, BT.JobId, P.PathId, FN.FilenameId, BT.LStat, BT.MD5,
       BT.DeltaSeq
FROM batch_testing BT JOIN Path P ON (BT.Path = P.Path)
                      JOIN Filename FN ON (BT.Name = FN.Name);
That took about 122 minutes. Let's say 2 hours.
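Before going further there, it may be worth confirming that the join columns
are indexed; this standard catalog query (nothing Bacula-specific) shows what
the planner has to work with:

SELECT tablename, indexname, indexdef
FROM   pg_indexes
WHERE  tablename IN ('path', 'filename');

If Path(Path) or Filename(Name) lacks an index, both the JOIN form and the
subquery form fall back to sequential scans over those tables.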
How do we account for the other 3.5+ hours taken by the jobs? Is that the
time spent populating the Batch table?
Do we need a more efficient method for putting the data into the db?
Should I be looking at sql_batch_insert in cats/postgresql.c?
I know it's not batch_fill_path_query or batch_fill_filename_query (see
cats/sql_cmds.c). I ran tests on that. See the two photos at
https://twitter.com/DLangille/status/697920332741660675
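If it does come down to sql_batch_insert, my understanding (worth verifying
against cats/postgresql.c) is that the PostgreSQL backend fills the batch
table with COPY ... FROM STDIN rather than per-row INSERTs. Here is the same
technique sketched at the psql level, purely as an illustration (the data row
is made up):

COPY batch_testing (fileindex, jobid, path, name, lstat, md5, deltaseq)
FROM STDIN (FORMAT csv);
1,666,/etc/,passwd,lstat-placeholder,md5-placeholder,0
\.

If the server really is loading via COPY, populating the Batch table should
not be the bottleneck, which points back at the join/insert step above.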
FYI the above plans were created with this visual planner:
http://tatiyants.com/pev/#/plans
Prefix your queries with the EXPLAIN I show above, then paste the JSON output
into the page.
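To make capturing the JSON easier, something like this works in psql
(plan.json is just a scratch file name I picked for the example):

\o plan.json
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT count(*) FROM batch_testing;
\o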
re: https://twitter.com/DLangille/status/697860500386947073
I have reconfirmed that my install has batch insert enabled
(https://services.unixathome.org/poudriere/data/102amd64-default-master-list/latest-per-pkg/bacula-server-7.4.0.log)
--
Dan Langille - BSDCan / PGCon
[email protected]
