> On Jan 11, 2016, at 3:03 PM, Dan Langille <d...@langille.org> wrote:
>
>> On Jan 11, 2016, at 8:35 AM, Eric Bollengier <e...@baculasystems.com> wrote:
>>
>> Hello Dan,
>>
>>> I have a test job with 15,764,45 rows in the Batch table.
>>>
>>> With a change in SQL, I went from 45 minutes to 15 minutes.
>>>
>>> Old SQL:
>>>
>>>   SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
>>>          Filename.FilenameId, batch_testing.LStat,
>>>          batch_testing.MD5, batch_testing.DeltaSeq
>>>   FROM batch_testing
>>>        JOIN Path ON (batch_testing.Path = Path.Path)
>>>        JOIN Filename ON (batch_testing.Name = Filename.Name);
>>>
>>> New SQL:
>>>
>>>   EXPLAIN ANALYZE
>>>   INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
>>>   SELECT BT.FileIndex,
>>>          BT.JobId,
>>>          (SELECT Path.PathId FROM Path WHERE Path.Path = BT.Path),
>>>          (SELECT Filename.FilenameId FROM Filename WHERE Filename.Name = BT.Name),
>>>          BT.LStat,
>>>          BT.MD5,
>>>          BT.DeltaSeq
>>>   FROM batch_testing BT;
>>>
>>> NOTE: I used the table batch_testing because I was testing. :)
>>
>> Thanks for your feedback.
>>
>> The difference makes me think it might be a cache issue. Can you
>> give us the detailed execution plans for both queries (EXPLAIN (ANALYZE,
>> BUFFERS)) and your server settings? That can probably be done with
>> auto_explain, I think.
>
> The EXPLAIN ANALYZE output is at:
> https://gist.github.com/dlangille/58a414a260f4c6706d7e
>
> I also just added postgresql.conf.
I suggest we ignore what I was trying above. I am still looking at this,
mostly because of this issue:

  07-Feb 18:01 crey-sd JobId 230447: Sending spooled attrs to the Director. Despooling 422,031,119 bytes ...
  08-Feb 01:31 bacula-dir JobId 230447: Bacula bacula-dir 7.4.0 (16Jan16):
    FD Files Written:       1,365,074
    FD Bytes Written:       332,924,576,236 (332.9 GB)

The above is the original job. The following is the copy to tape:

  09-Feb 00:24 crey-sd JobId 230519: Sending spooled attrs to the Director. Despooling 422,031,119 bytes ...
  09-Feb 06:57 bacula-dir JobId 230516: Bacula bacula-dir 7.4.0 (16Jan16):
    SD Files Written:       1,365,074
    SD Bytes Written:       333,216,925,325 (333.2 GB)

The first job spends about 5.5 hours getting attributes into the database.
The second job spends about 6.5 hours.

I did some testing, first by creating and populating a batch_testing table:

  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
  INSERT INTO batch_testing (fileindex, jobid, path, name, lstat, md5, deltaseq)
  SELECT F.fileindex,
         666,
         (SELECT P.path FROM path P WHERE P.pathid = F.pathid),
         (SELECT FN.name FROM filename FN WHERE FN.filenameid = F.filenameid),
         F.lstat,
         F.md5,
         F.deltaseq
  FROM file F
  WHERE F.jobid = 230519;

That took about 45 seconds.

Then I did the insert into the File table:

  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
  INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
  SELECT BT.FileIndex,
         BT.JobId,
         P.PathId,
         FN.FilenameId,
         BT.LStat,
         BT.MD5,
         BT.DeltaSeq
  FROM batch_testing BT
       JOIN Path P ON (BT.Path = P.Path)
       JOIN Filename FN ON (BT.Name = FN.Name);

That took about 122 minutes; call it 2 hours.

How do we account for the other 3.5+ hours taken by the jobs? Is that the
population of the Batch table? Do we need a more efficient method for
putting the data into the database?

Should I be looking at sql_batch_insert in cats/postgresql.c? I know it's
not batch_fill_path_query or batch_fill_filename_query (see
cats/sql_cmds.c); I ran tests on those.
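For anyone following along, here is a minimal sketch of the batch-insert flow being discussed, using Python's sqlite3 as a stand-in for PostgreSQL (the real code lives in cats/postgresql.c and uses the catalog schema; the toy rows and the `batch` table name here are illustrative only). Attributes are spooled into a staging table, Path and Filename are normalized, then File rows are built by resolving the ids with correlated subqueries, i.e. the form that tested faster above:

```python
# Hypothetical sketch (sqlite3 stand-in for PostgreSQL) of the
# batch-insert pattern discussed in this thread. Table and column
# names mirror the Bacula catalog; the data rows are made up.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Path     (PathId INTEGER PRIMARY KEY, Path TEXT UNIQUE);
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE File     (FileIndex INT, JobId INT, PathId INT,
                       FilenameId INT, LStat TEXT, MD5 TEXT, DeltaSeq INT);
CREATE TABLE batch    (FileIndex INT, JobId INT, Path TEXT, Name TEXT,
                       LStat TEXT, MD5 TEXT, DeltaSeq INT);
""")

# 1. Spool raw attributes into the staging table, one row per file.
rows = [(1, 666, "/etc/", "hosts",  "lstat1", "md5-1", 0),
        (2, 666, "/etc/", "passwd", "lstat2", "md5-2", 0)]
cur.executemany("INSERT INTO batch VALUES (?,?,?,?,?,?,?)", rows)

# 2. Fill the normalized Path / Filename tables from the batch rows.
cur.execute("INSERT OR IGNORE INTO Path (Path) SELECT DISTINCT Path FROM batch")
cur.execute("INSERT OR IGNORE INTO Filename (Name) SELECT DISTINCT Name FROM batch")

# 3. Build File rows, resolving ids with correlated subqueries
#    (the variant that ran faster in the tests above).
cur.execute("""
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT BT.FileIndex, BT.JobId,
       (SELECT P.PathId FROM Path P WHERE P.Path = BT.Path),
       (SELECT FN.FilenameId FROM Filename FN WHERE FN.Name = BT.Name),
       BT.LStat, BT.MD5, BT.DeltaSeq
FROM batch BT
""")
print(cur.execute("SELECT COUNT(*) FROM File").fetchone()[0])  # → 2
```

This is only a shape-of-the-problem sketch; it says nothing about why the planner treats the join and subquery forms so differently on a multi-million-row batch, which is the open question here.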
See the two photos at:
https://twitter.com/DLangille/status/697920332741660675

FYI, the above plans were created with this visual planner:
http://tatiyants.com/pev/#/plans

Prefix your queries with the EXPLAIN shown above, then paste the JSON
output into the planner.

re: https://twitter.com/DLangille/status/697860500386947073

I have reconfirmed that my install has batch insert enabled:
https://services.unixathome.org/poudriere/data/102amd64-default-master-list/latest-per-pkg/bacula-server-7.4.0.log

--
Dan Langille - BSDCan / PGCon
d...@langille.org
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel