>>>>> "Eric" == Eric Bollengier <eric.bolleng...@baculasystems.com> writes:
Eric> Hello, Eric> On 07/21/2011 04:47 AM, Dan Langille wrote: >> Moving this over to -devel after my original post on users. >> >> On Jul 20, 2011, at 10:40 PM, Dan Langille wrote: >> >>> On Jul 19, 2011, at 5:45 PM, Roy Sigurd Karlsbakk wrote: >>> >>>>> Starting a file-based restore job, this led to postgresql spending a >>>>> full 15 minutes in a query. Any idea what can be done to fix this? >>>>> >>>>> http://paste.ubuntu.com/647576/ has a full explain/analyze of the >>>>> query, but so far, I haven't figured out where the problem is. Anyone >>>>> with an idea of how to tune this? >>>> >>>> create index file_filenameid on file(filenameid); >> >> Devs: any objection to making this permanent on at least the PostgreSQL >> catalog? >> >> There is really no downside to this. Eric> Yes I have an objection, it will slow down all backups to speed Eric> up very special restore case. User's don't care about backups, only restores. So focus on making the restores efficient and quick, because that's where the user issues are. Eric> I think that the problem is more on the database tuning or on Eric> the query itself. I have the same kind of query in Bweb and it Eric> runs instantly (that displays all version of a file for a Eric> client) on very large catalog. Maybe it's time for bacula to re-think it's DB schema. For example, the Path table is horribly inefficient. It replicates redundant data. For example, from my setup (using Mysql I admit) at home, I have around 189,000 paths. Just the first 40 show me part of the trouble: | 22 | /home/john/src/CueCat/cuecat-0.8.0/contrib/ | 23 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/images/ | 24 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/ | 25 | /home/john/src/CueCat/cuecat-0.8.0/old/ | 26 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/char/ | 27 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/input/ | 28 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/ | 29 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/init/ | 30 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/ | 31 | /home/john/src/CueCat/cuecat-0.8.0/ | 32 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/contrib/ | 33 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/ | 34 | /home/john/src/CueCat/ | 35 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclades-z/ | 36 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclom-y/ | 37 | /home/john/src/Cyclades/cyc_async-6.5.5/common/lib/ Just for lines 22-34, you have a *ton* of redundant info. It should instead be a tree structure with the Schema of: Field Type --------- ------- PathID int(10) ParentID int(10) Path blob and you'd also make traversing the structure much more efficient and simpler. And the size of those BLOB Path entries would shrink as well, more than offsetting the size of the ParentID you add. Eric> When you add new indexes on the File table it leads to support Eric> problems where people are complaining about backup speed... So change how you do inserts on backups to batch them up, or wrap them in a transaction or something. But keep restores quick and efficient! >>> bacula=# \d file >>> Table "public.file" >>> Column | Type | Modifiers >>> ------------+---------+------------------------------------------------------- >>> fileid | bigint | not null default nextval('file_fileid_seq'::regclass) >>> fileindex | integer | not null default 0 >>> jobid | integer | not null >>> pathid | integer | not null >>> markid | integer | not null default 0 >>> lstat | text | not null >>> md5 | text | not null >>> filenameid | integer | not null >>> Indexes: >>> "file_pkey" PRIMARY KEY, btree (fileid) >>> "file_filenameid_idx" btree (filenameid) >>> "file_jobid_idx" btree (jobid) >>> "file_jpfid_idx" btree (jobid, pathid, filenameid) >>> "file_pathid" btree (pathid) >>> "file_pathid_idx" btree (pathid) >>> "testing" btree (fileid) Eric> Interesting to have two indexes on fileid, and two indexes on pathid :-) John ------------------------------------------------------------------------------ 5 Ways to Improve & Secure Unified Communications Unified Communications promises greater efficiencies for business. UC can improve internal communications as well as offer faster, more efficient ways to interact with customers and streamline customer service. Learn more! http://www.accelacomm.com/jaw/sfnl/114/51426253/ _______________________________________________ Bacula-devel mailing list Bacula-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-devel