Re: [Bacula-devel] [Bacula-users] Database performance issue

John Stoffel Thu, 21 Jul 2011 13:37:33 -0700

>>>>> "Eric" == Eric Bollengier <eric.bolleng...@baculasystems.com> writes:


Eric> Hello,
Eric> On 07/21/2011 04:47 AM, Dan Langille wrote:
>> Moving this over to -devel after my original post on users.
>> 
>> On Jul 20, 2011, at 10:40 PM, Dan Langille wrote:
>> 
>>> On Jul 19, 2011, at 5:45 PM, Roy Sigurd Karlsbakk wrote:
>>> 
>>>>> Starting a file-based restore job, this led to postgresql spending a
>>>>> full 15 minutes in a query. Any idea what can be done to fix this?
>>>>> 
>>>>> http://paste.ubuntu.com/647576/ has a full explain/analyze of the
>>>>> query, but so far, I haven't figured out where the problem is. Anyone
>>>>> with an idea of how to tune this?
>>>> 
>>>> create index file_filenameid on file(filenameid);
>> 
>> Devs: any objection to making this permanent on at least the PostgreSQL 
>> catalog?
>> 
>> There is really no downside to this.

Eric> Yes I have an objection, it will slow down all backups to speed
Eric> up very special restore case.

Users don't care about backups, only restores.  So focus on making
the restores efficient and quick, because that's where the user issues
are.

Eric> I think that the problem is more on the database tuning or on
Eric> the query itself. I have the same kind of query in Bweb and it
Eric> runs instantly (that displays all version of a file for a
Eric> client) on very large catalog.

Maybe it's time for Bacula to rethink its DB schema.  For example,
the Path table is horribly inefficient: it stores the same directory
prefixes over and over.  On my setup at home (using MySQL, I admit) I
have around 189,000 paths.  A few of the first 40 rows show part of
the trouble:

|     22 | /home/john/src/CueCat/cuecat-0.8.0/contrib/
|     23 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/images/
|     24 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/
|     25 | /home/john/src/CueCat/cuecat-0.8.0/old/
|     26 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/char/
|     27 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/input/
|     28 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/
|     29 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/init/
|     30 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/
|     31 | /home/john/src/CueCat/cuecat-0.8.0/
|     32 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/contrib/
|     33 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/
|     34 | /home/john/src/CueCat/
|     35 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclades-z/
|     36 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclom-y/
|     37 | /home/john/src/Cyclades/cyc_async-6.5.5/common/lib/

Just in rows 22-34 you have a *ton* of redundant info: every one of
them repeats the /home/john/src/CueCat/ prefix.  It should instead be
a tree structure with a schema of:

        Field     Type
        --------- -------
        PathID    int(10)
        ParentID  int(10)
        Path      blob

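where Path holds only the last component of each directory and
ParentID points at the row for the directory above it.  You'd also
make traversing the structure much simpler and more efficient.  And
the size of those BLOB Path entries would shrink as well, more than
offsetting the size of the ParentID you add.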
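
To make that concrete, here is a rough sketch of the parent-linked
layout, using Python's built-in sqlite3 module purely so it can be run
anywhere.  This is not Bacula's catalog code: the helper names
(make_path_db, intern_path, full_path) and the Name column are made up
for illustration, and a real change would also have to deal with the
existing File.PathId references.

```python
import sqlite3


def make_path_db():
    """Create an in-memory database with a parent-linked Path table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Path ("
        " PathId INTEGER PRIMARY KEY,"
        " ParentId INTEGER REFERENCES Path(PathId),"  # NULL for top-level directories
        " Name TEXT NOT NULL)"                        # last path component only
    )
    db.execute("CREATE INDEX path_parent_name ON Path (ParentId, Name)")
    return db


def intern_path(db, path):
    """Store each component of `path` at most once; return the PathId of the last one."""
    parent = None
    for name in (part for part in path.split("/") if part):
        # "IS" also matches NULL, so top-level lookups work with parent=None.
        row = db.execute(
            "SELECT PathId FROM Path WHERE ParentId IS ? AND Name = ?",
            (parent, name),
        ).fetchone()
        if row is None:
            cur = db.execute(
                "INSERT INTO Path (ParentId, Name) VALUES (?, ?)", (parent, name)
            )
            parent = cur.lastrowid
        else:
            parent = row[0]
    return parent


def full_path(db, path_id):
    """Rebuild the full directory string by walking ParentId links up to the root."""
    parts = []
    while path_id is not None:
        name, path_id = db.execute(
            "SELECT Name, ParentId FROM Path WHERE PathId = ?", (path_id,)
        ).fetchone()
        parts.append(name)
    return "/" + "".join(part + "/" for part in reversed(parts))
```

Feeding it a handful of the CueCat directories above stores "home",
"john", "src" and "CueCat" once each instead of once per row.  The
cost is that rebuilding a full path takes one lookup per level (or a
recursive query), so this trades some read complexity for the space
saving.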

Eric> When you add new indexes on the File table it leads to support
Eric> problems where people are complaining about backup speed...

So change how you do inserts on backups to batch them up, or wrap them
in a transaction or something.  But keep restores quick and efficient!
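
By "batch them up" I mean something like the following sketch.  Again
it uses Python's sqlite3 only so that it is self-contained; it says
nothing about how Bacula's director actually inserts File records, and
make_file_db, insert_batched and the cut-down File table are
hypothetical.

```python
import sqlite3


def make_file_db():
    """In-memory stand-in for the File table, with the extra index under discussion."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT below are the only transaction boundaries.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE File ("
        " FileId INTEGER PRIMARY KEY,"
        " JobId INTEGER NOT NULL,"
        " PathId INTEGER NOT NULL,"
        " FilenameId INTEGER NOT NULL)"
    )
    db.execute("CREATE INDEX file_filenameid ON File (FilenameId)")
    return db


def insert_batched(db, rows, batch_size=1000):
    """Insert (JobId, PathId, FilenameId) rows, one transaction per batch.

    Returns the number of batches committed.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        db.execute("BEGIN")
        try:
            db.executemany(
                "INSERT INTO File (JobId, PathId, FilenameId) VALUES (?, ?, ?)",
                chunk,
            )
            db.execute("COMMIT")
        except Exception:
            db.execute("ROLLBACK")  # drop the partial batch, then re-raise
            raise
        batches += 1
    return batches
```

The point is only that the per-row commit overhead is paid once per
batch rather than once per file.  Whether that is enough to hide the
cost of the extra index would have to be measured on a real catalog.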

>>> bacula=# \d file
>>> Table "public.file"
>>> Column   |  Type   |                       Modifiers
>>> ------------+---------+-------------------------------------------------------
>>> fileid     | bigint  | not null default nextval('file_fileid_seq'::regclass)
>>> fileindex  | integer | not null default 0
>>> jobid      | integer | not null
>>> pathid     | integer | not null
>>> markid     | integer | not null default 0
>>> lstat      | text    | not null
>>> md5        | text    | not null
>>> filenameid | integer | not null
>>> Indexes:
>>> "file_pkey" PRIMARY KEY, btree (fileid)
>>> "file_filenameid_idx" btree (filenameid)
>>> "file_jobid_idx" btree (jobid)
>>> "file_jpfid_idx" btree (jobid, pathid, filenameid)
>>> "file_pathid" btree (pathid)
>>> "file_pathid_idx" btree (pathid)
>>> "testing" btree (fileid)

Eric> Interesting to have two indexes on fileid, and two indexes on pathid :-)

John
