Hello,

(Resending; the original message seems to have been lost.)
Having a solid background in Oracle (including partitioning) and some
experience with PostgreSQL (though I have never used its partitioning),
I decided to make a short study.

Partitioning would be the easiest and most reasonable solution to the
"file table becomes huge" problem *if* it worked well (well as in
Oracle). The "jobid" field in the "file" table seems to me the obvious
key for range partitioning (as far as I can tell from sqlcmds.c, at
least), perhaps with a date limit on the "job" table added to the
queries.

However, PostgreSQL partitioning is far from convenient or complete;
the community is still discussing whether to add Oracle-style
partitioning support, and in any case the current state is not good.
Managing partitions looks crazy to me: to insert data you need
triggers (or rules, which carry a higher fixed cost per statement but
a lower cost per row in bulk inserts). Worst of all, the 8.3
documentation states that only constant values in queries can be used
for partition pruning (i.e. scanning only the "needed" partitions), so
join conditions cannot be used for this purpose. A rough sketch of
what this looks like in 8.3 follows.
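To make the "crazy" part concrete, this is roughly the 8.3 recipe for
range-partitioning the file table on jobid (a sketch only; the
partition bounds, the partition and trigger names, and the example
queries are made up and not tested against the real Bacula schema):

  CREATE TABLE file_p1 (CHECK (jobid >= 0      AND jobid < 100000))
      INHERITS (file);
  CREATE TABLE file_p2 (CHECK (jobid >= 100000 AND jobid < 200000))
      INHERITS (file);

  -- Every insert into "file" must be routed by hand to a child table.
  CREATE OR REPLACE FUNCTION file_insert_trigger() RETURNS trigger AS $$
  BEGIN
      IF NEW.jobid < 100000 THEN
          INSERT INTO file_p1 VALUES (NEW.*);
      ELSIF NEW.jobid < 200000 THEN
          INSERT INTO file_p2 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for jobid %', NEW.jobid;
      END IF;
      RETURN NULL;  -- row already routed; keep it out of the parent table
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER file_insert BEFORE INSERT ON file
      FOR EACH ROW EXECUTE PROCEDURE file_insert_trigger();

  -- With constraint_exclusion = on, a constant predicate is pruned:
  --   SELECT * FROM file WHERE jobid = 123456;    -- scans file_p2 only
  -- but a join condition is not:
  --   SELECT * FROM file JOIN job USING (jobid)
  --    WHERE job.starttime > '2008-01-01';        -- scans every partition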
I believe it will take a year or more before partitioning on
PostgreSQL becomes "reasonable" (though it can already be used, ugly
as it is).

MySQL, on the other hand, is much smarter about partitions (at least
judging by the documentation). Defining and managing partitions sounds
reasonable there. Partition pruning has been implemented since 5.1.6,
but it too works only for constants. SQLite does not support
partitioning at all, AFAIK.

So partitioning the file table by jobid won't work: in the typical
queries the jobid arrives through a join with the "job" table, not as
a constant. The only way to make use of partitions is to introduce a
new column in the "file" table (don't shoot me! just an integer,
maybe) as an explicit partition key. Say, a year number, or something
based on the retention period?

And here is the thing I should have started with ;-). What are the
worst queries in question? What are the problematic scenarios? I
cannot answer that myself because my first setup is under 10 MB so
far... If we could assume that clientid is always known (at least in
the "problematic" queries), it might help to add it back to the
"file" table...

Alex

P.S. I have not found any database model picture in the Development
documentation, so I created one (by reverse engineering a PostgreSQL
setup and adding the foreign keys to the picture manually). There seem
to be many unused tables (marked yellow; at least, they are empty in
my setup) and unused fields (especially "unused id fields"; foreign
keys for them are drawn in red or are simply missing from the
picture); the "red" table "mediatype" is just a total lie. Have I
missed something?

P.P.S. I am afraid it was the picture attachment that blocked my first
message, so I have removed it. Does anybody need the data model
picture?

Kern Sibbald wrote:
>>> Another suggestion that I have for the problem of growing File
>>> tables is a sort of compromise. Suppose that we implement two File
>>> retention periods. One as currently exists that defines when the
>>> records are deleted, and a new period that defines when the records
>>> are moved out of the File table and placed in a secondary table
>>> perhaps called OldFiles. This would allow users to keep the
>>> efficiency for active files high but at the same time allow the
>>> delete retention period to be quite long. The database would still
>>> grow, but there would be a lot less overhead.
>>>
>>> Actually the name of the table for these "expired" File records
>>> could even be defined on a client by client or Job by Job basis,
>>> which would allow for having multiple "OldFiles" tables.
>>>
>>> Another advantage of my suggestion would be that within Bacula
>>> itself, switching from using the File table to using the OldFiles
>>> table could be made totally automatic (it will require a bit of
>>> code, but no massive changes). External programs would still
>>> function normally in most cases, but if they wanted to access older
>>> data, they would need some modification.
>>>
>> Using this scheme, an admin could configure Bacula to only keep the
>> most current full backup and incrementals in the main (fast) table,
>> and move the historic information to the OldFiles table. This would
>> allow more optimisation for the DBA than basing it on partitioning
>> in the database, I think?
>>
> Yes, I need to look at how partitioning works. I have a feeling it
> will not solve any of the problems of a really gigantic database
> where some of the data is used all the time and other data is almost
> never used. I haven't given up on the File and OldFiles table idea.
> However, I need to research partitioning, because if it really solves
> the problem correctly, all the better -- we can focus on the many
> missing features ...
>
> Kern
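(For what it's worth, the File -> OldFiles move described above could
start life as one batch transaction. A rough sketch, assuming an
OldFiles table created with the same columns as File, and assuming
that Job.JobTDate, seconds since the epoch, is a usable age criterion;
the 90-day cutoff is made up:)

  -- Illustrative only: table names and the cutoff are assumptions.
  BEGIN;
  -- copy "expired" rows, then remove them from the active table
  INSERT INTO OldFiles
      SELECT File.* FROM File
      JOIN Job ON Job.JobId = File.JobId
      WHERE Job.JobTDate < EXTRACT(EPOCH FROM now() - interval '90 days');
  DELETE FROM File
      USING Job
      WHERE Job.JobId = File.JobId
        AND Job.JobTDate < EXTRACT(EPOCH FROM now() - interval '90 days');
  COMMIT;

Done in one transaction, external programs would never see a row in
both tables at once; the per-client or per-job table name Kern
mentions would simply parameterize the INSERT target.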
