> On 04/10/10, James Harper (james.har...@bendigoit.com.au) wrote:
> > > I have developed a catalogue snapshot facility in python to snapshot
> > > one job's catalogue and dump it to disk.
> ...
> > How much smaller is the catalogue subset vs the full catalogue?
>
> Good question.
>
> I'm not able to answer that question fully at present, as I don't have
> enough jobs in my current database to know.
>
> My current database has the following jobs in it:
>
>  jobid | jobfiles | jobgigs
> -------+----------+---------
>      1 |  7706717 | 6833.90
>      8 |  3965507 | 4480.83
>      9 |  1273459 |  129.87
>     50 |   646336 |  512.07
>     60 |  7845561 | 6990.67
>
> A full pg_dump of the catalogue is 2.8G. The output of the catalogue
> snapshot for job 60 is 1.6G. Naturally, the full pg_dump of the whole
> database will continue to grow over time.
>
> (The job 60 catalogue file compresses to about 300MB with bzip2 -9.)
>
> I'm a little surprised that the proportion of job 60 to the whole is so
> high. Job 60 is similar to job 1, but I don't expect they share much
> information. I'll have to look into that.
>
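A per-job snapshot along these lines could look roughly like the sketch below. It is an illustration only, not the poster's actual tool: it assumes the standard Bacula PostgreSQL schema of this era (File rows keyed by JobId, referencing shared Filename and Path tables) and the psycopg2 driver, and the connection details and output file name are placeholders.

    #!/usr/bin/env python
    # Sketch: dump one job's catalogue subset, assuming the standard Bacula
    # PostgreSQL schema (File -> shared Filename/Path). Not the real tool.
    import psycopg2

    JOBID = 60  # the job whose catalogue subset we want

    conn = psycopg2.connect(dbname="bacula", user="bacula")  # placeholder
    cur = conn.cursor()

    # File rows belong to exactly one job, so they can be selected directly.
    # Filename and Path rows are shared across jobs, so the snapshot must pull
    # in every row this job's File entries reference -- which is why a one-job
    # extract can still be a large fraction of the full catalogue.
    queries = [
        "SELECT * FROM File WHERE JobId = %d" % JOBID,
        "SELECT DISTINCT fn.* FROM Filename fn"
        " JOIN File f ON f.FilenameId = fn.FilenameId"
        " WHERE f.JobId = %d" % JOBID,
        "SELECT DISTINCT p.* FROM Path p"
        " JOIN File f ON f.PathId = p.PathId"
        " WHERE f.JobId = %d" % JOBID,
    ]

    with open("job%d-catalogue.dump" % JOBID, "w") as out:
        for q in queries:
            # COPY ... TO STDOUT streams rows straight to the file instead
            # of buffering the whole result set in python.
            cur.copy_expert("COPY (%s) TO STDOUT" % q, out)

    cur.close()
    conn.close()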
If jobid 60 and job 1 were the same backup job, then a lot of the
information may be shared in the filename table. Even if they are backups
of similar servers, they will still share a lot of filename data, and that
filename data has to come with the extracted catalogue, so you might not
be saving that much.

James
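This theory is easy to test: if the two jobs reference many of the same Filename rows, the per-job extract cannot be much smaller than the catalogue itself. A rough check, under the same schema assumptions as the sketch above (standard Bacula File/Filename tables, placeholder connection details):

    # Sketch: count Filename rows referenced by both job 1 and job 60. A
    # large count would explain why the one-job extract is such a big
    # fraction of the full catalogue.
    import psycopg2

    conn = psycopg2.connect(dbname="bacula", user="bacula")  # placeholder
    cur = conn.cursor()

    # INTERSECT has set semantics, so duplicate references within each job
    # are folded away before counting.
    cur.execute("""
        SELECT COUNT(*) FROM (
            SELECT FilenameId FROM File WHERE JobId = 1
            INTERSECT
            SELECT FilenameId FROM File WHERE JobId = 60
        ) AS shared
    """)
    print("filenames shared by jobs 1 and 60: %d" % cur.fetchone()[0])

    cur.close()
    conn.close()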