Nordgren, Bryce L -FS <bnordgren <at> fs.fed.us> writes:

> > -----Original Message-----
> > Compression method is part of the fileset and not stored anywhere in the
> > database.
>
> Eeep. I've been changing the compression method in the fileset definition
> to experiment (e.g., the same fileset, over time, has been "no
> compression", "lz4", "lzfast", etc.) Does this mean that I will be unable
> to retrieve data from any job where the compression method on the storage
> volume does not match the current definition on my director?

No. The data is saved in so-called data streams, each of which has a
certain type, and compressed streams (lz4, lzo, gzip, etc.) carry a header
that describes which compression is used, so there is no need to store it
in the database. In theory you could even mix and match different
compression methods in one fileset, as some work better than others on
particular data. Nowadays we can even rewrite data streams on the storage
daemon, so you can compress them on the client, send them quickly to the
SD, then decompress them there and write them to an LTO drive with
hardware compression.
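
To illustrate why a per-stream header makes a database column unnecessary, here is a minimal sketch of a self-describing record format. It is not the Bareos on-volume format: the codec IDs, the 5-byte header layout and the names `pack_record` and `unpack_records` are assumptions made up for this example, and Python's standard-library codecs stand in for lz4/lzo.

```python
import bz2
import lzma
import struct
import zlib

# Hypothetical codec table for this sketch; the IDs are NOT Bareos stream types.
CODECS = {
    1: ("none", lambda b: b, lambda b: b),
    2: ("zlib", zlib.compress, zlib.decompress),
    3: ("bz2", bz2.compress, bz2.decompress),
    4: ("lzma", lzma.compress, lzma.decompress),
}
NAME_TO_ID = {name: cid for cid, (name, _, _) in CODECS.items()}

# Assumed header layout: codec id (1 byte) + payload length (4 bytes), big-endian.
HEADER = struct.Struct("!BI")


def pack_record(data: bytes, codec: str) -> bytes:
    """Compress data with the named codec and prefix it with a header."""
    cid = NAME_TO_ID[codec]
    payload = CODECS[cid][1](data)
    return HEADER.pack(cid, len(payload)) + payload


def unpack_records(volume: bytes):
    """Yield (codec name, original data) for each record in the volume.

    The codec is read from each record's own header, so the reader needs
    no outside knowledge of how the data was compressed.
    """
    offset = 0
    while offset < len(volume):
        cid, length = HEADER.unpack_from(volume, offset)
        offset += HEADER.size
        payload = volume[offset:offset + length]
        offset += length
        name, _, decompress = CODECS[cid]
        yield name, decompress(payload)


if __name__ == "__main__":
    # One "volume" holding records written with three different methods.
    volume = (
        pack_record(b"log line\n" * 500, "zlib")
        + pack_record(b"already small", "none")
        + pack_record(b"database page " * 300, "lzma")
    )
    for name, data in unpack_records(volume):
        print(name, len(data))
```

Because each record names its own codec, changing the configured method between jobs, or mixing methods within one job, does not affect the ability to read older records.
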
> If so, I want to request that compression method be stored in the
> database with every job!! :)

No need to do that, as there can be multiple methods in one backup.

> > > > you could also try lz4, lz4hc and lzfast.
> > >
> > > They're on the list. I'm more or less just changing the compression
> > > method between nightly incrementals to observe the impact. That takes
> > > some time... :)
> >
> > Indeed also when you have these vast amounts of data
>
> LZ4 results: 839,555,606,105 SD bytes written in 6 hours 36 mins 36 secs
> at 89.6% compression ratio. That's a gain of 289% over "no compression",
> at an effective backup rate of 339 MB/s. :)

Sounds quite OK. LZ4 is also a very low-CPU compressor, at the cost of not
compressing very much, but it is one of the smartest and fastest
compressors when it comes to incompressible data. gzip is notorious for
making already compressed data bigger.

> > That all depends of course largely on the filesystem used. Filesystems
> > like ZFS have no real problem when you have different separate
> > sequential streams. For the SD you should use spooling so it doesn't
> > start interleaving the data when you run multiple backup jobs.
>
> For now, both the FD and SD are dealing with XFS on CentOS6 boxes.
> Is xfs ok with separate sequential streams too?

I have no idea how XFS does in this area, but the newer filesystems are
quite OK with multiple parallel read streams, as they prefetch the data
and understand these kinds of sequential read patterns.

> What if you opened a pipe to the command line utility?
> Seems like this would be a much less invasive method of
> approaching this problem, even allowing the FD to stay
> single-threaded. Also, you could mess with the buffer
> size via command line options. According to the plots
> on the pigz website, buffer size seems to make
> a pretty big difference...No sense reinventing the wheel.

But passing data along within a threaded program is probably much faster
than writing it to and reading it from pipes. With pipes you read and
write your data multiple times, and you will probably never reach the
memory speeds we get now with the way we gather, compress and encrypt
data internally. Keep in mind that we do multiple steps in a fixed order,
e.g. compress before encrypt, because encrypted data doesn't compress very
well. But you can always write a C or Python file daemon plugin to see if
it brings anything.

-- 
Marco van Wieringen                   [email protected]
Bareos GmbH & Co. KG                  Phone: +49-221-63069389
http://www.bareos.com

Registered office: Köln | District court Köln: HRA 29646
General partner: Bareos Verwaltungs-GmbH
Managing directors: Stephan Dühr, M. Außendorf, J. Steffens, P. Storz, M. v. Wieringen
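
The point about compressing before encrypting can be checked with a small experiment. This is only a sketch: `toy_encrypt` is a made-up XOR keystream built from SHA-256, used here as a stand-in for a real cipher (it is not secure and is not what Bareos uses), and zlib stands in for whichever compressor is configured.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only, NOT a secure cipher. Applying it twice with the
    same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)


def compress_then_encrypt(data: bytes, key: bytes) -> bytes:
    # The compressor sees the original, redundant data.
    return toy_encrypt(zlib.compress(data), key)


def encrypt_then_compress(data: bytes, key: bytes) -> bytes:
    # The compressor sees ciphertext, which looks like random bytes.
    return zlib.compress(toy_encrypt(data, key))


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog\n" * 2000
    key = b"example key"
    print("original:             ", len(data))
    print("compress then encrypt:", len(compress_then_encrypt(data, key)))
    print("encrypt then compress:", len(encrypt_then_compress(data, key)))
```

With highly repetitive input like this, the first ordering shrinks the data to a small fraction of its size, while the second gains nothing because the ciphertext has no redundancy left for the compressor to find.
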
