Nordgren, Bryce L -FS <bnordgren <at> fs.fed.us> writes:

> 
> 
> > -----Original Message-----
> > Compression method is part of the fileset and not stored anywhere in the
> > database.
> 
> Eeep. I've been changing the compression method in the fileset definition
> to experiment (e.g., the same fileset, over time, has been "no
> compression", "lz4", "lzfast", etc.) Does this mean that I will be unable
> to retrieve data from any job where the compression method on the storage
> volume does not match the current definition on my director?
> 
No. The data is saved in so-called data streams, each of which has a
certain type, and compressed streams (lz4, lzo, gzip, etc.) carry a
header that describes which compression was used, so there is no need
to store it in the database. In theory you could even mix and match
different compression methods in one fileset (as some work better than
others on particular data). Nowadays we can even rewrite data streams
on the storage daemon, so you can compress them on the client, send
them quickly to the SD, then decompress them there and write them to
an LTO drive with hardware compression.
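
To illustrate why a self-describing header makes a database column
unnecessary, here is a minimal Python sketch. The one-byte tags and the
pack()/unpack() helpers are hypothetical and are not the real Bareos
stream layout; zlib, bz2 and lzma from the standard library stand in
for lz4/lzo/gzip.

```python
import bz2
import lzma
import zlib

# Hypothetical one-byte tags mapped to (compress, decompress) pairs.
# This is an illustration only, not the actual on-volume format.
CODECS = {
    b"N": (lambda d: d, lambda d: d),          # no compression
    b"Z": (zlib.compress, zlib.decompress),
    b"B": (bz2.compress, bz2.decompress),
    b"X": (lzma.compress, lzma.decompress),
}


def pack(tag: bytes, payload: bytes) -> bytes:
    """Prefix the compressed payload with the tag naming its codec."""
    compress, _ = CODECS[tag]
    return tag + compress(payload)


def unpack(record: bytes) -> bytes:
    """Pick the decompressor from the record's own header."""
    tag, body = record[:1], record[1:]
    _, decompress = CODECS[tag]
    return decompress(body)


# Records written with different methods can sit side by side,
# because each one says how it was compressed.
records = [
    pack(b"N", b"plain"),
    pack(b"Z", b"a" * 1000),
    pack(b"X", b"b" * 1000),
]
restored = [unpack(r) for r in records]
```

The reader never needs to know what the fileset definition said at
backup time; it only looks at the header of each record.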

> If so, I want to request that compression method be stored in the
> database with every job!! :)
No need to do that as there can be multiple methods in one backup.

> 
> > > > you could also try lz4, lz4hc and lzfast.
> > >
> > > They're on the list. I'm more or less just changing the compression
> > > method between nightly incrementals to observe the impact. That takes
> > > some time... :)
> > >
> > Indeed also when you have these vast amounts of data 
> 
> LZ4 results: 839,555,606,105 SD bytes written in 6 hours 36 mins 36 secs
> at 89.6% compression ratio. That's a gain of 289% over "no compression",
> at an effective backup rate of 339 MB/s. :)
Sounds quite OK. LZ4 is a very low-CPU compressor, at the cost of
not compressing too much, but it is one of the smartest and fastest
compressors when it comes to incompressible data. gzip is notorious
for producing bigger output from already compressed data.
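
The growth on incompressible input is easy to observe. The Python
sketch below uses zlib (the same deflate algorithm gzip uses) and
random bytes as a stand-in for already compressed data. The
store_or_compress() helper is a hypothetical example of falling back
to raw storage when compression does not pay off; it is not taken from
any Bareos code.

```python
import os
import zlib


def deflate_size(data: bytes) -> int:
    """Size of the zlib (deflate) output at compression level 6."""
    return len(zlib.compress(data, 6))


# Random bytes behave like already compressed data: nothing to gain.
random_data = os.urandom(1 << 20)
# Highly repetitive data compresses very well.
text_data = b"backup " * 150_000


def store_or_compress(data: bytes) -> tuple[bool, bytes]:
    """Return (compressed?, payload), keeping raw data if it is smaller."""
    packed = zlib.compress(data, 6)
    if len(packed) < len(data):
        return True, packed
    return False, data


print(len(random_data), deflate_size(random_data))
print(len(text_data), deflate_size(text_data))
```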

> 
> > That all depends of course largely on the filesystem used. Filesystems
> > like ZFS have no real problem when you have different separate
> > sequential streams. For the SD you should use spooling so it doesn't
> > start interleaving the data when you run multiple backup jobs.
> 
> For now, both the FD and SD are dealing with XFS on CentOS6 boxes.
> Is xfs ok with separate sequential streams too?
I have no idea how XFS does in this area, but the newer filesystems
are quite OK with multiple parallel read streams, as they prefetch
the data and recognize this kind of sequential read pattern.

> 
> What if you opened a pipe to the command line utility?
> Seems like this would be a much less invasive method of
> approaching this problem, even allowing the FD to stay
> single-threaded. Also, you could mess with the buffer
> size via command line options. According to the plots
> on the pigz website, buffer size seems to make
> a pretty big difference...No sense reinventing the wheel.
But passing data around within a threaded program is probably much
faster than writing it to and reading it from pipes. With pipes you
read and write your data multiple times, and you will probably never
reach the memory speeds we get now by gathering, compressing and
encrypting the data internally. Keep in mind that we do multiple steps,
e.g. compress before encrypt, because encrypted data doesn't compress
very well.
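
The ordering point can be checked with a small Python sketch.
toy_encrypt() below is a hypothetical XOR keystream built from SHA-256,
used only because the standard library has no real cipher; it is NOT
secure and is not what the file daemon uses. It is enough to show that
ciphertext looks random to a compressor.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (illustration, NOT secure).

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


data = b"the same log line repeated many times\n" * 20_000
key = b"hypothetical-key"

# Compress first: the compressor still sees the repetition.
compress_then_encrypt = toy_encrypt(zlib.compress(data), key)
# Encrypt first: the compressor only sees random-looking bytes.
encrypt_then_compress = zlib.compress(toy_encrypt(data, key))

print(len(data), len(compress_then_encrypt), len(encrypt_then_compress))
```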

But you can always write a C or Python file daemon plugin to see
if it brings anything.

-- 
Marco van Wieringen                   [email protected]
Bareos GmbH & Co. KG                  Phone: +49-221-63069389
http://www.bareos.com                     

Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
Komplementär: Bareos Verwaltungs-GmbH
Geschäftsführer: Stephan Dühr, M. Außendorf, J. Steffens,
                 P. Storz, M. v. Wieringen
