> -----Original Message-----
> Compression method is part of the fileset and not stored anywhere in the
> database.

Eeep. I've been changing the compression method in the fileset definition to 
experiment (e.g., the same fileset, over time, has been "no compression", 
"lz4", "lzfast", etc.). Does this mean that I will be unable to retrieve data 
from any job where the compression method on the storage volume does not match 
the current definition on my director?

If so, I want to request that compression method be stored in the database with 
every job!! :)


> > > you could also try lz4, lz4hc and lzfast.
> >
> > They're on the list. I'm more or less just changing the compression
> > method between nightly incrementals to observe the impact. That takes
> > some time... :)
> >
> Indeed also when you have these vast amounts of data :-)

LZ4 results: 839,555,606,105 SD bytes written in 6 hours 36 mins 36 secs at 
89.6% compression ratio. That's a gain of 289% over "no compression", at an 
effective backup rate of 339 MB/s. :)
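
For anyone checking the arithmetic, here is a rough sketch of how the effective rate follows from the numbers above. It assumes that "89.6% compression" means 89.6% space saved, i.e. the bytes written to the SD are 10.4% of the original data; the variable names are just for illustration.

```python
# Figures quoted in the message above.
sd_bytes_written = 839_555_606_105       # bytes written by the SD
elapsed_s = 6 * 3600 + 36 * 60 + 36      # 6 hours 36 mins 36 secs
saved_fraction = 0.896                   # assumption: 89.6% means space saved

# Assumption: written = original * (1 - saved_fraction).
original_bytes = sd_bytes_written / (1 - saved_fraction)

# Rate at which compressed data reached the volume.
written_rate_mb_s = sd_bytes_written / elapsed_s / 1e6
# Rate at which uncompressed source data was consumed.
effective_rate_mb_s = original_bytes / elapsed_s / 1e6

print(f"written to SD: {written_rate_mb_s:.1f} MB/s")
print(f"effective:     {effective_rate_mb_s:.1f} MB/s")
```

Under that reading, the effective figure comes out at roughly the 339 MB/s quoted above, while the SD itself only has to absorb about a tenth of that.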

> That all depends of course largely on the filesystem used. Filesystems like
> ZFS have no real problem when you have different separate sequential
> streams. For the SD you should use spooling so it doesn't start interleaving
> the data when you run multiple backup jobs.

For now, both the FD and SD are dealing with XFS on CentOS 6 boxes. Is XFS OK 
with separate sequential streams too?

> There are, however, some things that currently make this hard to
> achieve. It needs some rather serious redesign, which will most likely
> happen some day.
>
> Currently the FD reads the data in chunks of 64 KB and then compresses that
> data and then sends it to the SD. To get things to be parallel you probably
> need something like a pipeline which changes the whole backup process a bit
> e.g. :
>
> - thread that reads the data and pushes it onto a work queue
> - thread(s) that do compression and encryption and push it onto a next
>   work queue.
> - thread that gets the data from the second work queue and sends it to
>   the SD.
>
> The current setup, however, has single buffers for compression and
> encryption, so it needs quite some work. The 64 KB chunks are probably
> something that needs to change as well, and getting things sent to the SD
> in order (one pipeline thread may be faster than another) is also
> something to take care of.
>
> We have one serious advantage: the compression and encryption routines
> are nicely separated in the lib, and with little effort they should be OK
> to use with multiple threads. That was one of the many refactoring jobs
> we performed when forking from Bacula.
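
To make the three-stage pipeline described above concrete, here is a minimal sketch of the idea, not Bareos code. It assumes zlib as a stand-in for the real compression routines, a hypothetical `send` callback in place of the FD-to-SD connection, and sequence numbers to deal with the ordering problem; `NUM_WORKERS` and the queue sizes are arbitrary.

```python
import queue
import threading
import zlib

CHUNK_SIZE = 64 * 1024   # chunk size mentioned in the message
NUM_WORKERS = 4          # hypothetical number of compression threads
STOP = None              # sentinel marking the end of a queue


def reader(stream, work_q):
    """Stage 1: read chunks and tag each with a sequence number."""
    seq = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        work_q.put((seq, chunk))
        seq += 1
    for _ in range(NUM_WORKERS):   # one sentinel per worker
        work_q.put(STOP)


def compressor(work_q, send_q):
    """Stage 2: compress chunks; several of these run in parallel."""
    while True:
        item = work_q.get()
        if item is STOP:
            send_q.put(STOP)
            return
        seq, chunk = item
        send_q.put((seq, zlib.compress(chunk)))


def sender(send_q, send):
    """Stage 3: restore the original order, then hand chunks to send()."""
    pending, next_seq, finished = {}, 0, 0
    while finished < NUM_WORKERS:
        item = send_q.get()
        if item is STOP:
            finished += 1
            continue
        seq, data = item
        pending[seq] = data
        while next_seq in pending:      # flush everything now in order
            send(pending.pop(next_seq))
            next_seq += 1


def run_pipeline(stream, send):
    work_q = queue.Queue(maxsize=16)
    send_q = queue.Queue(maxsize=16)
    threads = [threading.Thread(target=reader, args=(stream, work_q))]
    threads += [threading.Thread(target=compressor, args=(work_q, send_q))
                for _ in range(NUM_WORKERS)]
    threads.append(threading.Thread(target=sender, args=(send_q, send)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling `run_pipeline(open(path, "rb"), sock.sendall)` would be one way to use it, with `path` and `sock` supplied by the caller. The reorder buffer in `sender` is what handles one worker being faster than another; it is unbounded here, which a real implementation would probably want to cap.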

What if you opened a pipe to a command-line utility such as pigz? It seems like 
this would be a much less invasive way of approaching the problem, and it would 
even allow the FD to stay single-threaded. You could also adjust the buffer size 
via command-line options. According to the plots on the pigz website, buffer 
size seems to make a pretty big difference. No sense reinventing the wheel.
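
Roughly what I have in mind, as a sketch only: the caller stays single-threaded and lets an external process do the parallel work. The helper name `compress_via_pipe` is made up, and the pigz invocation assumes pigz is installed and on the PATH; `-c` writes to stdout and `-b` sets the block size in KiB, where 512 is just an example value.

```python
import subprocess


def compress_via_pipe(data, cmd):
    """Feed data to an external filter on stdin and return its stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


# Hypothetical invocation: compress to stdout with 512 KiB blocks.
PIGZ_CMD = ["pigz", "-c", "-b", "512"]

# Example use (requires pigz):
#   compressed = compress_via_pipe(chunk, PIGZ_CMD)
```

Spawning one process per 64 KB chunk would be far too expensive, so in practice the pipe would need to stay open for the whole file or job, with data streamed through it.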

Bryce




