Hello,

Wed, 20 Feb 2019 at 13:29 Josh Fisher <jfis...@pvct.com> wrote:

> Note that posix_fadvise() only affects caching and read-ahead at the OS
> level. While the use of posix_fadvise() may indeed improve i/o performance
> for particular use cases, it is not parallelism and does not cause multiple
> user-space threads to be executed in parallel. I believe that Kern is
> referring to a multi-threaded approach in the bacula-fd, where multiple
> threads are executing in parallel to read and process files.
>
> Also, I believe that bacula-fd already does make use of posix_fadvise().
>
Yes, I mentioned it in my previous email.
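
For reference, here is a minimal sketch of the kind of hinting
posix_fadvise() enables. The advice flags are standard POSIX; the
surrounding read loop is purely illustrative, not Bacula's actual code:

#include <fcntl.h>
#include <unistd.h>

/* Illustrative only: read one file for backup, hinting the kernel to
 * read ahead aggressively, then drop the cached pages afterwards so
 * the backup does not evict the client's working set. */
static int backup_read_file(const char *path, char *buf, size_t bufsz)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0 = the whole file */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    ssize_t n;
    while ((n = read(fd, buf, bufsz)) > 0) {
        /* ... compress/encrypt and ship to the SD ... */
    }

    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return n < 0 ? -1 : 0;
}

Note that this only tunes caching and read-ahead inside the kernel; it
does not add any user-space parallelism.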

> I would think that a reader-writer approach would be possible. A single
> writer thread would perform all i/o with the SD while multiple reader
> threads would read and process single files at a time. A single management
> thread would manage the list of files to be backed up and spawn reader
> threads to process them. This could improve FD performance, particularly
> when compression and/or encryption is being used.
>
This topic has many branches and levels of detail, which causes a lot of
misunderstanding, e.g.:
- concurrent data scan (finding what to backup)
- concurrent data read at directory (or filesystem) level
- concurrent data read at file level (see the reader-writer sketch after
this list)
- concurrent data read at block level
- concurrent data processing (i.e. compression, see *1 below)
- asynchronous IO for data read (single thread)
- multiple network streams to single storage
- single network stream to multiple storages = multiple network streams
- multiple network streams to multiple storages
- support for high latency networks - single thread
- support for high latency networks - multiple threads
- automatic concurrency scaling (e.g. based on the number of available
CPUs or on system utilization)
- manual concurrency scaling
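
To make "concurrent data read at file level" concrete, here is a minimal
sketch of the reader-writer layout Josh describes, assuming pthreads and
a bounded queue. All names, the fixed thread count, and the fake payload
are illustrative only, not Bacula code:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* N reader threads pull file names from a shared list, "process" them
 * (a stand-in for read+compress+encrypt), and push results to a single
 * writer thread that owns the one i/o stream to the SD. */

#define NREADERS 4
#define QCAP 16

typedef struct { char data[256]; } chunk_t;

static chunk_t queue[QCAP];
static int qhead, qtail, qlen, done_readers;
static pthread_mutex_t qmx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qnotfull = PTHREAD_COND_INITIALIZER;
static pthread_cond_t qnotempty = PTHREAD_COND_INITIALIZER;

static const char *files[] = { "/etc/hosts", "/etc/fstab", "/etc/passwd" };
static int nfiles = 3, next_file;
static pthread_mutex_t fmx = PTHREAD_MUTEX_INITIALIZER;

static void *reader(void *arg)
{
    (void)arg;
    for (;;) {
        /* claim the next file from the shared work list */
        pthread_mutex_lock(&fmx);
        int i = next_file < nfiles ? next_file++ : -1;
        pthread_mutex_unlock(&fmx);
        if (i < 0) break;

        chunk_t c;                /* pretend we read+compressed the file */
        snprintf(c.data, sizeof c.data, "payload of %s", files[i]);

        /* push to the bounded queue, blocking when it is full */
        pthread_mutex_lock(&qmx);
        while (qlen == QCAP) pthread_cond_wait(&qnotfull, &qmx);
        queue[qtail] = c; qtail = (qtail + 1) % QCAP; qlen++;
        pthread_cond_signal(&qnotempty);
        pthread_mutex_unlock(&qmx);
    }
    pthread_mutex_lock(&qmx);
    done_readers++;
    pthread_cond_signal(&qnotempty);
    pthread_mutex_unlock(&qmx);
    return NULL;
}

static void *writer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qmx);
        while (qlen == 0 && done_readers < NREADERS)
            pthread_cond_wait(&qnotempty, &qmx);
        if (qlen == 0) { pthread_mutex_unlock(&qmx); break; }
        chunk_t c = queue[qhead]; qhead = (qhead + 1) % QCAP; qlen--;
        pthread_cond_signal(&qnotfull);
        pthread_mutex_unlock(&qmx);
        printf("SD <- %s\n", c.data);  /* the single stream to the SD */
    }
    return NULL;
}

int main(void)
{
    pthread_t r[NREADERS], w;
    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < NREADERS; i++) pthread_create(&r[i], NULL, reader, NULL);
    for (int i = 0; i < NREADERS; i++) pthread_join(r[i], NULL);
    pthread_join(w, NULL);
    return 0;
}

The fixed NREADERS is exactly where manual or automatic concurrency
scaling (the last two items above) would plug in.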

*1) You cannot make encryption concurrent (threaded) with the CBC cipher
mode that Bacula uses. You could switch to CTR mode, but the required
code does not exist in Bacula, AFAIR.
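
The reason is structural: in CBC each block chains on the previous
ciphertext, C_i = E_K(P_i XOR C_{i-1}), so encryption is inherently
serial, while in CTR each block needs only the counter, C_i = P_i XOR
E_K(nonce||i). A hypothetical sketch using OpenSSL's EVP interface,
where independent threads could each encrypt their own chunk (chunk
offsets must be multiples of the 16-byte AES block; none of this exists
in Bacula):

#include <openssl/evp.h>
#include <stdint.h>
#include <string.h>

/* Encrypt one chunk of a stream with AES-256-CTR, starting at a given
 * 16-byte block offset. Each thread can call this on its own chunk
 * because the keystream depends only on (nonce, counter). */
static int ctr_encrypt_chunk(const uint8_t key[32], const uint8_t nonce[8],
                             uint64_t block_offset,
                             const uint8_t *in, uint8_t *out, int len)
{
    uint8_t iv[16];
    memcpy(iv, nonce, 8);
    for (int i = 0; i < 8; i++)      /* big-endian counter in the low half */
        iv[8 + i] = (uint8_t)(block_offset >> (56 - 8 * i));

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    if (!ctx)
        return -1;
    int outlen = 0;
    int ok = EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) &&
             EVP_EncryptUpdate(ctx, out, &outlen, in, len);
    EVP_CIPHER_CTX_free(ctx);
    return ok ? 0 : -1;
}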

> I am not sure this approach is always a good thing. It depends on the
> client hardware. When backing up weak clients using compression or
> encryption, it would bring them to their knees, although a mechanism to
> limit the number of reader threads that may be spawned would fix that.
> Also, with weak clients, the real problem is slow disks on the clients, and
> no amount of parallelism will fix that.
>
We have the following chain:
1. find data
2. read data
3. process data
4. send data over the network
5. receive and process data
6. write data
where every single point can slow down the whole process. Some points can
be optimized, while others are fixed by hardware limitations. The
optimization can be achieved with concurrent (threaded) execution or with
some clever tricks, e.g. instead of scanning for files on NFS, ask the
network storage to prepare the list of files for you - this is how the
BEE Incremental Accelerator works.
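
A hypothetical worked example of how one slow stage dominates: if the
client disk reads at 200 MB/s, single-threaded compression manages 80
MB/s, and the network sustains 110 MB/s, the job runs at min(200, 80,
110) = 80 MB/s. Threading the compression stage (two cores, roughly 160
MB/s) would move the bottleneck to the network, but no amount of
parallelism helps a client whose disk can only deliver 50 MB/s.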

best regards
-- 
Radosław Korzeniewski
rados...@korzeniewski.net