On Monday 29 September 2008 15:44:09, Kern Sibbald wrote:
> On Monday 29 September 2008 14:13:16 Brice Figureau wrote:
> > Hi,
> >
> > I was looking at the 2.4.2 SD spooling code lately (this was part of
> > understanding why despooling performance was not that good on my
> > hardware), when I noticed the following usage of posix_fadvise while
> > sequentially reading the spool file (despool_data):
> >
> > #if defined(HAVE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
> >    posix_fadvise(rdcr->spool_fd, 0, 0, POSIX_FADV_WILLNEED);
> > #endif
> >
> > I don't understand why we're telling the kernel to page-cache the spool
> > file we're reading, since we won't reuse those data.
> > Moreover, there is no "DONTNEED" call after despool_data to let the
> > kernel know it can trash what we read.
> >
> > I thought of something along the lines of this in despool_data:
> > #if defined(HAVE_POSIX_FADVISE) && defined(POSIX_FADV_SEQUENTIAL)
> >    posix_fadvise(rdcr->spool_fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> > #endif
> > #if defined(HAVE_POSIX_FADVISE) && defined(POSIX_FADV_NOREUSE)
> >    posix_fadvise(rdcr->spool_fd, 0, 0, POSIX_FADV_NOREUSE);
> > #endif
> >
> > and a few POSIX_FADV_DONTNEED calls after each block read, with the
> > correct offset and length, to tell the page cache we don't need that
> > part anymore.
> >
> > Does it make sense?
> > Or did I miss something?
>
> Why don't you run some tests measuring the performance of your proposed
> changes versus what is there now? That would give a much more definitive
> answer than I can ...
>
> Regards,
>
> Kern
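Hi,

To make the quoted proposal a bit more concrete, the per-block DONTNEED that Brice describes could look something like this in a despooling read loop (just a sketch with invented names, not the actual despool_data() code):

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

#define SPOOL_BLOCK (64 * 1024)

/* Read the whole spool file sequentially, dropping the pages we have
 * already consumed so they don't push useful data out of the cache. */
static void despool_sketch(int spool_fd)
{
   char    buf[SPOOL_BLOCK];
   off_t   done = 0;
   ssize_t n;

#if defined(POSIX_FADV_SEQUENTIAL)
   /* sequential access: the kernel can use a larger readahead window */
   posix_fadvise(spool_fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#endif
   while ((n = read(spool_fd, buf, sizeof(buf))) > 0) {
      /* ... write the block out to the volume here ... */
#if defined(POSIX_FADV_DONTNEED)
      /* we will never re-read these pages: let the kernel evict them */
      posix_fadvise(spool_fd, done, n, POSIX_FADV_DONTNEED);
#endif
      done += n;
   }
}
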
First, a word about what Brice said: I don't think there is anything for Bacula itself to gain from using NOREUSE. It would rather benefit the other programs running concurrently with it, by helping the file daemon avoid trashing the filesystem cache.

That said, I've also been working on posix_fadvise these days. I wanted to find out whether it was possible to speed up backups of really small files.

Bacula, like most backup software, is very good at reading big files, because of the readahead features mentioned all around in this thread: you start a big file, the OS reads ahead, so Bacula reads the file efficiently, as about everything it needs is already cached. For small files, that's not the case: the file daemon spends all its time opening a file, reading it, closing it, and moving on to the next one. The OS gets no chance to do any readahead, and performance is extremely poor.

Here's what I've been trying to model: before doing the real backup, you tell the OS what you're going to read in the next few seconds. To do this, you open the files you're going to read in advance, posix_fadvise them (WILLNEED), and have a bunch of them done before the real work from Bacula comes along. I've already discussed it a bit with Eric: it might not be very hard to do this using a FIFO of opened files in the file daemon.

For now, here's how I've done it (a proof of concept; I didn't code anything in Bacula as I didn't know where to start :) ). I wrote a C program that behaves exactly like the Unix find command, but preloads the files before printing them to stdout. Then I compared the results from my program (fadvise) and find by running these:

   ./fadvise | cpio -o --file=/tmp/test
   find dir/ | cpio -o --file=/tmp/test2

Of course, before each run I reset the Linux cache (echo 1 > /proc/sys/vm/drop_caches), in order to measure the same thing on both runs :) The point is to use the pipe as a cheap (but not that smart) replacement for the real FIFO in my tests.

   time ./fadvise | cpio -o --file=/tmp/test2
   real    16m25.809s
   user    0m8.977s
   sys     1m19.657s

   time find dir/ | cpio -o --file=/tmp/test2
   real    25m6.516s
   user    0m10.621s
   sys     2m36.558s

The results are for 2,000,000 files spread across 1,000 directories. Each file is 200 bytes (some sort of worst-case scenario). It's on my work PC, so nothing fancy, only 2 SATA drives (one for the test directory, the other one holds /tmp).

Of course, if this were to be implemented, some safeguards should be added to the real code: there is no need to preload big files (where should the threshold be put?), no need to preload thousands of files at once (it may be counterproductive if some files get evicted from the cache before the file daemon comes to read them), and no need to preload a file we already know hasn't changed...

Anyhow, I just wanted to start a discussion about this, because I feel Bacula could do better on this use case, and this way of doing things may be a solution. Of course, we won't get the throughput of big files, if only because the files won't be contiguous on disk, but at least the OS will be able to do a bit of reordering for all these reads.
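
In case it helps the discussion, here is roughly what the proof-of-concept preloader does. This is a simplified sketch, not the exact program I benchmarked; the 10 MB threshold and the names are arbitrary:

#define _XOPEN_SOURCE 600        /* for nftw() and posix_fadvise() */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <ftw.h>

/* Print every entry like find(1), but ask the kernel to start reading
 * small regular files into the page cache before printing their path. */
static int preload(const char *path, const struct stat *st,
                   int type, struct FTW *ftwbuf)
{
   (void)ftwbuf;
   if (type == FTW_F && st->st_size < 10 * 1024 * 1024) {  /* arbitrary threshold */
      int fd = open(path, O_RDONLY);
      if (fd >= 0) {
#if defined(POSIX_FADV_WILLNEED)
         /* hint: this file will be read soon, start readahead now */
         posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
#endif
         close(fd);
      }
   }
   printf("%s\n", path);
   return 0;                      /* keep walking */
}

int main(int argc, char **argv)
{
   if (argc != 2) {
      fprintf(stderr, "usage: %s <dir>\n", argv[0]);
      return 1;
   }
   /* keep up to 64 directory fds open, don't follow symlinks */
   return nftw(argv[1], preload, 64, FTW_PHYS) == 0 ? 0 : 1;
}

In the real file daemon this would rather be a bounded FIFO of pre-opened files than a pipe, with the safeguards mentioned above (size threshold, limited depth, skipping files we already know are unchanged).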
Cheers,

Marc
