I have one sparse DB file at 100MB; it is not worth supporting that at a
quarter of the backup speed over my 16TB backup run.
If any of my users (only 2 of them) were to use TB-sized sparse files, they
would not be getting those files back on restore, plus a complimentary ban.

Sure, if you have lots of large VMs, then by all means keep sparse support
enabled for those DLEs, but I would not have it enabled by default when it
comes at such a large cost.
For me block de-duplication is a larger issue, but sparse support does
nothing for that anyway.

Anton "exuvo" Olsson
   [email protected]

On 2024-09-06 22:58, Jon LaBadie wrote:
On Sat, Sep 07, 2024 at 01:19:03AM +1000, meku wrote:
Thank you for the amazing tip. I also benchmarked similar results: amgtar
was averaging 140MB/s with --sparse, and with sparse disabled it now
averages 600MB/s. I expect this will make a huge improvement to backup
times.

On Sat, 31 Aug 2024 at 10:57, Exuvo <[email protected]> wrote:

I have been trying to figure out why tar run by Amanda was so much slower
than my manual tar runs.
The culprit is tar --sparse (which is on by default in amgtar), which for me
maxes out one CPU core and reduces tar's read speed to around 130MB/s on a
ZFS filesystem with 1GB files.
I turned that option off and now it can read at 500MB/s with only 30% CPU
usage.

I suspect this will also resolve the slow read speeds I have with lots of
tiny files, as tar was also capped on CPU there, but I had assumed it was
blocking on IO.


Take these cautionary comments as coming from someone uncertain of their 
accuracy.
A sparse file is one which has some blocks to which no data has been written.

For example, I once had a UNIX sysadmin, whom I despised, going round in
circles trying to figure something out.  These were the days when GB-sized
disks were large.  Being a control freak, he had set the user file-size
limit (ulimit -f) to about 4MB.  If you needed larger files, you had to get
dispensation from him.

I wrote a C program that simply created a file and wrote one byte at the
start.  It then seeked to 1TB, wrote one more byte, and closed the file.
To most command-line programs it was a 1TB file, bigger than the user limit
and 4 times bigger than the entire disk.  Do a wc, a cat, an ls -l, etc.,
and it was 1TB.  But it only used 2 of the millions of disk blocks
available.  If you cat'ted the file, the system supplied a TB of null bytes
between the two single characters I actually wrote to the file.
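
The program was along these lines (a from-memory sketch rather than the
original source; the file name and exact offset are just illustrative):

  /* Create a file that looks about 1TB long but occupies only two data blocks. */
  #define _FILE_OFFSET_BITS 64          /* make off_t 64-bit on 32-bit systems */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("huge.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) { perror("open"); return 1; }

      if (write(fd, "A", 1) != 1) { perror("write"); return 1; }  /* byte at the start */
      lseek(fd, (off_t)1 << 40, SEEK_SET);                        /* seek ~1TB in      */
      if (write(fd, "Z", 1) != 1) { perror("write"); return 1; }  /* byte at the end   */

      close(fd);          /* ls -l reports ~1TB; du shows only a couple of blocks */
      return 0;
  }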

So what would happen if you amanda-backed-up that file?  Without the
--sparse flag of amgtar, the system would efficiently supply a 1TB file with
lots of null bytes.  If you compressed your backup, gzip would shrink those
nulls to almost nothing.  But what would happen if you restored that file
from backup?  You had better have lots of room: you will get the entire 1TB
file, null bytes included.  Nothing in the backup says it was originally
sparse.
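
One quick way to see, after a test restore, whether a file came back sparse
is to compare its apparent size with the space actually allocated, e.g. with
a small stat-based check (a sketch; the path is just a placeholder):

  /* Report apparent size vs. allocated bytes; allocated << apparent means sparse. */
  #define _FILE_OFFSET_BITS 64
  #include <sys/stat.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      struct stat st;
      const char *path = (argc > 1) ? argv[1] : "huge.dat";

      if (stat(path, &st) != 0) { perror("stat"); return 1; }

      long long apparent  = (long long)st.st_size;
      long long allocated = (long long)st.st_blocks * 512;  /* st_blocks is in 512-byte units */

      printf("%s: apparent %lld bytes, allocated %lld bytes%s\n",
             path, apparent, allocated, allocated < apparent ? " (sparse)" : "");
      return 0;
  }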


I suggest you consider whether to use --sparse on a DLE-by-DLE basis.
You may have some filesystems (likely dedicated to a single application)
that are likely to use sparse files.  Database managers, for example.
Maybe virtual machines.  Maybe ???
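
If it helps, amgtar exposes this as a property of the application tool, so
you can turn it off only for the DLEs that do not need it.  Something along
these lines in amanda.conf should do it (names are placeholders; see the
amgtar man page for the SPARSE property):

  define application-tool app_amgtar_nosparse {
      plugin "amgtar"
      property "SPARSE" "NO"     # skip gtar's hole detection for these DLEs
  }

  define dumptype dt_nosparse {
      program "APPLICATION"
      application "app_amgtar_nosparse"
  }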

Check the effect of --sparse on them.  Time/speed, size, and VERY
important, restoration.

You may also want to delve deeper into gtar's use of --sparse.  It is
configurable.  Here are 3 pieces from the man page.


  --hole-detection=METHOD
        Use METHOD to detect holes in sparse files.  This option implies --sparse.
        Valid values for METHOD are seek and raw.  Default is seek with fallback
        to raw when not applicable.

  --sparse-version=MAJOR[.MINOR]
        Set which version of the sparse format to use.  This option implies
        --sparse.  Valid argument values are 0.0, 0.1, and 1.0.  For a
        detailed discussion of sparse formats, refer to the GNU Tar Manual,
        appendix D, "Sparse Formats".  Using the info reader, it can be
        accessed running the following command: info tar 'Sparse Formats'.

  -S, --sparse
        Handle sparse files efficiently.  Some files in the file system may have
        segments which were actually never written (quite often these are database
        files created by such systems as DBM).  When given this option, tar
        attempts to determine if the file is sparse prior to archiving it, and if
        so, to reduce the resulting archive size by not dumping empty parts of
        the file.

Good Luck,
Jon
