Just to follow up on this. Amanda backups have been running smoothly for a week
now.
For this one DLE, I set up amgtar and disabled the sparse option. It ran, but took most of Saturday
to complete. Then, having a full backup of that, I broke it up into 6 DLEs using excludes and
includes. I added one a day back into the disklist. It now has them all and can spread the fulls
over the week. Backups for the last couple of days have completed around 4am.
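I did the split by hand with includes and excludes. The Python below is a hypothetical sketch of how one might choose the groups so that the six fulls come out roughly the same size. It assigns top-level directories greedily, largest first, each to whichever group is currently lightest. The function name, the directory names, and the sizes are illustrative assumptions; real sizes would come from something like du.

```python
import heapq
from typing import Dict, List


def balance_dles(sizes: Dict[str, int], n_groups: int = 6) -> List[List[str]]:
    """Hypothetical helper: greedy largest-first partition of directories.

    Each returned group would become one DLE (via an include list), so the
    fulls spread over the week are of similar size.
    """
    # Min-heap of (total size assigned so far, group index).
    heap = [(0, i) for i in range(n_groups)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    # Place the largest directories first, each into the lightest group.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups
```

For example, balance_dles({"maps1": 50, "maps2": 40, "maps3": 30, "maps4": 20, "maps5": 10}, 3) returns [["maps1"], ["maps2", "maps5"], ["maps3", "maps4"]], three groups totalling 50 each.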
As a follow-up, in case anyone cares to discuss technicalities and examples: has anyone run into this
before? It seems any site doing lots of sizable scanned images, or GIS systems with tiff maps, would
have run into it. I don't know how often sparse-file handling actually matters. Database files
can be sparse, but proper procedure is to use the database tools (e.g. mysqldump) for backups and
not just back up the data directory. It's not clear to me exactly what gnutar is doing with --sparse
or why it is so slow. I don't think these tif files are sparse; they are just
large. And gnutar is not merely doubling the time, as described in
http://www.gnu.org/software/tar/manual/html_node/sparse.html. With the sparse option, runs were taking on the order of 400
times as long as they did without it.
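One quick way to check whether the tif files are sparse at all, without reading them, is to compare each file's apparent size with the space actually allocated to it. The sketch below assumes st_blocks counts 512-byte units, as it does on Solaris and Linux. The helper names are made up. On a filesystem with compression enabled (ZFS can do this), a file with no holes can also have fewer allocated blocks than its size, so True only means the file looks sparse.

```python
import os


def looks_sparse(st_size: int, st_blocks: int, block_unit: int = 512) -> bool:
    """True if fewer bytes are allocated than the file's apparent size.

    Assumes st_blocks is in 512-byte units. Compression can also make a
    dense file satisfy this test.
    """
    return st_blocks * block_unit < st_size


def file_looks_sparse(path: str) -> bool:
    """Apply looks_sparse to a file using stat() only; no data is read."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```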
[ Recalling details from earlier messages -- Amanda 3.3.2 with gtar 1.23 (/usr/sfw/bin/gtar) on
Solaris 10 on a T5220 (UltraSPARC, 8 core, 32G memory) with multipath SAS interface to J4500 for
storage using zfs raidz with 2TB drives. Nightly backups go out to an AIT5 tape library on an
Ultra320 LVD SCSI interface. Backing up on the order of 100 DLEs from 5 machines over GigE on this
Amanda server. Problem DLE was on localhost on the J4500. ]
On 4/5/13 3:16 PM, Jean-Louis Martineau wrote:
On 04/05/2013 12:09 PM, Chris Hoogendyk wrote:
OK, folks, it is the "--sparse" option that Amanda is passing to gtar. This is
/usr/sfw/bin/gtar version 1.23 on Solaris 10. I have a test script that runs runtar on a test
directory containing just 10 of the tif files.
Without the "--sparse" option, time tells me that it takes 0m0.57s to run the
script.
With the "--sparse" option, time tells me that it takes 3m14.91s to run the
script.
Scale that from 10 to 1300 tif files, and I have serious issues.
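Here is the arithmetic behind that worry as a small Python sketch. It assumes run time grows linearly with the number of files. That is only a rough approximation, since file sizes and caching also matter.

```python
# Timings reported by `time` for the 10-file test directory.
with_sparse = 3 * 60 + 14.91   # 3m14.91s = 194.91 s
without_sparse = 0.57          # 0m0.57s

ratio = with_sparse / without_sparse
print(f"slowdown with --sparse: {ratio:.0f}x")  # prints 342x

# Naive linear extrapolation from 10 to 1300 files (assumption).
scale = 1300 / 10
print(f"1300 files with --sparse:    {with_sparse * scale / 3600:.1f} h")  # 7.0 h
print(f"1300 files without --sparse: {without_sparse * scale:.0f} s")      # 74 s
```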
Now what? Can I tell Amanda not to do that? What difference will it make? Is
this a bug in gtar?
Use the amgtar application instead of the GNUTAR program; it allows you to disable
the sparse option.
tar can't know where the holes are; it must read the file to find them.
You WANT the sparse option; otherwise your backup will be large, because tar
fills the holes with zeros.
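The point that tar must read a file to find its holes can be made concrete. The Python below is a minimal sketch of a read-based scan that reports which 512-byte blocks contain data. It is not gtar's actual implementation, and the block size and function name are assumptions for illustration. Every byte gets examined, so the scan costs a full read of the file even when there are no holes.

```python
from typing import List, Optional, Tuple


def data_regions(data: bytes, block: int = 512) -> List[Tuple[int, int]]:
    """Return (offset, length) for each run of blocks containing data.

    A block of all zero bytes is treated as a hole. Illustrative only.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for off in range(0, len(data), block):
        chunk = data[off:off + block]
        if any(chunk):  # at least one non-zero byte in this block
            if start is None:
                start = off
        elif start is not None:
            regions.append((start, off - start))
            start = None
    if start is not None:  # data runs to the end of the file
        regions.append((start, len(data) - start))
    return regions
```

For b"A" * 512 + bytes(1024) + b"B" * 100 it returns [(0, 512), (1536, 100)], which is 612 bytes of data in a 1636-byte file. With the sparse option only those regions need to be stored. Without it, the archive holds all 1636 bytes, zeros included.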
Your best option is to use the calcsize or server estimate.
Jean-Louis
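For reference, Amanda's documentation describes the calcsize estimate as faster but less accurate, and the server estimate as one based only on statistics from previous runs. The Python below is a hypothetical sketch, not Amanda's calcsize, showing why an estimate built from metadata alone can be cheap. It totals the apparent sizes of regular files under a directory using stat calls only and never reads file contents.

```python
import os
import stat


def estimate_tree_bytes(root: str) -> int:
    """Sum apparent sizes of regular files under root via lstat() only."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):  # skip symlinks, devices, etc.
                total += st.st_size
    return total
```

A sparse file is counted at its full apparent size here, so this overestimates what a sparse-aware dump would write.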
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
---------------
Erdös 4