Just to follow up on this. Amanda backups have been running smoothly for a week now.

For this one DLE, I set up amgtar and disabled the sparse option. It ran, but took most of Saturday to complete. Then, having a full backup of that, I broke it up into 6 DLEs using excludes and includes, and added them back into the disklist one per day. Amanda now has all of them and can spread the fulls over the week. Backups for the last couple of days have completed around 4am.

As a followup, in case anyone cares to discuss technicalities and examples: has anyone run into this before? It seems that any site handling lots of sizable scanned images, or a GIS system with tiff maps, would have run into it.

I don't know how often sparse file treatment is important. Database files can be sparse, but proper procedure is to use the database tools (e.g. mysqldump) for backups rather than just backing up the data directory.

It's not clear to me exactly what gnutar is doing with sparse, or why it is so inefficient timewise. I don't think these tif files are sparse; they are just large. And gnutar is not merely doubling the time, as described in http://www.gnu.org/software/tar/manual/html_node/sparse.html. I was seeing run times on the order of 400 times longer with the sparse option than without it.
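
One way to check whether files like these actually contain holes is to compare each file's apparent size with the space allocated for it on disk. The following is a minimal Python sketch of that idea, assuming a Unix system where os.stat reports st_blocks. The function names and the 0.9 threshold are illustrative assumptions, not anything taken from Amanda or gtar.

```python
import os


def file_sizes(path):
    """Return (apparent_size, allocated_bytes) for one file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux and Solaris, whatever the
    # filesystem's own block size is.
    return st.st_size, st.st_blocks * 512


def allocation_ratio(apparent_size, allocated_bytes):
    """Fraction of the apparent size that is actually allocated on disk."""
    if apparent_size == 0:
        return 1.0  # an empty file has no holes to speak of
    return allocated_bytes / apparent_size


def looks_sparse(path, threshold=0.9):
    """Heuristic: True if clearly fewer bytes are allocated than the size implies.

    The 0.9 threshold is an arbitrary assumption. Filesystem compression
    (for example on ZFS) also lowers the allocated size, so a True result
    is a hint to look closer, not proof that the file has holes.
    """
    apparent, allocated = file_sizes(path)
    return allocation_ratio(apparent, allocated) < threshold


def report(paths, threshold=0.9):
    """Return one summary line per file in paths."""
    lines = []
    for path in paths:
        apparent, allocated = file_sizes(path)
        ratio = allocation_ratio(apparent, allocated)
        flag = "sparse?" if ratio < threshold else "dense"
        lines.append(
            f"{path}: size={apparent} allocated={allocated} "
            f"ratio={ratio:.2f} {flag}"
        )
    return lines
```

Calling report() on a handful of the tif files and printing the result would show whether the allocated space falls well short of the file sizes. Ratios near or above 1.0 would suggest the files are simply large rather than sparse, in which case the sparse option buys nothing for them.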

[ Recalling details from earlier messages -- Amanda 3.3.2 with gtar 1.23 (/usr/sfw/bin/gtar) on Solaris 10 on a T5220 (UltraSPARC, 8 core, 32G memory) with multipath SAS interface to J4500 for storage using zfs raidz with 2TB drives. Nightly backups go out to an AIT5 tape library on an Ultra320 LVD SCSI interface. Backing up on the order of 100 DLEs from 5 machines over GigE on this Amanda server. Problem DLE was on localhost on the J4500. ]


On 4/5/13 3:16 PM, Jean-Louis Martineau wrote:
> On 04/05/2013 12:09 PM, Chris Hoogendyk wrote:
>> OK, folks, it is the "--sparse" option that Amanda is putting on the gtar. This is /usr/sfw/bin/tar version 1.23 on Solaris 10. I have a test script that runs the runtar and a test directory with just 10 of the tif files in it.
>>
>> Without the "--sparse" option, time tells me that it takes 0m0.57s to run the script.
>>
>> With the "--sparse" option, time tells me that it takes 3m14.91s to run the script.
>>
>> Scale that from 10 to 1300 tif files, and I have serious issues.
>>
>> Now what? Can I tell Amanda not to do that? What difference will it make? Is this a bug in gtar?
>
> Use the amgtar application instead of the GNUTAR program; it allows you to disable the sparse option.
>
> tar can't know where the holes are; it must read them.
>
> You WANT the sparse option, otherwise your backup will be large because tar fills the holes with 0.
>
> Your best option is to use the calcsize or server estimate.
>
> Jean-Louis


--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<[email protected]>

---------------

Erdös 4
