Chris,

sorry for the email trouble; this is a new phenomenon and I
don't know what is causing it. If you can identify the bad
header, please let me know. We updated our mailhost a few
months ago, but my MUA (mutt) has not changed, nor has my
editor (emacs).

My "large" directories are exceptions, even here, and I am educating
the users to do things differently. However I do have lots of files
on zfs in general...

I don't believe gzip is used in the estimate phase; I think
it produces the "raw" dump size for dump scheduling, and that
tape allocation is left for later in the process. If gzip is
used, you should see it in # ps or top (or prstat). You could
always start a dump after disabling estimates and see if that
phase runs any better. Since you can be sure the estimate phase
has finished by checking # amstatus, you can always abort the
dump if you don't want a non-compressed backup. (Jean-Louis
will know off-hand.)
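
For example, while the estimates are running (assuming your
config is named "daily", as it appears in your sendsize log):

   # ps -ef | grep gzip     <- is any compressor actually running?
   # prstat                 <- what is eating CPU?
   # amstatus daily         <- has the estimate phase finished?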

How does the dump phase perform?


On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
> For some reason, the headers in the particular message from the list (from 
> Brian) are causing my mail client or something to completely strip the 
> message so that it is blank when I reply. That is, I compose a message, it 
> looks good, and I send it. But then I get a blank bcc, Brian gets a blank 
> message, and the list gets a blank message. Weird. So I'm replying to 
> Christoph Scheeder's message and pasting in the contents for replying to 
> Brian. That will put the list thread somewhat out of order, but better than 
> completely disconnecting from the thread. Here goes (for the third time):
> 
> ---------------
> 
> So, Brian, this is the puzzle. Your file systems have a reason for being 
> difficult. They have "several hundred thousand files PER directory."
> 
> The filesystem that is causing me trouble, as I indicated, only has 2806 
> total files and 140 total directories. That's basically nothing.
> 
> So, is this gzip choking on tif files? Is gzip even involved when sending 
> estimates? If I remove compression will it fix this? I could break it up 
> into multiple DLE's, but Amanda will still need estimates of all the pieces.
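> 
> (If I did split it, I suppose the disklist entries would just name
> subdirectories -- something like this, where dir1/dir2 stand in for
> whatever the real top-level directories are and comp-user-tar for
> whatever dumptype this DLE uses:
> 
>     localhost /export/herbarium/dir1 comp-user-tar
>     localhost /export/herbarium/dir2 comp-user-tar
> 
> -- but each piece still gets estimated.)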
> 
> Or is it something entirely different? And, if so, how should I go about 
> looking for it?
> 
> 
> 
> On 4/3/13 1:14 PM, Brian Cuttler wrote:
> >Chris,
> >
> >for larger file systems I've moved to "server estimate"; it's
> >less accurate, but it takes the entire estimate phase out of
> >the equation.
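> >
> >In amanda.conf terms it's just an estimate setting in the
> >dumptype -- a sketch, assuming a comp-user-tar dumptype like
> >the ones in the example configs:
> >
> >    define dumptype comp-user-tar-srvest {
> >        comp-user-tar
> >        estimate server
> >    }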
> >
> >We have had a lot of success with pigz rather than regular
> >gzip, as it'll take advantage of multiple CPUs and give
> >parallelization during compression, which is often our
> >bottleneck during actual dumping. On one system I cut DLE dump
> >time from 13 to 8 hours, a huge savings (I think those were
> >the numbers; I can look them up...).
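> >
> >Amanda picks it up as a custom client compression program in
> >the dumptype -- roughly like this (the pigz path is wherever
> >your build put it):
> >
> >    define dumptype comp-user-tar-pigz {
> >        comp-user-tar
> >        compress client custom
> >        client_custom_compress "/usr/local/bin/pigz"
> >    }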
> >
> >ZFS will allow effectively unlimited capacity, and enough files
> >per directory to choke access. We have backups that run very
> >badly here, with literally several hundred thousand files PER
> >directory, and multiple such directories.
> >
> >For backups themselves, I do use snapshots where I can on my
> >ZFS file systems.
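> >
> >Nothing fancy -- snapshot, back up out of the snapshot
> >directory, then drop it. Roughly, with your pool/filesystem
> >names (the snapshot name is arbitrary):
> >
> >    # zfs snapshot J4500-pool1/herbarium@amanda
> >    ... point the DLE at /export/herbarium/.zfs/snapshot/amanda ...
> >    # zfs destroy J4500-pool1/herbarium@amanda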
> >
> >On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
> >>This seems like an obvious "read the FAQ" situation, but . . .
> >>
> >>I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 "jbod"
> >>disk array with multipath SAS. It all should be fast and is on the local
> >>server, so there isn't any network path outside localhost for the DLE's
> >>that are giving me trouble. They are zfs on raidz1 with five 2TB drives.
> >>Gnutar is v1.23. This server is successfully backing up several other
> >>servers as well as many more DLE's on the localhost. Output to an AIT5 
> >>tape
> >>library.
> >>
> >>I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem
> >>outrageously long (jumped from the default 5 minutes to 30 minutes, and
> >>from the default 30 minutes to an hour).
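> >>
> >>In amanda.conf terms, that is:
> >>
> >>    etimeout 1800
> >>    dtimeout 3600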
> >>
> >>The filesystem (DLE) that is giving me trouble (hasn't backed up in a
> >>couple of weeks) is /export/herbarium, which looks like:
> >>
> >>    marlin:/export/herbarium# df -k .
> >>    Filesystem            kbytes    used   avail capacity  Mounted on
> >>    J4500-pool1/herbarium
> >>                          2040109465 262907572 1777201893    13%
> >>                          /export/herbarium
> >>    marlin:/export/herbarium# find . -type f | wc -l
> >>         2806
> >>    marlin:/export/herbarium# find . -type d | wc -l
> >>          140
> >>    marlin:/export/herbarium#
> >>
> >>
> >>So, it is only 262G and only has 2806 files. Shouldn't be that big a deal.
> >>They are typically tif scans.
> >>
> >>One thought that hits me: since it is over 200G of tif scans, could
> >>compression be causing trouble? But this is just getting estimates,
> >>with output going to /dev/null.
> >>
> >>Here is a segment from the very end of the sendsize debug file from April 
> >>1
> >>(the debug file ends after these lines):
> >>
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: .....
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate time for
> >>/export/herbarium level 0: 26302.500
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate size for
> >>/export/herbarium level 0: 262993150 KB
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: waiting for runtar
> >>"/export/herbarium" child
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: after runtar
> >>/export/herbarium wait
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: getting size via gnutar for
> >>/export/herbarium level 1
> >>Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning
> >>"/usr/local/libexec/amanda/runtar runtar daily
> >>/usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner
> >>--directory /export/herbarium --one-file-system --listed-incremental
> >>/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new
> >>--sparse --ignore-failed-read --totals ." in pipeline
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: Total bytes written:
> >>77663795200 (73GiB, 9.5MiB/s)
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: .....
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate time for
> >>/export/herbarium level 1: 7827.571
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate size for
> >>/export/herbarium level 1: 75843550 KB
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: waiting for runtar
> >>"/export/herbarium" child
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: after runtar
> >>/export/herbarium wait
> >>Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: done with amname
> >>/export/herbarium dirname /export/herbarium spindle 45002
> 
> -- 
> ---------------
> 
> Chris Hoogendyk
> 
> -
>    O__  ---- Systems Administrator
>   c/ /'_ --- Biology & Geology Departments
>  (*) \(*) -- 140 Morrill Science Center
> ~~~~~~~~~~ - University of Massachusetts, Amherst
> 
> <hoogen...@bio.umass.edu>
> 
> ---------------
> 
> Erdős 4
> 
---
   Brian R Cuttler                 brian.cutt...@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773
