Chris,
for larger file systems I've moved to "server estimate"; it's less
accurate, but it takes the entire estimate phase out of the equation.
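For what it's worth, that's a one-line dumptype change (a sketch only;
"server-est" is a name I made up, and I'm assuming the usual "global"
parent dumptype from the example configs):

  define dumptype server-est {
      global
      estimate server    # planner guesses from history; no client walk
  }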
We have had a lot of success with pigz rather than regular gzip,
as it'll take advantage of the multiple CPUs and parallelize the
compression, which is often our bottleneck during actual dumping.
On one system I cut a DLE's dump time from 13 to 8 hours, a huge
savings (I think those were the numbers; I can look them up...).
ZFS will allow effectively unlimited capacity, and enough files per
directory to choke access. We have backups that run very badly here,
with literally several hundred thousand files PER directory, and
multiple such directories.
For backups themselves, I do use snapshots where I can on my
ZFS file systems.
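Something along these lines, using your pool name from the df output
below (the snapshot name "amanda" is arbitrary):

  zfs snapshot J4500-pool1/herbarium@amanda
  # dump the frozen view under .zfs instead of the live tree:
  #   /export/herbarium/.zfs/snapshot/amanda
  # (you may need 'zfs set snapdir=visible' for .zfs to show up in ls)
  zfs destroy J4500-pool1/herbarium@amanda    # once the dump is done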
On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
This seems like an obvious "read the FAQ" situation, but . . .
I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500
"jbod" disk array with multipath SAS. It all should be fast and is on
the local server, so there isn't any network path outside localhost
for the DLEs that are giving me trouble. They are zfs on raidz1 with
five 2TB drives. Gnutar is v1.23. This server is successfully backing
up several other servers as well as many more DLEs on the localhost.
Output to an AIT5 tape library.
I've upped the etimeout to 1800 and the dtimeout to 3600, which both
seem outrageously long (jumped from the default 5 minutes to 30
minutes, and from the default 30 minutes to an hour).
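For reference, that's these two lines in amanda.conf (values are in
seconds):

  etimeout 1800    # estimate timeout, up from the default 300
  dtimeout 3600    # dump inactivity timeout, up from the default 1800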
The filesystem (DLE) that is giving me trouble (hasn't backed up in a
couple of weeks) is /export/herbarium, which looks like:
marlin:/export/herbarium# df -k .
Filesystem             kbytes      used       avail       capacity  Mounted on
J4500-pool1/herbarium
                       2040109465  262907572  1777201893  13%  /export/herbarium
marlin:/export/herbarium# find . -type f | wc -l
2806
marlin:/export/herbarium# find . -type d | wc -l
140
marlin:/export/herbarium#
So, it is only 262G and only has 2806 files. Shouldn't be that big a
deal. They are typically tif scans.
One thought that hits me: possibly, because it is over 200G of tif
scans, compression is causing trouble? But this is just getting
estimates, with output going to /dev/null.
Here is a segment from the very end of the sendsize debug file from
April 1 (the debug file ends after these lines):
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: .....
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: estimate time for /export/herbarium level 0: 26302.500
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: estimate size for /export/herbarium level 0: 262993150 KB
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: waiting for runtar "/export/herbarium" child
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: after runtar /export/herbarium wait
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: getting size via gnutar for /export/herbarium level 1
Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals ." in pipeline
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: Total bytes written: 77663795200 (73GiB, 9.5MiB/s)
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: .....
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: estimate time for /export/herbarium level 1: 7827.571
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: estimate size for /export/herbarium level 1: 75843550 KB
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: waiting for runtar "/export/herbarium" child
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: after runtar /export/herbarium wait
Mon Apr 1 10:16:17 2013: thd-32a58: sendsize: done with amname /export/herbarium dirname /export/herbarium spindle 45002
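Doing the arithmetic on that log (my numbers, rounded):

  level 1: 77663795200 bytes / 9.5 MiB/s  ~= 7800 s   ~= 2.2 hours
  level 0: 262993150 KB / 26302.5 s       ~= 9.8 MiB/s, ~= 7.3 hours

So even the raised etimeout of 1800 seconds is nowhere near the
roughly seven hours the level 0 estimate actually takes at that scan
rate, which is why I'd take the estimate phase out entirely.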