Darin,
On Wed, May 05, 2010 at 02:19:35PM -0400, Darin Perusich wrote:
> Brian,
>
> I've found that on systems with a lot of ZFS file systems I needed to
> increase the {e,d,c}timeout values. Below are the values I have set and
> I'm backing up some 150+ ZFS filesystems of varying sizes from a single
> server, dumps are a mixture of suntar and zfs-sendrecv.
>
> etimeout 1600
> dtimeout 1800
> ctimeout 500
I'm willing to raise the values, but I have little confidence it will help.
I have yet another system (there must be 20 amanda servers here) that
uses zfs-snapshot for clients both on the server itself and on another
Solaris box. The non-server client has 120+ DLEs and runs with the same
or smaller timeout values than the systems that are having issues, which
have only half a dozen DLEs.
Also, the issues seem to be specific to certain DLEs: in both
cases the busiest of the DLEs, not necessarily the largest
but the most active.
I've tried running only the single client with a single DLE:
# amdump cascade cascade /cascadep/export/confocal
It still fails.
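One way to confirm when and where the timeout fires is to scan the
dumper debug output for security_seterror lines like the one Dustin
quoted. The following is only a minimal Python sketch: the regular
expression is inferred from that single sample line, and the names
parse_seterror and SetErrorEvent are hypothetical helpers, not part of
Amanda. Other Amanda versions may format the line differently.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Pattern inferred from one sample line only (an assumption, not a spec).
SETERROR_RE = re.compile(
    r"^(?P<ts>\d+\.\d+): (?P<proc>\w+): "
    r"security_seterror\(.*error=(?P<err>.*)\)\s*$"
)


class SetErrorEvent(NamedTuple):
    when: datetime   # epoch timestamp from the log, converted to UTC
    process: str     # e.g. "dumper"
    error: str       # e.g. "timeout waiting for REP"


def parse_seterror(line: str) -> Optional[SetErrorEvent]:
    """Return the parsed event, or None if the line does not match."""
    m = SETERROR_RE.match(line)
    if m is None:
        return None
    when = datetime.fromtimestamp(float(m.group("ts")), tz=timezone.utc)
    return SetErrorEvent(when, m.group("proc"), m.group("err"))


# The sample from this thread, joined back onto one line
# (the mail client wrapped it).
sample = ("1273071472.718696: dumper: security_seterror(handle=8074f08, "
          "driver=fef09394 (BSD) error=timeout waiting for REP)")

if __name__ == "__main__":
    print(parse_seterror(sample))
```

Comparing the timestamp of such a line with the time the dump of that
DLE started would show whether the failure really lands at dtimeout or
somewhere else.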
Also the other DLEs on the system dump concurrently with the
problem/timed-out DLE and they complete successfully.
I'm wondering if something in ZFS is blocking on the most active
zfs mountpoints, or rather on those above a certain activity threshold.
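A rough way to test the blocking theory outside of Amanda would be to
time a single ZFS operation against the busy file system with a hard
limit. The sketch below is Python and only wraps subprocess.run; the
helper name time_command is hypothetical, and the dataset name in the
comment is a guess derived from the mountpoint, so it would need to be
checked before use.

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple


def time_command(argv: Sequence[str],
                 limit_s: float) -> Tuple[Optional[int], float]:
    """Run argv; return (exit status, elapsed seconds).

    The status is None if the command was still running after limit_s
    seconds, in which case subprocess.run kills it.
    """
    start = time.monotonic()
    try:
        done = subprocess.run(argv,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=limit_s)
        status: Optional[int] = done.returncode
    except subprocess.TimeoutExpired:
        status = None
    return status, time.monotonic() - start


# Hypothetical usage; the dataset name is an assumption and the probe
# snapshot would have to be destroyed by hand afterwards:
#   time_command(["zfs", "snapshot", "cascadep/export/confocal@probe"], 1800)
```

If the snapshot returns quickly even while the file system is busy, the
stall is more likely in the dump itself than in the snapshot step.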
thanks,
Brian
> On 05/05/2010 02:00 PM, Brian Cuttler wrote:
> >
> > Dustin,
> > Dan,
> >
> > I wondered about dtimeout, but we have a very similar issue
> > on another system and I tried that.
> >
> > In both cases the DLE with the problem is not the largest but
> > the most active ZFS mountpoint on the box.
> > The other DLEs have no issues, not even intermittently, and
> > raising the dtimeout limit for the other box just extended
> > the time to failure; it didn't alleviate the issue.
> >
> > Dan - I'm starting to think we have some ZFS issue on both
> > dorldom1z1 and on cascade, but I don't know if the following
> > error messages contain anything you can sink your teeth into.
> >
> > Interestingly we ran great on dorldom1z1 for a couple of weeks
> > after we rebooted it (the zone, not the whole box).
> >
> > I'm not yet proposing we reboot cascade (Dustin - it's not in
> > a non-global zone; cascade has no zones, or perhaps one says
> > that it has only the global zone).
> >
> > Brian
> >
> > On Wed, May 05, 2010 at 10:47:49AM -0500, Dustin J. Mitchell wrote:
> >> What is the error? I suspect this:
> >>
> >> 1273071472.718696: dumper: security_seterror(handle=8074f08,
> >> driver=fef09394 (BSD) error=timeout waiting for REP)
> >>
> >> which likely means that you're exceeding the dtimeout for that DLE for
> >> whatever reason. Try increasing that?
> >>
> >> Dustin
> >>
> >> --
> >> Open Source Storage Engineer
> >> http://www.zmanda.com
> > ---
> > Brian R Cuttler [email protected]
> > Computer Systems Support (v) 518 486-1697
> > Wadsworth Center (f) 518 473-6384
> > NYS Department of Health Help Desk 518 473-0773
> >
> >
> >
> > IMPORTANT NOTICE: This e-mail and any attachments may contain
> > confidential or sensitive information which is, or may be, legally
> > privileged or otherwise protected by law from further disclosure. It
> > is intended only for the addressee. If you received this in error or
> > from someone who was not authorized to send it to you, please do not
> > distribute, copy or use it or any attachments. Please notify the
> > sender immediately by reply e-mail and delete this from your
> > system. Thank you for your cooperation.
> >
> >
>
> --
> Darin Perusich
> Unix Systems Administrator
> Cognigen Corporation
> 395 Youngs Rd.
> Williamsville, NY 14221
> Phone: 716-633-3463
> Email: [email protected]
---
Brian R Cuttler [email protected]
Computer Systems Support (v) 518 486-1697
Wadsworth Center (f) 518 473-6384
NYS Department of Health Help Desk 518 473-0773