Dustin,
Dan,
I wondered about dtimeout, but we have a very similar issue
on another system and I tried that there.
In both cases the DLE with the problem is not the largest but
the most active ZFS mountpoint on the box.
The other DLEs have no issues, not even intermittently, and
raising the dtimeout limit on the other box just extended
the time to failure; it didn't alleviate the issue.
Dan - I'm starting to think we have some ZFS issue on both
dorldom1z1 and on cascade, but I don't know if the following
error messages contain anything you can sink your teeth into.
Interestingly we ran great on dorldom1z1 for a couple of weeks
after we rebooted it (the zone, not the whole box).
I'm not yet proposing we reboot cascade (Dustin - it's not in
a non-global zone; cascade has no zones, or perhaps one should
say that it has only the global zone).
Brian
On Wed, May 05, 2010 at 10:47:49AM -0500, Dustin J. Mitchell wrote:
> What is the error? I suspect this:
>
> 1273071472.718696: dumper: security_seterror(handle=8074f08,
> driver=fef09394 (BSD) error=timeout waiting for REP)
>
> which likely means that you're exceeding the dtimeout for that DLE for
> whatever reason. Try increasing that?
>
> Dustin
>
> --
> Open Source Storage Engineer
> http://www.zmanda.com
---
Brian R Cuttler [email protected]
Computer Systems Support (v) 518 486-1697
Wadsworth Center (f) 518 473-6384
NYS Department of Health Help Desk 518 473-0773