[ha-clusters-discuss] Fwd: [Bug 13774] zfs hangs on a 10 disk raidz pool

opensolaris_user hello Thu, 11 Mar 2010 11:24:01 -0800

Hi Cluster Team,

We are regularly running into a ZFS hang, and after it happens the node can
end up with a corrupted pool, causing complete data loss. In addition, since
reboot doesn't work, we end up with a corrupted boot partition, which causes
boot to panic. We are using clustering with a single-node configuration and
aim to expand to a 2-node HA configuration.  Looking at the stack traces and
our application logs, it is clear that a "zfs list -t all" command causes
the pool to get stuck.  The system works without any issues at all times
except when we run into this hang.


I tried to analyze the root cause, and I see that the zfs list thread was
stuck in I/O wait. It seems that this is a ZFS hang and not related to
clustering. We even tried disabling cluster disk path monitoring and still
ran into this issue.

If we can get some insight as to why this is an invalid cluster configuration
and how this can lead to the ZFS hang, we would appreciate that.  I have
filed 2 bugs, but the following bug explains the situation much better:

http://defect.opensolaris.org/bz/show_bug.cgi?id=15058

Any insight/help with this issue is highly appreciated.

Thanks,
Satya


---------- Forwarded message ----------
From: <bugzi...@defect.opensolaris.org>
Date: Thu, Mar 11, 2010 at 10:19 AM
Subject: [Bug 13774] zfs hangs on a 10 disk raidz pool
To: opensolarisuser2009 at gmail.com


http://defect.opensolaris.org/bz/show_bug.cgi?id=13774


manthavish <vishwanath.mantha at sun.com> changed:

          What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |vishwanath.mantha at sun.com


--
Configure bugmail: http://defect.opensolaris.org/bz/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug.
