Hi Satya,

Hmmm... it does indeed sound like the ZFS and SC folks should be providing you with a little more justification/information about your situation than they have.

If I may: apart from looking at this from the storage perspective, on the Solaris/ZFS/SC side you might want to open an escalation with Sun using your support contract, assuming you have one. Otherwise you are liable to just get bounced around, given that this is a rather deep issue to debug. Sun (Oracle now) support engineers are trained for situations like these and know how to collect detailed data from the system to help with deeper analysis.

Just looking at kernel thread stacks takes you only so far. Brute-force kernel coredump analysis takes you a little further, at the cost of a LOT of effort.
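To make the "thread stacks" part concrete: the usual first pass is a couple of mdb dcmds against the live kernel (the same dcmds work against a saved crash dump), something along these lines -- treat it as an illustration rather than a recipe:

    # mdb -k
    > ::stacks -m zfs
    > ::stacks -c biowait
    > <address of a suspect thread>::findstack -v

::stacks -m zfs summarizes the unique kernel stacks that involve the zfs module, ::stacks -c biowait lists every thread currently parked in biowait(), and ::findstack -v on a suspect thread (the zfs list one, say) prints its full stack. That tells you who is stuck and where, but usually not why the I/O underneath never comes back.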
Doing targeted debugging (with dtrace scripts, for example, or debug kernel modules, for another), with several iterations of back and forth, is what it takes to nail down deeper issues like this.
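Just to give a flavor of what such a dtrace script might look like, here is a rough sketch using the stock io provider: it keeps a (rough) running count of block I/Os that have been issued but not yet completed, plus a latency histogram for the ones that do complete. A request the storage never finishes will never fire io:::done, so an in-flight count that keeps creeping up points the finger at the device underneath. The 10-second interval and the millisecond buckets are placeholders you would adapt:

    #!/usr/sbin/dtrace -s
    /* Sketch only: block I/O latency plus a count of I/Os still in flight. */
    #pragma D option quiet

    io:::start
    {
        start_ts[arg0] = timestamp;     /* key on the buf pointer */
        inflight++;
    }

    io:::done
    /start_ts[arg0]/
    {
        /* completion latency in milliseconds, per device */
        @lat[args[1]->dev_statname] =
            quantize((timestamp - start_ts[arg0]) / 1000000);
        start_ts[arg0] = 0;
        inflight--;
    }

    tick-10sec
    {
        printf("%Y  block I/Os still in flight: %d\n", walltimestamp, inflight);
        printa(@lat);
        trunc(@lat);
    }

Run it as root while you reproduce the zfs list hang; if the in-flight number climbs and never comes back down, that lines up with the biowait() picture and is good ammunition to take to the storage vendor.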
Hope that didn't sound too much like a pushback; it was intended as good-faith feedback on how you are going about this problem.

Regards,
-ashu

opensolaris_user hello wrote:
> Hi Ashu,
>
> Thank you very much for the response.
>
> In http://defect.opensolaris.org/bz/show_bug.cgi?id=15058 I indicated
> that the "zfs list -t all" thread was the oldest idle thread and it
> seems to be stuck for whatever reason. All the threads that are in
> biowait() seem to be much later than that thread. It could be that the
> arc_read_no_lock() thread is the cause of the other I/O waits. While I
> agree that biowait() on the scdpmd could indicate something got stuck
> at the SCSI layer, which may or may not be because of the zfs list
> thread, since it was the oldest idle thread I was interested in knowing
> why it is stuck there forever.
>
> The cluster team had indicated that it is an invalid configuration, but
> did not give any further details as to why that is the case and how we
> can modify the configuration to prevent this. If you think that the ZFS
> team needs to take a look, please assign it to ZFS and I can follow up
> on zfs-discuss.
>
> Once again, appreciate your response. We will try to follow up from the
> storage perspective as well.
>
> Regards,
> Satya
>
> On Fri, Mar 19, 2010 at 4:57 PM, Ashutosh Tripathi
> <Ashutosh.Tripathi at sun.com> wrote:
>
> > Hi Satya,
> >
> > While I don't know why the ZFS I/O is hung in biowait(), from past
> > experience I can tell you that biowait() issues tend to be very hard
> > to debug. In many cases these actually turn out to be issues related
> > to the storage, i.e. the storage or the storage driver simply takes
> > too long on (or loses track of) a given I/O. At the upper layer
> > (Solaris/filesystem) there is nothing the system can do except wait
> > for the I/O to complete.
> >
> > Note that this is different from a SCSI timeout, i.e. a SCSI packet
> > sent by the server to the storage gets lost, so the host never gets
> > an ACK back. In that case the SCSI command is retried. Here I am
> > talking about a case where the SCSI command has been ACKed properly
> > by the storage; it just never comes back with the completed I/O.
> >
> > While your mention of the "zfs list -t all" command sounds a bit
> > suspicious, when I actually look at the thread stack listing you
> > posted in the CR, it shows a bunch of Java threads and scdpmd threads
> > stuck behind a biowait(). So, at least in that case, the hang could
> > be independent of the zfs list command (it is always possible that
> > the zfs list is triggering a particular pattern of I/O which leads to
> > this...).
> >
> > Anyhow, where does that leave you... Have you tried approaching your
> > storage vendor with this problem? The leading question to them would
> > be: why isn't the storage completing this particular I/O request from
> > the host?
> >
> > HTH,
> > -ashu
> >
> > opensolaris_user hello wrote:
> >
> > > Hi Cluster Team,
> > >
> > > We are currently running into a ZFS hang regularly, and after this
> > > happens the node can end up with a corrupted pool, causing complete
> > > data loss. On top of that, since reboot doesn't work, we end up
> > > with a corrupted boot partition that causes a panic at boot. We are
> > > using clustering with a single-node configuration and aim to expand
> > > to a 2-node HA configuration. Looking at the stack traces and our
> > > application logs, it is clear that a "zfs list -t all" command
> > > causes the pool to be stuck. The system works all the time without
> > > any issues except when we run into this hang.
> > >
> > > I tried to analyze the root cause, and I see that the zfs list
> > > thread was stuck in I/O wait. It seems that this is a ZFS hang and
> > > not related to clustering. We even tried to disable cluster disk
> > > path monitoring and still run into this issue.
> > >
> > > If we can get some insight as to why this is an invalid cluster
> > > configuration and how this can lead to the ZFS hang, we would
> > > appreciate that. I have filed 2 bugs, but the following bug explains
> > > the situation much better:
> > >
> > > http://defect.opensolaris.org/bz/show_bug.cgi?id=15058
> > >
> > > Any insight/help with this issue is highly appreciated.
> > >
> > > Thanks,
> > > Satya