Hi Ashu,

Thank you very much for the response.

In http://defect.opensolaris.org/bz/show_bug.cgi?id=15058 I indicated that
the "zfs list -t all" thread was the oldest idle thread and that it appears
to be stuck for whatever reason. All the threads that are in biowait() went
idle much later than that thread. It could be that the arc_read_no_lock()
thread is the cause of the other I/O waits. I agree that biowait() in the
scdpmd threads could indicate that something got stuck at the SCSI layer,
which may or may not be because of the zfs list thread, but since it was the
oldest idle thread I was interested in knowing why it is stuck there
forever.
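
For reference, this is roughly how I have been inspecting the crash dump so
far (just a minimal sketch; the dump file names and the thread address are
placeholders):

    # Open the crash dump with the kernel debugger
    mdb unix.0 vmcore.0

    # Summarize all kernel thread stacks that include biowait()
    > ::stacks -c biowait

    # Summarize thread stacks that have frames in the zfs module
    > ::stacks -m zfs

    # Print the full stack of one particular thread of interest
    > <thread-addr>::findstack -v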

The cluster team had indicated that this is an invalid configuration, but
did not give any further details as to why that is the case or how we can
modify the configuration to prevent this. If you think the ZFS team needs to
take a look, please assign the bug to ZFS and I can follow up on
zfs-discuss.

Once again, I appreciate your response. We will try to follow up from the
storage perspective as well.
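
Concretely, when this happens again we plan to start with something like the
following on the storage side (a rough sketch; the 5-second interval is just
an example):

    # Extended device statistics; devices sitting with requests in the
    # active queue (actv) and at 100 %b for long stretches are suspects
    iostat -xnz 5

    # Per-device error counters (soft/hard/transport errors)
    iostat -En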

Regards,
Satya



On Fri, Mar 19, 2010 at 4:57 PM, Ashutosh Tripathi <
Ashutosh.Tripathi at sun.com> wrote:

> Hi Satya,
>
>        While I don't know why the ZFS I/O is hung in biowait(),
> from past experience I can tell you that biowait() issues
> tend to be very hard to debug. In many cases, these actually
> turn out to be issues related to the storage, i.e. the storage or
> storage driver simply takes too long for a given I/O (or loses
> track of it). At the upper layer (Solaris/filesystem), there is
> nothing the system can do except wait for the I/O to complete.
>
>        Note that this is different from a SCSI timeout, i.e.
> a SCSI packet sent by the host to the storage gets lost,
> so the host never gets an ACK back. In that case, the SCSI
> command is retried. Here, I am talking about a case where
> the SCSI command has been ACKed properly by the storage,
> but the storage just never comes back with the completed I/O.
>
>        While your mention of the "zfs list -t all" command
> sounds a bit suspicious, when I actually look at the thread
> stacks you posted in the CR, they list a bunch of java threads
> and scdpmd threads stuck in biowait(). So, at least in
> that case, the hang could be independent of the zfs list
> command (though it is always possible that zfs list is triggering
> a particular pattern of I/O which leads to this...).
>
>        Anyhow, where does that leave you... Have you tried
> approaching your storage vendor with this problem? The leading
> question to them would be: Why isn't the storage completing this
> particular I/O request from the host?
>
> HTH,
> -ashu
>
>
> opensolaris_user hello wrote:
>
>>
>> Hi Cluster Team,
>>
>> We are regularly running into a ZFS hang, and after it happens the
>> node can end up with a corrupted pool, causing complete data loss. On
>> top of that, since a reboot doesn't work, we end up with a corrupted
>> boot partition that causes the boot to panic. We are using clustering
>> in a single-node configuration and aim to expand to a 2-node HA
>> configuration. Looking at the stack traces and our application logs,
>> it is clear that a "zfs list -t all" command causes the pool to get
>> stuck. The system works without any issues except when we run into
>> this hang.
>>
>> I tried to analyze the root cause and I see that the zfs list thread
>> was stuck in I/O wait. It seems that this is a ZFS hang and not related
>> to clustering. We even tried disabling cluster disk path monitoring and
>> still ran into this issue.
>>
>> If we can get some insight as to why this is an invalid cluster
>> configuration and how it can lead to the ZFS hang, we would appreciate
>> that. I have filed 2 bugs, but the following bug explains the situation
>> much better:
>>
>> http://defect.opensolaris.org/bz/show_bug.cgi?id=15058
>>
>> Any insight/help with this issue is highly appreciated.
>>
>> Thanks,
>> Satya
>>
>>
>>
>