Peter Buckingham wrote:
Hi Eric,

eric kustarz wrote:
The first thing I would do is see whether any I/O is happening ('zpool iostat 1'). If there's none, then perhaps the machine is hung, in which case you would want to grab a couple of '::threadlist -v 10' outputs from mdb to figure out whether there are hung threads.

There seems to be no I/O after the initial I/O, according to zpool iostat. When we run zpool status, it hangs:

HON hcb116 ~ $ zpool status
   pool: tank
   state: ONLINE
   scrub: none requested
   <hang>

I'll send you the mdb output privately since it's quite big.
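
For anyone who wants to check whether a command such as zpool status hangs without tying up a terminal, here is a minimal sketch in Python. The function name finishes_within and the timeout values are illustrative assumptions, not something from this thread. Note that it deliberately does not wait for the child after a timeout, since a process blocked inside the kernel may not exit until its I/O completes.

```python
import subprocess


def finishes_within(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, False if it is still running."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # Request that the child be killed, but do not wait for it: a process
        # stuck in the kernel may not die until its pending I/O completes.
        proc.kill()
        return False


# Hypothetical usage (the 30 second limit is an arbitrary choice):
#   finishes_within(["zpool", "status"], timeout_s=30)
```

A False result only says the command did not return in time; it does not say why, so the mdb thread list is still the place to look for the cause.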

60 seconds should be plenty of time for the async write(s) to complete. We try to push out a txg (transaction group) every 5 seconds. However, if the system is overloaded, a txg could take longer.

That's what I would have thought.

The 'sync' hanging is intriguing. Perhaps the system is just overloaded and the sync command is making it worse. It would be interesting to see what 'fsync' does.

I've not tried this yet.
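
One way to run that fsync experiment is sketched below in Python. It writes a small temporary file and times a single fsync on it, which flushes only that file rather than everything the way sync does. The function name time_fsync, the 4 KB payload, and the example directory are assumptions made for illustration.

```python
import os
import tempfile
import time


def time_fsync(directory, payload=b"x" * 4096):
    """Write payload to a temporary file in directory and return the seconds fsync took."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)  # flushes this one file only, unlike sync(1)
        return time.monotonic() - start
    finally:
        # Clean up the temporary file whether or not fsync succeeded.
        os.close(fd)
        os.unlink(path)


# Hypothetical usage, assuming the pool is mounted at /tank:
#   print(time_fsync("/tank"))
```

If the pool itself is hung, the fsync call may block just as sync does, so it is safer to run this from a separate process that can be abandoned.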

What else is the machine doing?

We are running the honeycomb environment (you can see when I send you the mdb output).

Is there some issue with the zpool mirrors if one of the slices
disappears or becomes unresponsive after the pool has been brought online?

This can be a problem if an IO issued to the device never completes
(i.e., hangs).  This can hang up the pool.  A well-behaved device/driver
should eventually time out the IO, but we have seen instances where
this never seems to happen.

-Mark
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
