On 07/02/2012 09:55 AM, Fabio M. Di Nitto wrote:
> From: Lon Hohberger <[email protected]>
>
> Qdiskd historically has required significant tuning to work around
> delays which occur during multipath failover, overloaded I/O, and LUN
> trespasses in both device-mapper-multipath and EMC PowerPath
> environments.
>
> This patch goes a very long way towards eliminating false evictions
> when these conditions occur by making qdiskd whine to the other
> cluster members when it detects hung system calls. When a cluster
> member whines, it indicates the source of the problem (which system
> call is hung), and the act of receiving a whine from a host indicates
> that qdiskd is operational, but that I/O is hung. Hung I/O is different
> from losing storage entirely (where you get I/O errors).
>
> Possible problems:
>
> - Receive queue getting very full, causing messages to become blocked on
>   a node where I/O is hung. 1) that would take a very long time, and 2)
>   node should get evicted at that point anyway.
>
> Resolves: rhbz#782900
>
> This version of the patch is a backport of:
> e2937eb33f224f86904fead08499a6178868ca6a
> 34d2872fb7e60be1594158acaaeb8acd74f78d22
>
> There is a minor change vs original patch based on how qdiskd
> in RHEL5 handles cman connection. We add an extra call to cman_alive
> in main qdisk_loop to make sure data are not stalled on the
> cman port, and that the data_callback to qdiskd_whine is executed.
>
> Signed-off-by: Lon Hohberger <[email protected]>
> Signed-off-by: Fabio M. Di Nitto <[email protected]>
Re-ack :)
