On Mon, Dec 17, 2018 at 09:58:47AM -0500, Bob Peterson wrote:
> Dave Teigland recommended. Unless I'm mistaken, Dave has said that GFS2
> should never withdraw; it should always just kernel panic (Dave, correct
> me if I'm wrong). At least this patch confines that behavior to a small
> subset of withdraws.

The basic idea is that you want to get a malfunctioning node out of the
way as quickly as possible so others can recover and carry on.
Escalating a partial failure into a total node failure is the best way
to do that in this case. Specialized recovery paths run from a partially
failed node won't be as reliable, and are prone to blocking all the
nodes.

I think a reasonable alternative to this is to just sit in an infinite
retry loop until the i/o succeeds.

Dave
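
For illustration only, the "infinite retry loop" alternative could be
sketched roughly as below. This is a user-space C sketch, not GFS2 or
kernel code: the names io_fn, retry_io_forever, flaky_dev and
flaky_write are all hypothetical, and the fixed delay between attempts
is an assumption rather than anything proposed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical I/O callback: returns 0 on success, a negative errno on failure. */
typedef int (*io_fn)(void *ctx);

/*
 * Retry the I/O until it succeeds, sleeping delay_ms between attempts.
 * There is deliberately no failure return: the caller blocks until the
 * I/O goes through. Returns the number of attempts that were needed.
 */
unsigned long retry_io_forever(io_fn io, void *ctx, long delay_ms)
{
    unsigned long attempts = 0;
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (;;) {
        int err;

        attempts++;
        err = io(ctx);
        if (err == 0)
            return attempts;

        fprintf(stderr, "I/O failed (%d) on attempt %lu, retrying\n",
                err, attempts);
        nanosleep(&delay, NULL);
    }
}

/* Simulated device (assumption, for demonstration): fails a fixed number of times. */
struct flaky_dev {
    int failures_left;
};

int flaky_write(void *ctx)
{
    struct flaky_dev *dev = ctx;

    if (dev->failures_left > 0) {
        dev->failures_left--;
        return -EIO;
    }
    return 0;
}
```

With a flaky_dev initialized to three remaining failures, a call such as
retry_io_forever(flaky_write, &dev, 1) returns on the fourth attempt.
The point of the shape is that no special recovery path runs on the
failing node; it simply does not make progress until the I/O succeeds.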