Re: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing

Mark Syms Mon, 17 Dec 2018 09:16:21 -0800

On Mon, Dec 17, 2018 at 09:58:47AM -0500, Bob Peterson wrote:
> Dave Teigland recommended. Unless I'm mistaken, Dave has said that 
> GFS2 should never withdraw; it should always just kernel panic (Dave, 
> correct me if I'm wrong). At least this patch confines that behavior 
> to a small subset of withdraws.


The basic idea is that you want to get a malfunctioning node out of the way as 
quickly as possible so others can recover and carry on.  Escalating a partial 
failure into a total node failure is the best way to do that in this case.  
Specialized recovery paths run from a partially failed node won't be as 
reliable, and are prone to blocking all the nodes.

I think a reasonable alternative to this is to just sit in an infinite retry 
loop until the i/o succeeds.

Dave
[Mark Syms] I would hope that this code would only trigger after some effort 
has been put into retrying, as panicking the host on the first I/O failure seems 
like a surefire way to get unhappy users (and, in our case, paying customers). 
As Edvin points out, there may be other filesystems that could be cleanly 
unmounted, avoiding the need to check everything on restart.
