Re: [Nbd] 3.12 BUG() on ext4, kernel crash on nbd-client when nbd server rebooting

Paul Clements Mon, 18 Nov 2013 14:18:14 -0800

On Sun, Nov 17, 2013 at 12:19 PM, Alex Bligh <[email protected]> wrote:


>
> On 17 Nov 2013, at 09:46, Wouter Verhelst wrote:
>
> >>
> >> In order for nbd to seamlessly handle this situation, we'd have to do a
> >> reconnect in-kernel
> >
> > This would be fairly complicated, since all the connection and
> > negotiation currently happens in userspace. I'm not sure I want to go
> > down that route.
> >
> >> (or have a callout to userland to reconnect)
> >
> > That sounds interesting, too. How would you do that?
> >
> >> and
> >> then we'd have to retry any I/Os that may have failed in the meantime
> >> (or just let them fail, but that probably is not as useful).
> >
>
> Would another option be as follows:


>
> 1. When persistency is required, a new persist flag is specified to
>    the kernel by the client.
>

yes


>
> 2. On a connection failure, if the persist flag is set, don't
>    clean up; instead, return a specific error number. The fd is
>    still open (as it is still owned by the process), but (by
>    assumption) unusable.
>

right


>
> 3. In persist mode, the block device only gets torn down when
>    the fd closes / userland process terminates (whichever is
>    easier, detection method TBD). Until then all writes block.
>
>
Yes, I'm thinking a .release handler would let the driver see process
exit/fd close and do the teardown in that case (for example, a SIGKILL on
nbd-client will not allow it to reconnect, as the client will die
immediately upon return to userland).
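To make steps 1 to 3 and the .release idea concrete, here is a small userspace model of the proposed device states. It is only a sketch under assumptions: nbd_model, on_connection_failure, write_should_block and on_release are hypothetical names that do not exist in the nbd driver, and ENOTCONN/EIO are placeholder choices for the "specific error number" the proposal leaves open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed "persist" semantics. None of these
 * names exist in the nbd driver; the errno values are placeholders. */
enum dev_state {
    DEV_CONNECTED,    /* socket healthy, I/O flows */
    DEV_DISCONNECTED, /* socket dead, device kept, writes block (persist only) */
    DEV_TORN_DOWN     /* device cleared */
};

struct nbd_model {
    bool persist;     /* step 1: hypothetical flag passed in by the client */
    enum dev_state state;
};

static void model_init(struct nbd_model *m, bool persist)
{
    m->persist = persist;
    m->state = DEV_CONNECTED;
}

/* Step 2: the socket failed. The return value models what the DO_IT
 * ioctl would hand back to nbd-client. */
static int on_connection_failure(struct nbd_model *m)
{
    assert(m->state == DEV_CONNECTED);
    if (m->persist) {
        m->state = DEV_DISCONNECTED; /* don't clean up; keep the device */
        return -ENOTCONN;            /* assumed "please reconnect" errno */
    }
    m->state = DEV_TORN_DOWN;        /* non-persist: tear down */
    return -EIO;                     /* placeholder error */
}

/* Step 3: in persist mode, writes block until reconnect or teardown. */
static bool write_should_block(const struct nbd_model *m)
{
    return m->state == DEV_DISCONNECTED;
}

/* A .release-style hook: fd close or process exit (e.g. SIGKILL on
 * nbd-client) tears the device down, after which writes stop blocking. */
static void on_release(struct nbd_model *m)
{
    m->state = DEV_TORN_DOWN;
}
```

The point of the separate DEV_DISCONNECTED state is that "socket gone" and "device gone" become distinct, which is exactly what lets writes block instead of fail.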

Blocking the writes will hopefully be straightforward; however, a fair
amount of cleanup is required to change the current behavior from failing
I/Os to blocking them. We'll also have to retry any requests left on
nbd->queue_head (awaiting a response from the server), and then resume
handling the requests on the main request queue.
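The retry ordering described above can be sketched as follows, assuming requests are identified by plain integer ids. req_list, list_push and replay_order are hypothetical names; the two lists merely stand in for nbd->queue_head and the main request queue. The sketch deliberately ignores whether resending a request the server may already have executed is safe.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQS 16

/* Hypothetical stand-in for a queue of request ids. */
struct req_list {
    int ids[MAX_REQS];
    size_t n;
};

static void list_push(struct req_list *l, int id)
{
    assert(l->n < MAX_REQS);
    l->ids[l->n++] = id;
}

/* On reconnect, compute the order in which requests are (re)sent:
 * requests that were in flight when the socket died (awaiting a server
 * reply) go first, then the pending main queue is resumed. The in-flight
 * list is emptied because its requests are resent, not failed. */
static size_t replay_order(struct req_list *inflight,
                           const struct req_list *pending,
                           int out[], size_t cap)
{
    size_t k = 0;

    assert(inflight->n + pending->n <= cap);
    for (size_t i = 0; i < inflight->n; i++)
        out[k++] = inflight->ids[i];
    for (size_t i = 0; i < pending->n; i++)
        out[k++] = pending->ids[i];
    inflight->n = 0;
    return k;
}
```

Putting the in-flight requests first keeps them ahead of anything the block layer queued later, so the server sees requests in roughly the order they were originally issued.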



> 4. A newer nbd client detects the errno in persist mode, opens another
>    fd, and calls the NBD_DO_IT ioctl passing the old fd as an
>    additional parameter (or does a new ioctl first to associate
>    the new fd with the old fd).


Probably a new ioctl, as the routine will be slightly different on
reconnect...


> A new kernel then detects this,
>    closes the old fd, and 'takes over' the existing block device
>    with the new fd.
>
>
Right, some rearrangement of the ioctls would be required too; we'd
probably want alternate versions of SET_SOCK and DO_IT that are re-entrant
(right now those fail on an already-configured device, and they do some
setup and teardown that is unneeded in the reconnect case).
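A minimal sketch of the difference between a SET_SOCK that refuses an already-configured device and a hypothetical re-entrant reconnect variant. nbd_dev, set_sock and reconnect_sock are illustrative names only, the EBUSY/EINVAL choices are assumptions of this model, and the real ioctl handlers do considerably more setup than a single socket swap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; sock == -1 means no socket attached. */
struct nbd_dev {
    int sock;
    bool persist;
    bool disconnected; /* set when the old socket failed in persist mode */
};

/* Current-style SET_SOCK: refuses an already-configured device. */
static int set_sock(struct nbd_dev *d, int fd)
{
    if (d->sock != -1)
        return -EBUSY;
    d->sock = fd;
    return 0;
}

/* Hypothetical re-entrant variant for reconnect: only valid on a
 * persist-mode device whose socket has failed. It skips initial setup
 * (size, block size, queue creation) and just swaps in the new socket,
 * reporting the old one so the caller can close it. */
static int reconnect_sock(struct nbd_dev *d, int fd, int *old_fd)
{
    assert(fd >= 0);
    if (!d->persist || !d->disconnected)
        return -EINVAL;
    *old_fd = d->sock;
    d->sock = fd;
    d->disconnected = false;
    return 0;
}
```

Restricting the new path to the "persist and disconnected" case keeps the existing SET_SOCK semantics untouched for older clients.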

_______________________________________________
Nbd-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nbd-general