On Wed 13-11-13 05:59:11, Denys Fedoryshchenko wrote:
> Hi
>
> On 2013-11-12 23:46, Jan Kara wrote:
> >Hello,
> >
> >On Tue 12-11-13 16:34:07, Denys Fedoryshchenko wrote:
> >>I just did some fault testing on a test nbd setup and found that if
> >>I reboot the nbd server I immediately get a BUG() message on the nbd
> >>client, and a filesystem that I cannot unmount; any operations on it
> >>freeze and lock the processes trying to access it.
> > So how exactly did you do the fault testing? It seems something has
> >discarded the block device under the filesystem's feet and the
> >superblock buffer_head got unmapped. Did something call the
> >NBD_CLEAR_SOCK ioctl? That calls kill_bdev(), which would do exactly
> >that...
>
> Client side:
> modprobe nbd
> nbd-client 2.2.2.29 /dev/nbd0 -name export1
> nbd-client 2.2.2.29 /dev/nbd1 -name export2
> nbd-client 2.2.2.29 /dev/nbd2 -name export3
> mount /dev/nbd0 /mnt/disk1
> mount /dev/nbd1 /mnt/disk2
> mount /dev/nbd2 /mnt/disk3
>
> On server i have config:
> [generic]
> [export1]
> exportname = /dev/sda1
> [export2]
> exportname = /dev/sdb1
> [export3]
> exportname = /dev/sdc1
>
> Steps to reproduce:
> 1) Start copying a large file on the client side to /mnt/disk1/
> 2) Reboot the server. It reboots quite fast, in just a few seconds; the
> server gets its IP before the nbd-server process starts listening, so
> nbd-client probably sees connection refused.
> 3) It seems that when the client gets connection refused, it goes mad.
>
> I can try to capture a traffic dump or do any other debugging; please
> let me know what I should run. :)
> P.S. I noticed that I should perhaps run persist mode, but it should
> not crash like this in any case, I think.
OK, no need for further debugging. I see what's going on. In the NBD_DO_IT
ioctl, nbd calls kill_bdev() after the kthread has returned - and that is
what happens in your case, as we can see from the "queue cleared" messages.
Now the question is how to fix this. Filesystems don't really expect
device buffers to disappear under them, as happens when nbd calls
kill_bdev(). That also never happens with normal block devices - if a
similar failure hits a SCSI / SATA disk, the corresponding block device
hangs around refusing all IO until the filesystem is unmounted, and only
at that point does it disappear (when the device's refcount - bd_openers -
reaches zero). It would be good if NBD behaved the same way - maybe we
should return from the NBD_DO_IT ioctl only after bd_openers drops to 1
(not zero, because the nbd client itself has the device open for the
ioctl, if I'm right)?
Honza
---
Jan Kara <[email protected]>
SUSE Labs, CR
_______________________________________________
Nbd-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nbd-general