Update: I tried removing the disk the 'right' way:
# echo 1 > /sys/block/sdf/device/delete
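For repeated tests, the removal step can be wrapped in a small guard so that a mistyped device name fails loudly instead of silently doing nothing. This is only a sketch: the function name offline_disk and its optional second argument (an alternative sysfs root, useful for dry runs) are made up for illustration. The only real interface it touches is the device/delete sysfs node used in the command above.

```shell
# Hypothetical helper (not an existing tool): ask the SCSI layer to drop a disk.
# $1 = block device name, e.g. sdf
# $2 = sysfs root, defaults to /sys (override only for dry runs)
offline_disk() {
    dev="$1"
    sysfs_root="${2:-/sys}"
    node="$sysfs_root/block/$dev/device/delete"

    # Refuse to continue if the device has no delete node.
    if [ ! -e "$node" ]; then
        echo "offline_disk: no such node: $node" >&2
        return 1
    fi

    # Same write as the manual command: echo 1 > /sys/block/<dev>/device/delete
    echo 1 > "$node"
}

# Usage (as root, on the real system):
#   offline_disk sdf
```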
Everything looked fine: the system did not crash immediately on a 'sync' call and kept working for some time without problems. But after a certain call, which I can reproduce with:
# apt-get update
the test system (on which I deleted one of the raid1 btrfs devices) gets a kernel crash. I got the following dmesg:
----
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: Modules linked in: 8021q garp mrp stp llc binfmt_misc gpio_ich coretemp kvm_intel lpc_ich ipmi_ssif kvm amdkfd amd_iommu_v2 serio_raw radeon ttm i5000_edac drm_kms_helper drm edac_core i2c_algo_bit i5k_amb ioatdma dca shpchp 8250_fintek joydev mac_hid ipmi_si ipmi_msghandler bonding autofs4 btrfs xor raid6_pq ses enclosure hid_generic psmouse usbhid hid mptsas mptscsih e1000e mptbase scsi_transport_sas ptp pps_core
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: CPU: 3 PID: 99 Comm: kworker/u16:4 Not tainted 4.0.4-040004-generic #201505171336
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: Hardware name: Intel S5000VSA/S5000VSA, BIOS S5000.86B.15.00.0101.110920101604 11/09/2010
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: task: ffff88009ab31400 ti: ffff88009ab40000 task.ti: ffff88009ab40000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RIP: 0010:[<ffffffffc0477d50>] [<ffffffffc0477d50>] repair_io_failure+0x1c0/0x200 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RSP: 0018:ffff88009ab43bb8 EFLAGS: 00010206
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RAX: 0000000000000000 RBX: ffff88009b1d3f30 RCX: ffff88009b53f9c0
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RDX: ffff88044902f400 RSI: 0000000000000000 RDI: ffff88009b53f9c0
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RBP: ffff88009ab43c18 R08: 0000000000000000 R09: 0000000000000000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: R10: ffff880448c1b090 R11: 0000000000000000 R12: 0000000039070000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: R13: ffff880439599e68 R14: 0000000000001000 R15: ffff88009a860000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: FS: 0000000000000000(0000) GS:ffff88045fcc0000(0000) knlGS:0000000000000000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: CR2: 00007f640a27e675 CR3: 0000000098b4b000 CR4: 00000000000407e0
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: Stack:
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: 0000000000000000 000000009a860de0 ffffea0002644380 00000003d2ee8000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: 0000000000008000 ffff88009b53f9c0 ffff88009ab43c18 ffff88009b1d3f30
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: ffff88044c44a3c0 ffff88009b0c1190 0000000000000000 ffff88009a860000
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: Call Trace:
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffffc0477f30>] clean_io_failure+0x1a0/0x1b0 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffffc0478218>] end_bio_extent_readpage+0x2d8/0x3d0 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff8137b2c3>] bio_endio+0x53/0xa0
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff8137b322>] bio_endio_nodec+0x12/0x20
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffffc044efb8>] end_workqueue_fn+0x48/0x60 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffffc0488b2e>] normal_work_helper+0x7e/0x1b0 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffffc0488d32>] btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff81092204>] process_one_work+0x144/0x490
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff81092c6e>] worker_thread+0x11e/0x450
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff81092b50>] ? create_worker+0x1f0/0x1f0
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff81098999>] kthread+0xc9/0xe0
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff810988d0>] ? flush_kthread_worker+0x90/0x90
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff817f08d8>] ret_from_fork+0x58/0x90
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: [<ffffffff810988d0>] ? flush_kthread_worker+0x90/0x90
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: Code: 44 00 00 4c 89 ef e8 b0 34 f0 c0 31 f6 4c 89 e7 e8 06 05 01 00 ba fb ff ff ff e9 c7 fe ff ff ba fb ff ff ff e9 bd fe ff ff 0f 0b <0f> 0b 49 8b 4c 24 30 48 8b b3 58 fe ff ff 48 83 c1 10 48 85 f6
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RIP [<ffffffffc0477d50>] repair_io_failure+0x1c0/0x200 [btrfs]
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: RSP <ffff88009ab43bb8>
Jun 17 12:00:41 srv-lab-ceph-node-01 kernel: ---[ end trace 0361c6fdca5f7ee2 ]---
---

Another test case: I deleted the device:
echo 1 > /sys/block/sdf/device/delete
Then I reinserted the device (physically removed it and inserted it again in the server). The server detected it as sdg, which is fine, but the kernel crashed with the following stack trace:
---
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent_io.c:2057!
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: invalid opcode: 0000 [#1] SMP
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: Modules linked in: 8021q garp mrp stp llc binfmt_misc gpio_ich coretemp kvm_intel amdkfd amd_iommu_v2 ipmi_ssif kvm radeon lpc_ich serio_raw ttm i5000_edac edac_core drm_kms_helper drm i5k_amb ioatdma i2c_algo_bit joydev 8250_fintek ipmi_si dca ipmi_msghandler mac_hid shpchp bonding autofs4 btrfs xor raid6_pq ses enclosure hid_generic psmouse mptsas usbhid mptscsih hid mptbase scsi_transport_sas e1000e ptp pps_core
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: CPU: 2 PID: 72 Comm: kworker/u16:2 Not tainted 4.0.4-040004-generic #201505171336
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: Hardware name: Intel S5000VSA/S5000VSA, BIOS S5000.86B.15.00.0101.110920101604 11/09/2010
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: task: ffff88044d215a00 ti: ffff880449b1c000 task.ti: ffff880449b1c000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RIP: 0010:[<ffffffffc02a9d50>] [<ffffffffc02a9d50>] repair_io_failure+0x1c0/0x200 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RSP: 0018:ffff880449b1fbb8 EFLAGS: 00010206
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RAX: 0000000000000000 RBX: ffff88044c3ac308 RCX: ffff88044c5ef3c0
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RDX: ffff880449117400 RSI: 0000000000000000 RDI: ffff88044c5ef3c0
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RBP: ffff880449b1fc18 R08: 0000000000000000 R09: 0000000000000000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: R10: ffff880448ce0090 R11: 0000000000000000 R12: 000000003999a000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: R13: ffff88043999a568 R14: 0000000000001000 R15: ffff880449510000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: FS: 0000000000000000(0000) GS:ffff88045fc80000(0000) knlGS:0000000000000000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: CR2: 00007fbfbe12cf00 CR3: 0000000449b4e000 CR4: 00000000000407e0
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: Stack:
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: 0000000000000000 0000000049510de0 ffffea0010f40540 00000003f7ed4000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: 000000000000c000 ffff88044c5ef3c0 ffff880449b1fc18 ffff88044c3ac308
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: ffff88044b1acc80 ffff880448dcbfa0 0000000000000000 ffff880449510000
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: Call Trace:
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffffc02a9f30>] clean_io_failure+0x1a0/0x1b0 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffffc02aa218>] end_bio_extent_readpage+0x2d8/0x3d0 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff8137b2c3>] bio_endio+0x53/0xa0
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff8137b322>] bio_endio_nodec+0x12/0x20
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffffc0280fb8>] end_workqueue_fn+0x48/0x60 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffffc02bab2e>] normal_work_helper+0x7e/0x1b0 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffffc02bad32>] btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff81092204>] process_one_work+0x144/0x490
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff81092c6e>] worker_thread+0x11e/0x450
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff81092b50>] ? create_worker+0x1f0/0x1f0
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff81098999>] kthread+0xc9/0xe0
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff810988d0>] ? flush_kthread_worker+0x90/0x90
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff817f08d8>] ret_from_fork+0x58/0x90
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: [<ffffffff810988d0>] ? flush_kthread_worker+0x90/0x90
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: Code: 44 00 00 4c 89 ef e8 b0 14 0d c1 31 f6 4c 89 e7 e8 06 05 01 00 ba fb ff ff ff e9 c7 fe ff ff ba fb ff ff ff e9 bd fe ff ff 0f 0b <0f> 0b 49 8b 4c 24 30 48 8b b3 58 fe ff ff 48 83 c1 10 48 85 f6
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RIP [<ffffffffc02a9d50>] repair_io_failure+0x1c0/0x200 [btrfs]
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: RSP <ffff880449b1fbb8>
Jun 17 12:08:35 srv-lab-ceph-node-01 kernel: ---[ end trace 90ec36112ab1f744 ]---

P.S. I am thinking about the case where I have 2 disk slots in the server and want to replace one disk that has failed (overheated, simply 'burned out', or something else) without server downtime.

--
Have a nice day,
Timofey.