On Wed, 15 Aug 2012 10:21:48 -0500, Anthony Plack wrote:
> Okay, this is the second occurrence of this bug. I have searched Google, and
> while there are two postings for extent_io, I am not sure if they match.
>
> Running Gentoo with kernel 3.5 on a dual core AMD. The machine has 19 drives
> of varied types. I am running rsync from an xfs volume (on two md arrays) to
> the btrfs volume and moving 8.2T. This is the first time some of these
> drives have been exercised. Four of the drives are in an external cage with
> a SATA multiplexer running across an eSATA cable.
>
> On the btrfs volume, the metadata is RAID1, but the data is RAID0.
>
> To me the most troubling issue is that the bug causes the system to become
> unresponsive whenever accessing the btrfs volume. Any btrfs command will
> hang at the prompt. umount would similarly hang. On Aug 10th, I let the
> prompts sit for 48 hours with no progress because I did not desire to take
> the box down for other processes. All attempts to kill the processes have
> no effect; they are just zombies in the system. The system does not seem
> to have excessive CPU or memory consumption.
>
> After the first event, I have learned what is forcing the situation. There
> are two used "ST3000DM001-9YN1 CC9D" Seagate drives which are posting some
> errors in the console. The multiplexer is responding to these errors by
> shutting down the drive. If I reboot the box, the multiplexer will show one
> drive as off-line. I was successful in removing and reseating the drive.
> The bad block count is up, but not that high for a 3T drive (200s).
> The "shutdown" command would also hang on this first event. I unmounted all
> the other volumes, and had to hard reboot the server.
>
> During the second event, suspecting the multiplexer did it again, I hot
> unplugged the second drive (/dev/sdi), which was missing from lsscsi. The
> drive is back online (as /dev/sdt) but btrfs is not detecting the shift and
> is still hung.
> I have the original rsync stuck. This time, I was able to get btrfs command
> to operate without hanging. In addition, the drive is accessible, but the
> rsync commands are hung.
>
> When I attempted to scrub the volume, I posted another trace in the log.
>
> Okay details....
>
> Trace Failure 1:
> Aug 10 06:22:48 fatdrive kernel: [131136.506053] kernel BUG at fs/btrfs/extent_io.c:1884!
> Aug 10 06:22:48 fatdrive kernel: [131136.506070] invalid opcode: 0000 [#1] SMP
> Aug 10 06:22:48 fatdrive kernel: [131136.506087] CPU 1
> Aug 10 06:22:48 fatdrive kernel: [131136.506090] Modules linked in: btrfs lzo_compress lzo_decompress zlib_deflate crc32c libcrc32c r8168(O) nfsd xfs exportfs shpchp pci_hotplug r8169 k10temp mii kvm_amd kvm
> Aug 10 06:22:48 fatdrive kernel: [131136.506168]
> Aug 10 06:22:48 fatdrive kernel: [131136.506184] Pid: 8458, comm: btrfs-endio-wri Tainted: G W O 3.5.0-gentoo #2 BIOSTAR Group TA880G HD/TA880G HD
> Aug 10 06:22:48 fatdrive kernel: [131136.506219] RIP: 0010:[<ffffffffa02b9231>] [<ffffffffa02b9231>] repair_io_failure+0x1a1/0x1e0 [btrfs]
> Aug 10 06:22:48 fatdrive kernel: [131136.506270] RSP: 0018:ffff8800889dd970 EFLAGS: 00010246
> Aug 10 06:22:48 fatdrive kernel: [131136.506288] RAX: ffff8800889dd9a0 RBX: 0000000000000000 RCX: 0000007879ea8000
> Aug 10 06:22:48 fatdrive kernel: [131136.506318] RDX: 0000000000001000 RSI: 0000007879ea8000 RDI: ffff880215754108
> Aug 10 06:22:48 fatdrive kernel: [131136.506347] RBP: ffff8800889dd9f0 R08: ffffea0000ef8a80 R09: 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506378] R10: 57ffe641d6ef8a80 R11: 0000000000000001 R12: 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506407] R13: ffff8800889dd990 R14: 0000007879ea8000 R15: 0000000000001000
> Aug 10 06:22:48 fatdrive kernel: [131136.506439] FS: 00007f7f6959e700(0000) GS:ffff88021fc40000(0000) knlGS:0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506469] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Aug 10 06:22:48 fatdrive kernel: [131136.506490] CR2: 00007f12b121f624 CR3: 0000000198bac000 CR4: 00000000000007e0
> Aug 10 06:22:48 fatdrive kernel: [131136.506521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506556] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Aug 10 06:22:48 fatdrive kernel: [131136.506584] Process btrfs-endio-wri (pid: 8458, threadinfo ffff8800889dc000, task ffff88018a3c0770)
> Aug 10 06:22:48 fatdrive kernel: [131136.506614] Stack:
> Aug 10 06:22:48 fatdrive kernel: [131136.506628] ffffea0000ef8a80 0000007879ea8000 ffff880215754108 ffffea0000ef8a80
> Aug 10 06:22:48 fatdrive kernel: [131136.506659] 0000000000000000 0000000000000000 ffff8800889dd9a0 ffff8800889dd9a0
> Aug 10 06:22:48 fatdrive kernel: [131136.506690] 0000000000000000 0000000000000000 ffff880200000001 0000000000000000
> Aug 10 06:22:48 fatdrive kernel: [131136.506721] Call Trace:
> Aug 10 06:22:48 fatdrive kernel: [131136.506746] [<ffffffffa02b9b91>] repair_eb_io_failure+0x81/0xa0 [btrfs]
> Aug 10 06:22:48 fatdrive kernel: [131136.506770] [<ffffffffa029119a>] btree_read_extent_buffer_pages.constprop.115+0x11a/0x120 [btrfs]
[...]
This issue is already fixed with commit c0901581 which is part of Linux 3.6 RC1:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/18594
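As a rough aid for anyone hitting the same BUG, here is a minimal sketch that checks whether a kernel release string (as printed by `uname -r`, e.g. the `3.5.0-gentoo` kernel in the trace above) is at least 3.6-rc1. The helper names `parse_release` and `includes_fix` are hypothetical, and the check assumes the usual `X.Y[.Z][-rcN][-suffix]` release format. It only compares mainline version numbers: it cannot tell whether a stable or distribution kernel older than 3.6 carries a backport of the fix.

```python
import re

# Hypothetical constant: 3.6-rc1, the release the reply above names as
# containing commit c0901581, as (major, minor, patch, rc number).
FIX_VERSION = (3, 6, 0, 1)

# Assumed format: "X.Y", optionally ".Z", optionally "-rcN", then any suffix
# such as "-gentoo", which is ignored.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?")


def parse_release(release):
    """Turn a release like '3.5.0-gentoo' or '3.6.0-rc1' into a sortable tuple."""
    m = _RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    # A final release sorts after all of its release candidates.
    rc = int(m.group(4)) if m.group(4) else float("inf")
    return (major, minor, patch, rc)


def includes_fix(release):
    """True if the mainline version is 3.6-rc1 or later (backports not detected)."""
    return parse_release(release) >= FIX_VERSION
```

On Linux, `platform.release()` returns the same string as `uname -r`, so `includes_fix(platform.release())` checks the running kernel. Treating a final release as sorting after its release candidates keeps `3.6.0` above `3.6.0-rc1`.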
