I'm seeing three separate problems:

May 19 03:25:39 lcg-lrz-dc10 kernel: [903095.585150] megasas:
megasas_aen_polling waiting for controller reset to finish for scsi0
May 19 03:25:50 lcg-lrz-dc10 kernel: [903106.581205] sd 0:0:14:0:
Device offlined - not ready after error recovery

I don't know if that's controller related or drive related. In either
case it's hardware related. And then:

May 28 16:40:43 lcg-lrz-dc10 kernel: [1727608.170703] BTRFS: bdev
/dev/sdm errs: wr 12, rd 0, flush 0, corrupt 0, gen 0
May 28 16:40:50 lcg-lrz-dc10 kernel: [1727615.608552] BTRFS: bdev
/dev/sdm errs: wr 12, rd 1, flush 0, corrupt 0, gen 0
...
May 28 16:43:16 lcg-lrz-dc10 kernel: [1727761.077607] BTRFS: bdev
/dev/sdm errs: wr 28, rd 21596, flush 0, corrupt 0, gen 0
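
The counters in those messages are cumulative, so it can help to extract them and see how much each one grew over the window. A minimal sketch, assuming the message text has the same "BTRFS: bdev <dev> errs: ..." shape as in the excerpts above; the names parse_errs and growth are made up for illustration and are not part of any btrfs tool:

```python
import re

# Matches the counter part of a "BTRFS: bdev <dev> errs: ..." kernel message.
# \s+ between fields tolerates the line wrapping seen in the mail.
ERRS_RE = re.compile(
    r"BTRFS: bdev\s+(?P<dev>\S+)\s+errs:\s+"
    r"wr\s+(?P<wr>\d+),\s+rd\s+(?P<rd>\d+),\s+flush\s+(?P<flush>\d+),\s+"
    r"corrupt\s+(?P<corrupt>\d+),\s+gen\s+(?P<gen>\d+)"
)


def parse_errs(text):
    """Return a list of (device, counters) tuples in order of appearance."""
    results = []
    for m in ERRS_RE.finditer(text):
        counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
        results.append((m.group("dev"), counters))
    return results


def growth(samples):
    """Difference between the last and the first counter sample."""
    first, last = samples[0][1], samples[-1][1]
    return {name: last[name] - first[name] for name in first}


# Message bodies copied from the log excerpt, timestamps stripped.
LOG = """\
BTRFS: bdev /dev/sdm errs: wr 12, rd 0, flush 0, corrupt 0, gen 0
BTRFS: bdev /dev/sdm errs: wr 12, rd 1, flush 0, corrupt 0, gen 0
BTRFS: bdev /dev/sdm errs: wr 28, rd 21596, flush 0, corrupt 0, gen 0
"""

if __name__ == "__main__":
    samples = parse_errs(LOG)
    print(samples[-1][0], growth(samples))
```

Feeding it the full kernel log instead of the three sample lines would show whether the read errors ramp up gradually or all at once.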

This is just the fs reporting that it can't write to one particular
drive, followed by a large number of read failures. And then:


May 28 21:03:06 lcg-lrz-dc10 kernel: [1743336.369569] BTRFS: lost page
write due to I/O error on /dev/sdm
May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.093299] sd 0:0:14:0:
rejecting I/O to offline device
May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.094348] BTRFS (device
sdp): bad tree block start 3328214216270427953 3448651776

So another lost write to the same drive, sdm, and then a new problem:
a bad tree block on a different drive, sdp. And then:

May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.096927] BTRFS: error -5
while searching for dev_stats item for device /dev/sdm!
May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.097314] BTRFS warning
(device sdp): Skipping commit of aborted transaction.

It still hasn't given up on sdm (which seems kinda odd, given that by
now there are thousands of read errors and the kernel considers it
offline anyway), but now it also has to deal with problems on sdp. The
resulting stack trace, though, suggests a umount was in progress?


May 28 22:55:56 lcg-lrz-dc10 kernel: [1750099.616565] CPU: 4 PID:
134844 Comm: umount Tainted: G        W       4.0.0-trunk-amd64 #1
Debian 4.0-1~exp1



https://bugs.launchpad.net/ubuntu/+source/linux/+bug/891115
That's an old bug, kernel 3.2 era. But ultimately it looks like it was
hardware related.
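
To check on the running system which block device sits at SCSI address 0:0:14:0, and whether the kernel still has it marked offline, something like the following might help. It's only a sketch: it assumes the usual sysfs layout (/sys/class/scsi_device/<h:c:t:l>/device with a "block" directory and a "state" file), and scsi_device_info is a name made up for illustration.

```python
from pathlib import Path


def scsi_device_info(addr, sysfs_root="/sys"):
    """Look up block device names and kernel state for a SCSI address.

    addr is host:channel:target:lun as printed in "sd 0:0:14:0:" messages.
    Assumes the usual sysfs layout; fields come back empty/None if the
    expected paths are not there.
    """
    dev = Path(sysfs_root) / "class" / "scsi_device" / addr / "device"
    block_dir = dev / "block"
    state_file = dev / "state"
    names = sorted(p.name for p in block_dir.iterdir()) if block_dir.is_dir() else []
    state = state_file.read_text().strip() if state_file.is_file() else None
    return {"block": names, "state": state}


if __name__ == "__main__":
    # Address taken from the "sd 0:0:14:0:" lines in the log.
    print(scsi_device_info("0:0:14:0"))
```

If the block name it reports matches the device btrfs is complaining about and the state reads offline, that would tie the megasas reset, the offlined device and the btrfs write errors together.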


Chris Murphy