On Tue, Feb 23, 2016 at 11:22:47PM +0000, Duncan wrote:
> Forgot to mention, tho you're probably already considering it, if this is
> the same raid5-backed btrfs you were complaining about being slow in the
> other thread,

No, that's another one :)
This one was remade from scratch after the filesystem on it got corrupted.
The stack is: 5 x 4TB drives, swraid5, 64GB SSD bcache, dmcrypt, btrfs.

SMART is 100% for all 5 drives, and they passed an extensive test before
I built the new raid and filesystem on them.

> and considering redoing with bcache to an ssd added, as
> seems very likely, if it /is/ actually storage device or bus errors, that
> could be one reason the previous one was getting so slow... Maybe it
> wasn't btrfs after all.

Good thinking, although in this case, it's a different filesystem.
This filesystem is, however, on a SATA port multiplier with a 2 meter
cable to an external disk array. As a result, bandwidth to it is going to
be slow-ish, and the long cable could be adding I/O errors.

On Tue, Feb 23, 2016 at 11:17:06PM +0000, Duncan wrote:
> I believe all formal documentation of what the error counters actually
> mean is developer-level -- "Trust the Source, Luke."

Haha, I know that one :)
Although to be fair, I was more offering for someone to tell me what
they're supposed to mean, and for me to then update the wiki to capture
that info.

> Yet another point supporting the "btrfs is still stabilizing, not yet
> fully stable" position, I suppose, as it could definitely be argued that
> those counters and their visibility, including display in the kernel log
> at mount time, are definitely intended to be consumed at the admin-user
> level, and that it follows that they should be documented at the admin-
> user level before the filesystem can properly be defined as fully stable.

Yes :) and I'm happy to help make this a reality in the wiki at least.

> Write error counter increments should be accompanied by kernel log events
> telling you more -- what level of the device stack is returning the
> errors that propagate up to the filesystem level, for instance. Expected
> would be either bus level timeouts and resets, or storage device errors.

I agree, and I get 0 such errors here, which is why it's weird.

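In case it is useful for the wiki, here is a minimal sketch of how the
counters could be checked from a script. It assumes the text comes from
`btrfs device stats <mountpoint>` and that each line looks like
`[device].counter_name  value`. The function names, the device path and
the sample values are made up for illustration; none of it is output
from this array.

```python
import re

# Assumed line format of `btrfs device stats <mountpoint>` output:
#   [/dev/mapper/example].write_io_errs   0
STAT_LINE = re.compile(
    r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)


def parse_device_stats(text):
    """Return {device: {counter: value}} parsed from the stats text."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:  # lines that do not match the assumed format are skipped
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def nonzero_counters(stats):
    """Return a sorted list of (device, counter, value) for counters > 0."""
    return sorted(
        (dev, name, value)
        for dev, counters in stats.items()
        for name, value in counters.items()
        if value > 0
    )


# Hypothetical sample text, for illustration only.
SAMPLE = """\
[/dev/mapper/example].write_io_errs   3
[/dev/mapper/example].read_io_errs    0
[/dev/mapper/example].flush_io_errs   0
[/dev/mapper/example].corruption_errs 0
[/dev/mapper/example].generation_errs 0
"""

if __name__ == "__main__":
    for dev, name, value in nonzero_counters(parse_device_stats(SAMPLE)):
        print(f"{dev}: {name} = {value}")
```

This only reports which counters are non-zero; it says nothing about
which layer of the stack produced the errors, which is the part that
still needs documenting.
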
> If it's storage device errors, SMART data should show increasing raw
> value relocated sectors or the like (smartctl -A). If it's bus errors,

Correct, and they are all at 0.

> it could be bad cabling (bad connections or bad shielding, or using
> SATA-150 certified cables for SATA-600 or some such), or, as I saw on an

Cabling is indeed a likely culprit. I'm just surprised that, if that's
the case, the SATA layer is showing me nothing (I'm running
tail -f /var/log/kern.log, and usually I'd see SATA or PMP errors there).

> old and failing mobo (when I pulled it there were bulging and some
> exploded capacitors) a few years ago, failing filter-capacitors on the
> mobo signalling paths. Bad power, including the possibility of an
> overloaded UPS that hit one guy I know, is notorious for both this sort
> of issue and memory problems, as well.

All true, but wouldn't all of these also show up as actual disk errors
reported by the underlying driver involved?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901