On 2017-10-19 09:48, Zoltan wrote:
Hi,

On Thu, Oct 19, 2017 at 1:01 PM, Peter Grandi <p...@btrfs.list.sabi.co.uk> 
wrote:

What the OP was doing was using "unreliable" both for the case
where the device "lies" and the case where the device does not
"lie" but reports a failure. Both of these are malfunctions in a
wide sense:

   * The [block] device "lies" as to its status or what it has done.
   * The [block] device reports truthfully that an action has failed.

Thanks for making this point; it made me realize that I had a different
assumption than the one you use in your reasoning. I assumed that when
writes to a USB device fail due to a temporary disconnection, the
kernel can actually recognize that a write error happened. So are you
saying that a write error due to USB problems can go completely
unnoticed? That seems very strange to me; are USB drives really that
unreliable or is that some software limitation?

It depends on what type of write error happens.

If the data gets corrupted on its way over the bus, or the device just drops the write, or you have a bogus storage device (this is actually a pretty big issue with flash drives and SD cards; check [1] and [2] for more info, and [3] for a tool you can use to check things), then it generally won't be detected by the kernel, but might be by the filesystem driver when it tries to read the data back.
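To make the "dropped write gets noticed only on read" case concrete, here is a minimal toy sketch. Everything in it is made up for illustration: `LossyDevice` is a hypothetical in-memory stand-in for a device that acknowledges writes it never stores, and zlib's CRC32 stands in for whatever checksum the filesystem actually uses. It is not how any real driver is implemented.

```python
import zlib


class LossyDevice:
    """Hypothetical toy block device: acknowledges every write, silently drops some."""

    def __init__(self, drop_blocks):
        self.blocks = {}
        self.drop_blocks = set(drop_blocks)

    def write(self, block_no, data):
        if block_no not in self.drop_blocks:
            self.blocks[block_no] = data
        return True  # always reports success, i.e. the device "lies"

    def read(self, block_no):
        # A block that was never stored reads back as zeros.
        return self.blocks.get(block_no, b"\x00" * 16)


def write_with_checksum(dev, checksums, block_no, data):
    """Write a block and record its CRC32 separately from the data."""
    ok = dev.write(block_no, data)
    checksums[block_no] = zlib.crc32(data)
    return ok


def verify(dev, checksums):
    """Return the block numbers whose contents no longer match the recorded checksum."""
    return sorted(n for n, crc in checksums.items() if zlib.crc32(dev.read(n)) != crc)


dev = LossyDevice(drop_blocks=[2])
checksums = {}
# Every write "succeeds" from the caller's point of view.
results = [write_with_checksum(dev, checksums, n, bytes([n + 1]) * 16) for n in range(4)]
# The dropped block only shows up when the data is read back and checked.
bad_blocks = verify(dev, checksums)
print(results, bad_blocks)
```

The write path never sees an error here; the mismatch only surfaces once the data is read back and compared against the checksum.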

However, it doesn't go completely undetected if the device disconnects (which is where the big issue with BTRFS comes in). The kernel will detect the disconnect, issue a bus reset (which will cause performance issues with other USB devices on the same controller), and generally recover. The disappearance of the device doesn't get propagated up to the filesystem correctly, though, and that is what causes the biggest issue with BTRFS. Because BTRFS just knows that writes are suddenly failing for some reason, it doesn't try to release the device so that things get properly cleaned up in the kernel. When the same device reappears (as it will when the disconnect was due to a transient bus error, which happens a lot), it therefore shows up as a different device node, which gets scanned for filesystems by udev, and BTRFS then gets really confused because it now sees 3 (or more) devices for a 2-device filesystem. That final state is what's so dangerous about using USB devices with BTRFS right now, as it's pretty much guaranteed to result in data corruption.
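As a rough illustration of the "3 devices for a 2-device filesystem" state, the sketch below flags a filesystem member that is visible on more than one device node. The record layout (node, filesystem UUID, device ID) and the sample values are hypothetical; you would have to fill them in from whatever your system reports, and this is only a way to picture the problem, not something the kernel or udev does.

```python
from collections import defaultdict


def find_duplicate_members(records):
    """Group (node, fs_uuid, devid) records and return members seen on more than one node.

    The result maps (fs_uuid, devid) to the list of device nodes claiming that member.
    """
    seen = defaultdict(list)
    for node, fs_uuid, devid in records:
        seen[(fs_uuid, devid)].append(node)
    return {key: nodes for key, nodes in seen.items() if len(nodes) > 1}


# Hypothetical scan result for a 2-device filesystem after a transient disconnect.
records = [
    ("/dev/sdb", "fs-A", 1),
    ("/dev/sdc", "fs-A", 2),  # stale node that was never released
    ("/dev/sdd", "fs-A", 2),  # the same member, back under a new node
]
dupes = find_duplicate_members(records)
print(dupes)
```

With the sample data, member 2 of the filesystem is claimed by both the old and the new node, which is the confused state described above.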


[1] https://fightflashfraud.wordpress.com/
[2] https://sosfakeflash.wordpress.com/
[3] http://oss.digirati.com.br/f3/