On Tue, Feb 28, 2017 at 06:35:16PM +0100, Michael Biebl <bi...@debian.org> wrote:
> Can you elaborate where and how such data corruption can happen?
> Not being able to shutdown/detach all DM devices has been the case for
> basically forever in Debian (most prominent example if / is on LVM).
> I've never seen data corruption as a result of this.
Dmcache is good at caching random reads. When it isn't shut down cleanly, even in writethrough ("read-only") mode, it will mark every block in the cache as dirty and write it back. In my case, with a modest 40GB cache for a 20TB volume, this results in hours of extremely seek-heavy workloads. This stresses almost everything: the I/O subsystem, any hardware controllers, the bus, programs which constantly time out due to the heavy I/O, and so on.

This can cause (and multiple times has caused) latent bugs in dmcache to corrupt data, which wouldn't have happened if the cache had been cleanly shut down. If the origin device is a larger SSD-based subsystem, this can also cause excessive wear.

I am not trying to pull the data corruption club here: the behaviour itself does not directly cause corruption, it is merely asking for it. The biggest issue for me with current kernels is that servers that are not shut down cleanly are extremely sluggish for hours, basically unusable.

Also, I did initially have trouble with systemd and used a script to clean up the dm tables (this is hard to do with systemd, though, as it's not easy to insert it at the right point during shutdown), but in jessie, and with my current setup(s), systemd was able to clean up the dm targets (maybe due to luck), so the script didn't run.

Looking at the systemd-shutdown sources (thanks for pointing those out!), it becomes quite clear that systemd-shutdown is not even best-effort, but more or less a nice attempt. For example, it doesn't do a topological sort to clean up dependencies, but simply runs a fixed number of loops in the hope that this resolves the issues. A topological sort would be trivial to implement: simply loop until either no progress can be made or all devices have been shut down. That would be a) correct and b) faster than simply looping a few times.
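A minimal sketch of the "loop until no progress" idea, written in Python for brevity rather than in systemd's C. Everything here is hypothetical and not taken from the systemd sources: the `holders` map (device name to the set of devices stacked on top of it), the `detach_all` name, and the `try_detach` callback standing in for the real per-device detach call. A device is only attempted once nothing still holds it, and the loop ends either when every device is gone or when a full pass detaches nothing (a cycle, or a device that stays busy).

```python
from typing import Callable, Dict, List, Set, Tuple


def detach_all(
    holders: Dict[str, Set[str]],
    try_detach: Callable[[str], bool] = lambda dev: True,
) -> Tuple[List[str], Set[str]]:
    """Detach devices until all are gone or a full pass makes no progress.

    holders: hypothetical map from each device to the devices stacked on top
             of it (which must be detached first).
    try_detach: hypothetical stand-in for the real detach operation; returns
                True on success, False if the device is still busy.
    Returns the order in which devices were detached and the set left over.
    """
    remaining = set(holders)
    order: List[str] = []
    progress = True
    while remaining and progress:
        progress = False
        # sorted() copies the set, so discarding from `remaining` is safe here
        for dev in sorted(remaining):
            if holders[dev] & remaining:
                continue  # still held open by a device stacked on top of it
            if try_detach(dev):
                order.append(dev)
                remaining.discard(dev)
                progress = True
    return order, remaining


if __name__ == "__main__":
    # Hypothetical stack: a crypt device on top of a dm-cache device,
    # which uses an origin disk and an SSD.
    stack = {
        "origin": {"cache0"},
        "ssd": {"cache0"},
        "cache0": {"crypt0"},
        "crypt0": set(),
    }
    print(detach_all(stack))
```

Because every pass either detaches at least one device or terminates the loop, it runs at most n+1 passes for n devices, and it never stops early while progress is still possible, which is the advantage over a fixed loop count.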
(I was looking at the jessie sources because most of our servers run jessie; ignore this if it's already fixed, otherwise that would be an obvious improvement :).

Greetings, and again, thanks for treating this as an actual bug.

-- 
The choice of a GNU generation
Deliantra, the free code+content MORPG
http://www.deliantra.net
Marc Lehmann
schm...@schmorp.de