On Thu, Nov 13, 2014 at 02:52:02PM +0100, Mathijs Kwik wrote:
> Hi all,
> 
> Today, I lost most of my data (don't worry, got backups) after the cache
> got corrupted somehow. I suspect a recent suspend-to-disk was the
> cause. I checked how my distribution (NixOS) handles suspend/resume and
> I have some concerns about how bcache fits into this.

Augh :(

> Normally, the kernel and initrd get loaded. The initrd loads required
> kernel modules, waits for udev to settle, activates luks and LVM, then
> finally asks the kernel to resume from the resume device.
> 
> The kernel documentation on suspend is VERY clear you should NOT touch
> anything on disk between suspend and resume. So activating luks and LVM
> is probably risky already, but it appears neither luks nor LVM makes
> any on-disk changes when activated, and any in-memory state (within the
> resumed image) is still valid. The benefit of activating luks and LVM
> before resume seems to be that it allows resuming from encrypted/lvm
> volumes. 
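The last initrd step described above ("asks the kernel to resume from the resume device") can be sketched in C. This assumes the initrd requests resume by writing the resume device's number as "MAJOR:MINOR" to /sys/power/resume; format_resume_value and request_resume are hypothetical names, and the sysfs path is a parameter so the sketch can be exercised against an ordinary file. If the kernel finds a valid image, the restored kernel takes over and the write never returns; otherwise boot continues. Everything the initrd does before this point (luks, LVM, bcache registration) must leave the disk untouched.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Format a device number as "MAJOR:MINOR" (assumed /sys/power/resume format).
 * Returns the snprintf() result: the number of characters produced. */
int format_resume_value(unsigned int maj, unsigned int min, char *buf, size_t len)
{
    return snprintf(buf, len, "%u:%u", maj, min);
}

/* Hypothetical helper: look up the block device holding the hibernation
 * image and write its number to the resume file. On a real system
 * sysfs_path would be "/sys/power/resume". Returns 0 or -errno. */
int request_resume(const char *sysfs_path, const char *dev_path)
{
    struct stat st;
    char value[32];
    FILE *f;

    if (stat(dev_path, &st) != 0)
        return -errno;
    if (!S_ISBLK(st.st_mode))
        return -ENOTBLK;            /* resume device must be a block device */

    format_resume_value(major(st.st_rdev), minor(st.st_rdev), value, sizeof value);

    f = fopen(sysfs_path, "w");
    if (!f)
        return -errno;
    if (fputs(value, f) == EOF) {
        int err = errno;
        fclose(f);
        return -err;
    }
    if (fclose(f) != 0)             /* buffered write errors surface here */
        return -errno;
    return 0;
}
```

For example, request_resume("/sys/power/resume", "/dev/mapper/vg-swap") would be the final call, with the device path here being purely illustrative.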

Yeah, for in-kernel code this is handled by the freezing mechanism, which
bcache uses.
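The freezing handshake can be illustrated with a userspace pthreads model. In the kernel, a thread opts in with set_freezable() and parks at safe points via try_to_freeze(), and a workqueue only takes part if it is allocated with WQ_FREEZABLE. The sketch below is NOT kernel code: struct freezer and the freezer_* functions are hypothetical names, and the "writes" counter stands in for I/O to the cache device. The point it shows is that once freezer_freeze() returns, the worker issues no further writes until it is thawed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model of the freezer handshake (hypothetical, not kernel API). */
struct freezer {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool freeze_requested;
    bool worker_frozen;
    bool stop;
    unsigned long writes;   /* stands in for writes to the cache device */
};

void freezer_init(struct freezer *fz)
{
    pthread_mutex_init(&fz->lock, NULL);
    pthread_cond_init(&fz->cond, NULL);
    fz->freeze_requested = false;
    fz->worker_frozen = false;
    fz->stop = false;
    fz->writes = 0;
}

void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Worker loop; the inner wait plays the role of try_to_freeze(). */
void *freezable_worker(void *arg)
{
    struct freezer *fz = arg;

    pthread_mutex_lock(&fz->lock);
    while (!fz->stop) {
        /* Safe point: park here while a freeze is in effect. */
        while (fz->freeze_requested && !fz->stop) {
            fz->worker_frozen = true;
            pthread_cond_broadcast(&fz->cond);     /* tell the freezer */
            pthread_cond_wait(&fz->cond, &fz->lock);
        }
        fz->worker_frozen = false;
        if (fz->stop)
            break;
        fz->writes++;                 /* work happens only while thawed */
        pthread_mutex_unlock(&fz->lock);
        sleep_ms(1);
        pthread_mutex_lock(&fz->lock);
    }
    pthread_mutex_unlock(&fz->lock);
    return NULL;
}

/* Hibernation side: request a freeze and wait until the worker is parked. */
void freezer_freeze(struct freezer *fz)
{
    pthread_mutex_lock(&fz->lock);
    fz->freeze_requested = true;
    while (!fz->worker_frozen)
        pthread_cond_wait(&fz->cond, &fz->lock);
    pthread_mutex_unlock(&fz->lock);
}

void freezer_thaw(struct freezer *fz)
{
    pthread_mutex_lock(&fz->lock);
    fz->freeze_requested = false;
    pthread_cond_broadcast(&fz->cond);
    pthread_mutex_unlock(&fz->lock);
}

unsigned long freezer_writes(struct freezer *fz)
{
    pthread_mutex_lock(&fz->lock);
    unsigned long n = fz->writes;
    pthread_mutex_unlock(&fz->lock);
    return n;
}

void freezer_stop(struct freezer *fz, pthread_t worker)
{
    pthread_mutex_lock(&fz->lock);
    fz->stop = true;
    pthread_cond_broadcast(&fz->cond);
    pthread_mutex_unlock(&fz->lock);
    pthread_join(worker, NULL);
}
```

The ordering mirrors hibernation: freeze, snapshot while the on-disk state is stable, then thaw. Any work that is not gated by such a check (in the kernel, for instance, work on a workqueue allocated without WQ_FREEZABLE) can keep running while the image is written.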

> Now, with bcache added, things probably get a bit hairy. NixOS supports
> bcache inside the initrd and uses udev rules to activate/attach. I
> suspect this is unsafe. Bcache probably starts checking whether any
> dirty pages exist, so it can write them to the backing store. Even without
> writeback caching, the activation of lvm will read some sectors, which
> might trigger a cache update. Then, once the image is resumed, its
> in-memory state no longer matches the disk and further damage occurs.
> 
> - Does this sound plausible? 
> - Is there any way to tell bcache to make absolutely no changes to
>   either the backing device or the cache?
>   Basically like a readaround+writearound which can be triggered on
>   hibernate and switched off on resume.

So, userspace shouldn't have to do anything to tell bcache about hibernation.

The dev branch is getting a true read-only mode (still in progress), but this
isn't relevant to hibernation.

bcache kernel threads (allocator thread, gc thread) should be correct w.r.t.
hibernation, but maybe the workqueue usage isn't.

I'm probably not going to be able to get to this in the next couple days, but
this is a pretty serious issue. Can you ping me again every couple days until I
get a fix out for this, and maybe file a bug somewhere? (I think
bugzilla.kernel.org has been used for bcache bugs before...)