Thanks a lot, Manuel, for your findings and information.

It's good to know that btrfs is not causing this issue and that the common factor is an md journal on another RAID device.

I have moved the journal from a logical volume on RAID1 to a plain partition on an SSD, and I will monitor the state.

Vojtech



On 17. 03. 21 5:35, Manuel Riel wrote:
Final update on this issue for anyone who encounters a similar problem in the
future:

I didn't observe any "hanging" RAID devices after switching to an ordinary NVMe
partition as the journal. So using, e.g., another md-RAID1 array as the journal
doesn't seem to be supported.

The docs[1] say "This means the cache disk must be ... sustainable." The
"sustainable" part motivated me to use an md-RAID1 array. I think the docs
should mention that the journal can't be on another RAID array.

I'm sending in a patch to emphasize this in the docs.
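To catch the stacked setups discussed in this thread (an md-RAID1 array as the journal, or a logical volume on RAID1), a minimal sketch like the one below could check a candidate journal device before handing it to mdadm. It assumes the usual Linux sysfs layout: /sys/class/block/<name> links to the device node, md arrays carry an md/ subdirectory, stacked devices such as device-mapper (LVM) volumes list their lower devices under slaves/, and partitions have a partition attribute file and sit inside their parent disk's directory. The function name backed_by_md is hypothetical and not part of mdadm or any existing tool.

```python
import os


def _is_md_array(node: str) -> bool:
    # md arrays expose their settings in an "md" subdirectory of the block node.
    return os.path.isdir(os.path.join(node, "md"))


def backed_by_md(name: str, sysfs: str = "/sys") -> bool:
    """Return True if block device `name` (e.g. "dm-0" or "nvme0n1p3") is an
    md array or is stacked, directly or indirectly, on top of one.

    Hypothetical helper; assumes the standard Linux sysfs layout."""
    start = os.path.realpath(os.path.join(sysfs, "class", "block", name))
    if not os.path.exists(start):
        raise FileNotFoundError(f"no such block device in sysfs: {name}")
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # A partition's node lives inside its parent disk's node, so walk up.
        if os.path.isfile(os.path.join(node, "partition")):
            stack.append(os.path.dirname(node))
            continue
        if _is_md_array(node):
            return True
        # Device-mapper (LVM) and other stacked devices list lower devices here.
        slaves = os.path.join(node, "slaves")
        if os.path.isdir(slaves):
            for entry in os.listdir(slaves):
                stack.append(os.path.realpath(os.path.join(slaves, entry)))
    return False
```

On a layout like Vojtech's original one (an LV on a RAID1 array), calling backed_by_md on the LV's dm-N name should return True, while a plain NVMe or SSD partition should return False.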


1: https://www.kernel.org/doc/html/latest/driver-api/md/raid5-cache.html

On Feb 28, 2021, at 4:34 PM, Manuel Riel <m...@snapdragon.cc> wrote:

Hit another mdadm "hanger" today. No more reads were possible, and md4_raid6 was
stuck at 100% CPU.

I've now moved the write journal off the RAID1 device, so it's not a "nested"
RAID anymore. Hope this helps.

With only one hardware device used as the write cache, I suppose only
write-through mode[1] is advisable now.
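The write-through/write-back choice mentioned here is exposed per array at runtime. As a sketch, assuming the sysfs attribute /sys/block/<array>/md/journal_mode that the kernel provides for arrays with a journal (it lists both modes and brackets the active one), the current mode could be read and changed as below. The function names are hypothetical, and writing the attribute on a real system needs root.

```python
import os

MODES = ("write-through", "write-back")


def _journal_mode_path(array: str, sysfs: str) -> str:
    return os.path.join(sysfs, "block", array, "md", "journal_mode")


def current_journal_mode(array: str, sysfs: str = "/sys") -> str:
    """Return the active journal mode of md `array` (e.g. "md4").

    The attribute lists both modes and brackets the active one,
    e.g. "[write-through] write-back"."""
    path = _journal_mode_path(array, sysfs)
    with open(path) as f:
        for token in f.read().split():
            if token.startswith("[") and token.endswith("]"):
                return token[1:-1]
    raise ValueError(f"no active mode marked in {path}")


def set_journal_mode(array: str, mode: str, sysfs: str = "/sys") -> None:
    """Switch the journal mode at runtime (hypothetical helper; needs root)."""
    if mode not in MODES:
        raise ValueError(f"unknown journal mode: {mode!r}")
    with open(_journal_mode_path(array, sysfs), "w") as f:
        f.write(mode)
```

Validating the mode string before writing keeps a typo from surfacing only as an opaque write error from the kernel.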


1: https://www.kernel.org/doc/Documentation/md/raid5-cache.txt
