I am experimenting with the use of dm-integrity underneath dm-raid, to get around the problem where, if a RAID 1 or RAID 5 array is inconsistent, you may not know which copy is the good one. I have found a reproducible hard lockup involving XFS, RAID 5 and dm-integrity.

My test array consists of three spinning HDDs (each 2 decimal terabytes), each with dm-integrity laid directly onto the disk (no partition table), using SHA-256 checksums. On top of this is an MD-RAID array (raid5), and on top of *that* is an ordinary XFS filesystem.

Extracting a large tar archive (970 G) into the filesystem causes a hard lockup -- the entire system becomes unresponsive -- after some tens of gigabytes have been extracted. I have reproduced the lockup with kernel versions 6.6.21 and 6.9.3. No error messages make it to the console, but with 6.9.3 I was able to recover almost all of a lockdep report from pstore. I don't fully understand lockdep reports, but it *looks* like it might be a livelock rather than a deadlock, with all available kworker threads so bogged down with dm-integrity chores that an XFS log flush is blocked long enough to hit the hung-task timeout.

Attached are:

- what I have of the lockdep report (kernel 6.9.3; only a couple of lines at the very top are missing)
- the kernel .config (6.9.3, lockdep enabled)
- dmesg up to the point where userspace starts (6.6.21, lockdep not enabled)
- details of the test array configuration

Please let me know if there is any more information you need; I am happy to test patches. I'm not subscribed to either dm-devel or linux-xfs, so please keep me in CC.

zw

p.s. Incidentally, why doesn't the dm-integrity superblock record the checksum algorithm in use?
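For reference, the stack can be assembled roughly like this. This is a sketch of the shape of the setup, not the exact commands used: device, array and mount names are illustrative, and any keying/HMAC options (the dumps below show fix_hmac) are omitted.

```shell
# Build dm-integrity directly on each whole disk, SHA-256 tags
for d in sda sdb sdc; do
    integritysetup format --integrity sha256 /dev/$d
    integritysetup open --integrity sha256 /dev/$d integ-$d
done
# RAID 5 across the three integrity devices
mdadm --create /dev/md/sdr5 --level=5 --raid-devices=3 \
    /dev/mapper/integ-sda /dev/mapper/integ-sdb /dev/mapper/integ-sdc
# Single GPT partition, then an ordinary XFS filesystem on top
sgdisk -n 1:0:0 -t 1:8300 -c 1:test_array /dev/md/sdr5
mkfs.xfs /dev/md/sdr5p1
mount /dev/md/sdr5p1 /mnt/test
# The trigger: extract a large archive into the filesystem
tar -xf big-archive.tar -C /mnt/test   # locks up after some tens of GB
```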
# xfs_info /dev/md/sdr5p1
meta-data=/dev/md/sdr5p1 isize=512 agcount=32, agsize=30283904 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1 nrext64=1
data = bsize=4096 blocks=969084928, imaxpct=5
= sunit=128 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=473186, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
# gdisk -l /dev/md/sdr5
GPT fdisk (gdisk) version 1.0.9
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/md/sdr5: 969086976 sectors, 3.6 TiB
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): 28F28613-3AAD-46F9-AABA-9CC7E7EFFC3D
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 969086970
Partitions will be aligned on 256-sector boundaries
Total free space is 501 sectors (2.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 256 969086719 3.6 TiB 8300 test_array
# mdadm --detail /dev/md/sdr5
/dev/md/sdr5:
Version : 1.2
Creation Time : Fri May 10 21:13:54 2024
Raid Level : raid5
Array Size : 3876347904 (3.61 TiB 3.97 TB)
Used Dev Size : 1938173952 (1848.39 GiB 1984.69 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Jun 5 14:09:18 2024
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : moxana:sdr5 (local to host moxana)
UUID : 395e1626:1483f9f8:39d6e78c:af21beb1
Events : 13781
Number Major Minor RaidDevice State
0 253 3 0 active sync /dev/dm-3
1 253 4 1 active sync /dev/dm-4
3 253 5 2 active sync /dev/dm-5
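The mdadm numbers are self-consistent: with three members, RAID 5 provides two members' worth of data capacity (pure arithmetic on the output above, no assumptions):

```shell
used_dev_kib=1938173952        # "Used Dev Size" reported above (KiB)
echo $(( used_dev_kib * 2 ))   # prints 3876347904, matching "Array Size"
```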
# for d in sda sdb sdc; do integritysetup dump /dev/$d; done
Info for integrity device /dev/sda.
superblock_version 5
log2_interleave_sectors 15
integrity_tag_size 32
journal_sections 496
provided_data_sectors 3876612136
sector_size 4096
log2_blocks_per_bitmap 12
flags fix_padding fix_hmac
Info for integrity device /dev/sdb.
superblock_version 5
log2_interleave_sectors 15
integrity_tag_size 32
journal_sections 496
provided_data_sectors 3876612136
sector_size 4096
log2_blocks_per_bitmap 12
flags fix_padding fix_hmac
Info for integrity device /dev/sdc.
superblock_version 5
log2_interleave_sectors 15
integrity_tag_size 32
journal_sections 496
provided_data_sectors 3876612136
sector_size 4096
log2_blocks_per_bitmap 12
flags fix_padding fix_hmac
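As a sanity check on these dumps: integrity_tag_size 32 matches a SHA-256 digest, and provided_data_sectors is close to what one would expect after reserving one 32-byte tag per 4096-byte data block. The raw sector count below is an assumed figure for a typical 2 TB (decimal) drive, not taken from the report:

```shell
awk 'BEGIN {
    raw = 3907029168                 # assumed raw 512-byte sectors on a 2 TB disk
    est = raw * 4096 / (4096 + 32)   # capacity left after one 32-byte tag per 4 KiB block
    printf "estimated data sectors: %d (reported: %d)\n", est, 3876612136
}'
```

The estimate comes out slightly above the reported value; the remaining difference (a few tens of MiB) is presumably the journal (496 sections) and superblock areas.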
[ 2213.559141] Not tainted 6.9.3-gentoo-lockdep #2
[ 2213.559146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2213.559149] task:kworker/25:3 state:D stack:0 pid:13498 tgid:13498 ppid:2 flags:0x00004000
[ 2213.559160] Workqueue: xfs-sync/md126p1 xfs_log_worker
[ 2213.559169] Call Trace:
[ 2213.559172] <TASK>
[ 2213.559177] __schedule+0x49a/0x1900
[ 2213.559183] ? find_held_lock+0x32/0x90
[ 2213.559190] ? srso_return_thunk+0x5/0x5f
[ 2213.559198] schedule+0x31/0x130
[ 2213.559204] schedule_timeout+0x1cd/0x1e0
[ 2213.559212] __wait_for_common+0xbc/0x1e0
[ 2213.559218] ? usleep_range_state+0xc0/0xc0
[ 2213.559226] __flush_workqueue+0x15f/0x470
[ 2213.559235] ? __wait_for_common+0x4d/0x1e0
[ 2213.559242] xlog_cil_push_now.isra.0+0x59/0xa0
[ 2213.559249] xlog_cil_force_seq+0x7d/0x290
[ 2213.559257] xfs_log_force+0x86/0x2d0
[ 2213.559263] xfs_log_worker+0x36/0xd0
[ 2213.559270] process_one_work+0x210/0x640
[ 2213.559279] worker_thread+0x1c7/0x3c0
[ 2213.559287] ? wq_sysfs_prep_attrs+0xa0/0xa0
[ 2213.559294] kthread+0xd2/0x100
[ 2213.559301] ? kthread_complete_and_exit+0x20/0x20
[ 2213.559309] ret_from_fork+0x2b/0x40
[ 2213.559317] ? kthread_complete_and_exit+0x20/0x20
[ 2213.559324] ret_from_fork_asm+0x11/0x20
[ 2213.559332] </TASK>
[ 2213.559361] Showing all locks held in the system:
[ 2213.559390] 2 locks held by kworker/u131:0/208:
[ 2213.559395] #0: ffff9aa10ffe2d58
((wq_completion)xfs-cil/md126p1){+.+.}-{0:0}, at: process_one_work+0x3cc/0x640
[ 2213.559421] #1: ffffb848c08dbe58
((work_completion)(&ctx->push_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.559446] 3 locks held by kworker/u130:13/223:
[ 2213.559451] #0: ffff9aa7cc1f8158 ((wq_completion)writeback){+.+.}-{0:0},
at: process_one_work+0x3cc/0x640
[ 2213.559474] #1: ffffb848c0953e58
((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.559497] #2: ffff9aa0c25400e8 (&type->s_umount_key#32){++++}-{3:3}, at:
super_trylock_shared+0x11/0x50
[ 2213.559522] 1 lock held by khungtaskd/230:
[ 2213.559526] #0: ffffffff89ec2e20 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x2c/0x1d0
[ 2213.559557] 1 lock held by usb-storage/414:
[ 2213.559561] #0: ffff9aa0cb15ace8 (&us_interface_key[i]){+.+.}-{3:3}, at:
usb_stor_control_thread+0x43/0x2d0
[ 2213.559591] 1 lock held by in:imklog/1997:
[ 2213.559595] #0: ffff9aa0db2258d8 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0x84/0xd0
[ 2213.559620] 2 locks held by kworker/u131:3/3226:
[ 2213.559624] #0: ffff9aa10ffe2d58
((wq_completion)xfs-cil/md126p1){+.+.}-{0:0}, at: process_one_work+0x3cc/0x640
[ 2213.559664] #1: ffffb848c47a7e58
((work_completion)(&ctx->push_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.559706] 2 locks held by tar/5845:
[ 2213.559710] #0: ffff9aa0c2540420 (sb_writers#6){.+.+}-{0:0}, at:
ksys_write+0x6c/0xf0
[ 2213.559732] #1: ffff9aa0e16c3f58 (&sb->s_type->i_mutex_key#8){++++}-{3:3},
at: xfs_ilock+0x144/0x180
[ 2213.559789] 2 locks held by kworker/14:28/6524:
[ 2213.559793] #0: ffff9aa0da45e758
((wq_completion)dm-integrity-writer#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.559815] #1: ffffb848c64e7e58
((work_completion)(&ic->writer_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.559882] 2 locks held by kworker/12:45/8171:
[ 2213.559886] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.559908] #1: ffffb848d6583e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.559949] 2 locks held by kworker/12:81/8479:
[ 2213.559953] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.559979] #1: ffffb848d6ea3e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.560006] 2 locks held by kworker/12:98/8496:
[ 2213.560010] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560036] #1: ffffb848d6f2be58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.560062] 2 locks held by kworker/12:101/8499:
[ 2213.560067] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560093] #1: ffffb848d6f43e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.560118] 2 locks held by kworker/12:110/8508:
[ 2213.560122] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560149] #1: ffffb848d6f8be58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.560175] 2 locks held by kworker/12:111/8509:
[ 2213.560180] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560206] #1: ffffb848d6f93e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.560230] 2 locks held by kworker/12:112/8510:
[ 2213.560235] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560261] #1: ffffb848d6f9be58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.560307] 2 locks held by kworker/u131:5/9163:
[ 2213.560312] #0: ffff9aa10ffe2d58
((wq_completion)xfs-cil/md126p1){+.+.}-{0:0}, at: process_one_work+0x3cc/0x640
[ 2213.560335] #1: ffffb848d8803e58
((work_completion)(&ctx->push_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.560359] 2 locks held by kworker/u131:6/9166:
[ 2213.560364] #0: ffff9aa10ffe2d58
((wq_completion)xfs-cil/md126p1){+.+.}-{0:0}, at: process_one_work+0x3cc/0x640
[ 2213.560387] #1: ffffb848c44c7e58
((work_completion)(&ctx->push_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.560429] 2 locks held by kworker/30:236/9664:
[ 2213.560433] #0: ffff9aa0e43c0b58
((wq_completion)dm-integrity-writer#3){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560455] #1: ffffb848da42be58
((work_completion)(&ic->writer_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.560540] 2 locks held by kworker/12:128/11574:
[ 2213.560544] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.560564] #1: ffffb848de48be58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.648428] 2 locks held by kworker/12:175/11621:
[ 2213.648431] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.648443] #1: ffffb848de603e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651134] 2 locks held by kworker/12:177/11623:
[ 2213.651136] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651147] #1: ffffb848c4c47e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651158] 2 locks held by kworker/12:179/11625:
[ 2213.651159] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651170] #1: ffffb848de613e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651181] 2 locks held by kworker/12:180/11626:
[ 2213.651183] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651193] #1: ffffb848de61be58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651205] 2 locks held by kworker/12:182/11628:
[ 2213.651206] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651217] #1: ffffb848de62be58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651228] 2 locks held by kworker/12:184/11630:
[ 2213.651230] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651240] #1: ffffb848d4793e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651257] 2 locks held by kworker/12:236/11682:
[ 2213.651259] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651270] #1: ffffb848de7cbe58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651280] 2 locks held by kworker/12:239/11685:
[ 2213.651282] #0: ffff9aa0da420358
((wq_completion)dm-integrity-offload#5){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651293] #1: ffffb848de7e3e58
((work_completion)(&dio->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640
[ 2213.651341] 2 locks held by kworker/25:121/12751:
[ 2213.651343] #0: ffff9aa0c8122f58
((wq_completion)dm-integrity-writer#4){+.+.}-{0:0}, at:
process_one_work+0x3cc/0x640
[ 2213.651353] #1: ffffb848e0c13e58
((work_completion)(&ic->writer_work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.651425] 2 locks held by kworker/25:3/13498:
[ 2213.651426] #0: ffff9aa0c7bfe758
((wq_completion)xfs-sync/md126p1){+.+.}-{0:0}, at: process_one_work+0x3cc/0x640
[ 2213.651436] #1: ffffb848e259be58
((work_completion)(&(&log->l_work)->work)){+.+.}-{0:0}, at:
process_one_work+0x1ca/0x640
[ 2213.651465] =============================================
[ 2213.651467] Kernel panic - not syncing: hung_task: blocked tasks
[ 2213.652654] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[Attachment: dmesg.txt.gz (application/gzip)]
[Attachment: kconfig-6.9.x-lockdep.gz (application/gzip)]
