Le 10/03/2021 à 08:59, Vaibhav Jain a écrit :
While removing large number of mappings from hash page tables for
large memory systems as soft-lockup is reported because of the time
spent inside htap_remove_mapping() like one below:
watchdog: BUG: soft lockup - CPU#8 stuck for 23s!
<snip>
NIP plpar_hcall+0x38/0x58
LR pSeries_lpar_hpte_invalidate+0x68/0xb0
Call Trace:
0x1fffffffffff000 (unreliable)
pSeries_lpar_hpte_removebolted+0x9c/0x230
hash__remove_section_mapping+0xec/0x1c0
remove_section_mapping+0x28/0x3c
arch_remove_memory+0xfc/0x150
devm_memremap_pages_release+0x180/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x28c/0x300
device_release_driver_internal+0x16c/0x280
unbind_store+0x124/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1b0/0x290
__vfs_write+0x3c/0x70
vfs_write+0xd4/0x270
ksys_write+0xdc/0x130
system_call+0x5c/0x70
Fix this by adding a cond_resched() to the loop in
htap_remove_mapping() that issues hcall to remove hpte mapping. This
should prevent the soft-lockup from being reported.
Isn't it overkill to call is at each iteration ?
Looking at a few other places, there is some mitigation. For instance fadump_free_reserved_memory()
does it based on elapsed time. Another exemple is drmem_lmb_next() doing it every 16 iteration.
Suggested-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
arch/powerpc/mm/book3s64/hash_utils.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c
b/arch/powerpc/mm/book3s64/hash_utils.c
index 581b20a2feaf..ea3945c70b18 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -359,6 +359,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned long
vend,
}
if (rc < 0)
return rc;
+
+ cond_resched();
}
return ret;
Christophe