On Mon, Nov 17, 2025 at 11:16:53AM -0700, Nico Pache wrote: >On Sat, Nov 8, 2025 at 7:40 PM Wei Yang <[email protected]> wrote: >> >> On Wed, Oct 22, 2025 at 12:37:15PM -0600, Nico Pache wrote: >> >There are cases where, if an attempted collapse fails, all subsequent >> >orders are guaranteed to also fail. Avoid these collapse attempts by >> >bailing out early. >> > >> >Signed-off-by: Nico Pache <[email protected]> >> >--- >> > mm/khugepaged.c | 31 ++++++++++++++++++++++++++++++- >> > 1 file changed, 30 insertions(+), 1 deletion(-) >> > >> >diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> >index e2319bfd0065..54f5c7888e46 100644 >> >--- a/mm/khugepaged.c >> >+++ b/mm/khugepaged.c >> >@@ -1431,10 +1431,39 @@ static int collapse_scan_bitmap(struct mm_struct >> >*mm, unsigned long address, >> > ret = collapse_huge_page(mm, address, referenced, >> > unmapped, cc, mmap_locked, >> > order, offset); >> >- if (ret == SCAN_SUCCEED) { >> >+ >> >+ /* >> >+ * Analyze failure reason to determine next action: >> >+ * - goto next_order: try smaller orders in same >> >region >> >+ * - continue: try other regions at same order >> >+ * - break: stop all attempts (system-wide failure) >> >+ */ >> >+ switch (ret) { >> >+ /* Cases were we should continue to the next region >> >*/ >> >+ case SCAN_SUCCEED: >> > collapsed += 1UL << order; >> >+ fallthrough; >> >+ case SCAN_PTE_MAPPED_HUGEPAGE: >> > continue; >> >+ /* Cases were lower orders might still succeed */ >> >+ case SCAN_LACK_REFERENCED_PAGE: >> >+ case SCAN_EXCEED_NONE_PTE: >> >+ case SCAN_EXCEED_SWAP_PTE: >> >+ case SCAN_EXCEED_SHARED_PTE: >> >+ case SCAN_PAGE_LOCK: >> >+ case SCAN_PAGE_COUNT: >> >+ case SCAN_PAGE_LRU: >> >+ case SCAN_PAGE_NULL: >> >+ case SCAN_DEL_PAGE_LRU: >> >+ case SCAN_PTE_NON_PRESENT: >> >+ case SCAN_PTE_UFFD_WP: >> >+ case SCAN_ALLOC_HUGE_PAGE_FAIL: >> >+ goto next_order; >> >+ /* All other cases should stop collapse attempts */ >> >+ default: >> >+ break; >> > } >> >+ break; >> >> One question here: > >Hi Wei Yang, > >Sorry I forgot to get back to this email. >
No problem, thanks for taking a look. >> >> Suppose we have iterated several orders and not collapse successfully yet. So >> the mthp_bitmap_stack[] would look like this: >> >> [8 7 6 6] >> ^ >> | > >so we always pop before pushing. So it would go > >[9] >pop >if (collapse fails) >[8 8] >lets say we pop and successfully collapse a order 8 >[8] >Then we fail the other order 8 >[7 7] >now if we succeed the first order 7 >[7 6 6] >I believe we are now in the state you wanted to describe. > >> >> Now we found this one pass the threshold check, but it fails with other >> result. > >ok lets say we pass the threshold checks, but the collapse fails for >any reason that is described in the >/* Cases were lower orders might still succeed */ >In this case we would continue to order 5 (or lower). Once we are done >with this branch of the tree we go back to the other order 6 collapse. >and eventually the order 7. > >> >> Current code looks it would give up at all, but we may still have a chance to >> collapse the above 3 range? > >for cases under /* All other cases should stop collapse attempts */ >Yes we would bail out and skip some collapses. I tried to think about >all the cases were we would still want to continue trying, vs cases >where the system is probably out of resources or hitting some major >failure, and we should just break out (as others will probably fail >too). > Thanks, your explanation is very clear. >But this is also why I separated this patch out on its own. I was >hoping to have some more focus on the different cases, and make sure I >handled them in the best possible way. So I really appreciate the >question :) > >* I did some digging through old message to find this * > >I believe these are the remaining cases. If these are hit I figured >it's better to abort. > I agree we need to take care of those cases. >/* cases where we must stop collapse attempts */ >case SCAN_CGROUP_CHARGE_FAIL: >case SCAN_COPY_MC: >case SCAN_ADDRESS_RANGE: >case SCAN_PMD_NULL: >case SCAN_ANY_PROCESS: >case SCAN_VMA_NULL: >case SCAN_VMA_CHECK: >case SCAN_SCAN_ABORT: >case SCAN_PMD_NONE: >case SCAN_PAGE_ANON: >case SCAN_PMD_MAPPED: >case SCAN_FAIL: > >Please let me know if you think we should move these to either the >`continue` or `next order` cases. Take a look into these cases, it looks good to me now. Also one of my concern is this coding style is a little hard to maintain. In case we introduce a new result, we should remember to add it here. Otherwise we may stop the collapse too early. While it maybe a separate work after this patch set merged. > >Cheers, >-- Nico > >> >> > } >> > >> > next_order: >> >-- >> >2.51.0 >> >> -- >> Wei Yang >> Help you, Help me >> -- Wei Yang Help you, Help me
