On 6/18/26 02:52, Vlastimil Babka (SUSE) wrote:
> On 6/4/26 04:38, Wandun Chen wrote:
>> From: Wandun Chen <[email protected]>
>>
>> compact_unevictable_allowed defaults to 0 under PREEMPT_RT, so
>> isolate_migratepages_block() skips folios with PG_unevictable set.
>> However, mlock_folio() sets PG_mlocked immediately but defers
>> PG_unevictable to mlock_folio_batch(), resulting in a folio with
>> PG_mlocked=1 but PG_unevictable=0. Compaction will isolate such a
>> folio.
>>
>> Fix by checking folio_test_mlocked() together with the existing
>> folio_test_unevictable() check.
>>
>> A similar issue has been reported by Alexander Krabler on a 6.12-rt
>> aarch64 system. Vlastimil suggested checking the mlocked flag [1].
>>
>> Reported-by: Alexander Krabler <[email protected]>
>> Closes: https://lore.kernel.org/all/du0pr01mb10385345f7153f3341009818882...@du0pr01mb10385.eurprd01.prod.exchangelabs.com/
>> Suggested-by: Vlastimil Babka <[email protected]>
>> Signed-off-by: Wandun Chen <[email protected]>
>> Link: https://lore.kernel.org/all/[email protected]/ [1]
> 
> Well, in that thread Hugh doubted my suggestion, and then it seems we didn't
> conclude anything. Did you actually observe in practice the issue that
> Alexander had, and that this patch fixed it, or is that theoretical?
> 
Yes, I wrote a test case that can reproduce it in a few seconds.

The test case contains 3 steps:
1. mlockall(MCL_CURRENT | MCL_FUTURE);
2. mmap a 2GB file and trigger write page faults on it;
3. during step 2, trigger compaction via /proc/sys/vm/compact_memory.


My reproduction environment is qemu with 4GB RAM and 8 cores, aarch64,
PREEMPT_RT, and it includes the tracepoint from patch 02.
After running the reproduction program for a few seconds, the
following output appears:

repro-403     [004] ....1   101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
repro-403     [004] ....1   101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
repro-403     [004] ....1   101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
repro-403     [004] ....1   101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
repro-403     [004] ....1   101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
repro-403     [004] ....1   101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked


Unfortunately, I recently found that the fix is still incomplete.
mlock_folio() can set PG_mlocked after the folio has already been
successfully isolated, so a check at isolation time alone cannot
prevent the migration. Because of this, I need to think more about
how to fix it.

Perhaps we should double-check whether the page is mlocked during
the actual migration phase.

What do you think of this best-effort approach?


Best regards,
Wandun





The full reproducer is below:

/* Build: gcc repro.c -o repro -pthread */

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE       4096    /* assumes a 4K base page size */
#define NR_PAGES        32      /* pages touched per batch */
#define FILE_SIZE       (2ULL * 1024 * 1024 * 1024)     /* 2GB backing file */

static void *worker_fn(void *arg)
{
        int fd = (long)arg;
        size_t len = (size_t)FILE_SIZE;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return NULL;

        for (size_t off = 0; off + NR_PAGES * PAGE_SIZE <= len;
             off += NR_PAGES * PAGE_SIZE) {
                for (int i = 0; i < NR_PAGES; i++)
                        p[off + i * PAGE_SIZE] = 1;
                usleep(200);
        }

        munmap(p, len);
        return NULL;
}

static void *compact_fn(void *arg)
{
        (void)arg;
        int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
        if (fd < 0)
                return NULL;

        while (1) {
                /* Trigger compaction; write errors are deliberately ignored. */
                if (write(fd, "1", 1) < 0) {}
                usleep(5000);
        }
}

int main(void)
{
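        /* Without mlockall() there is nothing to reproduce, so bail out. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
                perror("mlockall");
                return 1;
        }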

        int fd = open("./repro_largefile.dat", O_RDWR | O_CREAT, 0600);
        if (fd < 0)
                return 1;
        unlink("./repro_largefile.dat");
        if (ftruncate(fd, (off_t)FILE_SIZE) < 0)
                return 1;

        printf("repro_largefile: 1 worker, %d pages/batch, Ctrl-C to stop\n",
               NR_PAGES);

        pthread_t compact, worker;
        pthread_create(&compact, NULL, compact_fn, NULL);
        pthread_create(&worker, NULL, worker_fn, (void *)(long)fd);

        pthread_join(worker, NULL);
        return 0;
}

>> ---
>>  mm/compaction.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index b776f35ad020..7e07b792bcb5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1116,7 +1116,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>              is_unevictable = folio_test_unevictable(folio);
>>  
>>              /* Compaction might skip unevictable pages but CMA takes them */
>> -            if (!(mode & ISOLATE_UNEVICTABLE) && is_unevictable)
>> +            if (!(mode & ISOLATE_UNEVICTABLE) &&
>> +                (is_unevictable || folio_test_mlocked(folio)))
>>                      goto isolate_fail_put;
>>  
>>              /*
> 

