On 1/12/26 13:44, Miaohe Lin wrote:
On 2026/1/12 19:33, Miaohe Lin wrote:
On 2026/1/12 17:40, David Hildenbrand (Red Hat) wrote:
On 1/12/26 10:19, Miaohe Lin wrote:
On 2026/1/9 21:45, David Hildenbrand (Red Hat) wrote:
On 1/7/26 10:37, Miaohe Lin wrote:
Introduce selftests to validate the functionality of memory failure.
These tests help ensure that memory failure handling for anonymous
pages and pagecache pages works correctly, including proper SIGBUS
delivery to user processes, page isolation, and recovery paths.

Currently madvise syscall is used to inject memory failures. And only
anonymous pages and pagecaches are tested. More test scenarios, e.g.
hugetlb, shmem, thp, will be added. Also more memory failure injecting
methods will be supported, e.g. APEI Error INJection, if required.


Thanks for test and report. :)

0day reports that these tests fail:

# # ------------------------
# # running ./memory-failure
# # ------------------------
# # TAP version 13
# # 1..6
# # # Starting 6 tests from 2 test cases.
# # #  RUN           memory_failure.madv_hard.anon ...
# # #            OK  memory_failure.madv_hard.anon
# # ok 1 memory_failure.madv_hard.anon
# # #  RUN           memory_failure.madv_hard.clean_pagecache ...
# # # memory-failure.c:166:clean_pagecache:Expected setjmp (1) == 0 (0)
# # # clean_pagecache: Test terminated by assertion
# # #          FAIL  memory_failure.madv_hard.clean_pagecache
# # not ok 2 memory_failure.madv_hard.clean_pagecache
# # #  RUN           memory_failure.madv_hard.dirty_pagecache ...
# # # memory-failure.c:207:dirty_pagecache:Expected unpoison_memory(self->pfn) (-16) == 0 (0)
# # # dirty_pagecache: Test terminated by assertion
# # #          FAIL  memory_failure.madv_hard.dirty_pagecache
# # not ok 3 memory_failure.madv_hard.dirty_pagecache
# # #  RUN           memory_failure.madv_soft.anon ...
# # #            OK  memory_failure.madv_soft.anon
# # ok 4 memory_failure.madv_soft.anon
# # #  RUN           memory_failure.madv_soft.clean_pagecache ...
# # # memory-failure.c:282:clean_pagecache:Expected variant->inject(self, addr) (-1) == 0 (0)
# # # clean_pagecache: Test terminated by assertion
# # #          FAIL  memory_failure.madv_soft.clean_pagecache
# # not ok 5 memory_failure.madv_soft.clean_pagecache
# # #  RUN           memory_failure.madv_soft.dirty_pagecache ...
# # # memory-failure.c:319:dirty_pagecache:Expected variant->inject(self, addr) (-1) == 0 (0)
# # # dirty_pagecache: Test terminated by assertion
# # #          FAIL  memory_failure.madv_soft.dirty_pagecache
# # not ok 6 memory_failure.madv_soft.dirty_pagecache
# # # FAILED: 2 / 6 tests passed.
# # # Totals: pass:2 fail:4 xfail:0 xpass:0 skip:0 error:0
# # [FAIL]
# not ok 71 memory-failure # exit=1


Can the test maybe not deal with running in certain environments
(config options etc)?

To run the test, I think the following are needed:
    1. CONFIG_MEMORY_FAILURE and CONFIG_HWPOISON_INJECT must be enabled.
    2. Root privilege is required.
    3. For the dirty/clean pagecache testcases, the test files
       "./clean-page-cache-test-file" and "./dirty-page-cache-test-file"
       are assumed to be created on non-memory file systems such as
       xfs, ext4, etc.

Does your test environment break any of the above rules?

It is the 0day environment, so very likely yes. I suspect 1).

Hi David,

After taking a closer look, I think CONFIG_MEMORY_FAILURE and
CONFIG_HWPOISON_INJECT must already be enabled in the 0day environment;
otherwise testcase memory_failure.madv_hard.anon would fail, since it
injects a memory failure and expects to see a SIGBUS signal.

Good point.



Am I expected to add some code to
guard against this?

Yes, at least some.

Checking for root privileges is not required. The tests are commonly run from 
non-memory file systems, but, in theory, could be run from nfs etc.

If you require special file systems, take a look at gup_longterm.c,
where we test for some filesystem types.
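
A filesystem-type guard along those lines might look roughly like the
sketch below. It is not copied from gup_longterm.c; the helper names
fs_type_is_memory_backed() and path_on_memory_fs() are hypothetical, and
the list of magic numbers is only an illustrative assumption about which
filesystems keep their data purely in memory.

```c
#include <stdbool.h>
#include <sys/vfs.h>
#include <linux/magic.h>

/*
 * Return true if f_type names a memory-backed filesystem, where a
 * poisoned pagecache page cannot simply be dropped and re-read from
 * backing storage. The list is illustrative, not exhaustive.
 */
static bool fs_type_is_memory_backed(unsigned long f_type)
{
	if (f_type == (unsigned long)TMPFS_MAGIC)
		return true;
	if (f_type == (unsigned long)RAMFS_MAGIC)
		return true;
	if (f_type == (unsigned long)HUGETLBFS_MAGIC)
		return true;
	return false;
}

/*
 * Returns 1 if path lives on a memory-backed filesystem, 0 if not,
 * and -1 if statfs() fails (errno is left set by statfs()).
 */
static int path_on_memory_fs(const char *path)
{
	struct statfs fs;

	if (statfs(path, &fs) != 0)
		return -1;
	return fs_type_is_memory_backed((unsigned long)fs.f_type) ? 1 : 0;
}
```

A pagecache testcase could call path_on_memory_fs(".") before creating
its test file and skip itself when the result is not 0.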

And I think testcases memory_failure.madv_hard.clean_pagecache and
memory_failure.madv_hard.dirty_pagecache fail because they are running
on a memory filesystem. The error pages are kept in the page cache in
that case, while memory_failure.madv_hard.clean_pagecache expects to
see the error page truncated.

Maybe they are run on shmem? Good question. (@Phil?)


But I have no idea why memory_failure.madv_soft.dirty_pagecache and
memory_failure.madv_soft.clean_pagecache return -1 (-EPERM?) when
trying to inject a memory error through the madvise syscall. It would
be really helpful if more information could be provided.

Here is more information:

https://download.01.org/0day-ci/archive/20260110/[email protected]

Unfortunately no config yet. (@Phil, could we provide that one as well as part of that bundle?)

--
Cheers

David
