Hello Kazu,

On 02/07/25 10:22, HAGIO KAZUHITO(萩尾 一仁) wrote:
Hi Tao,

On 2025/07/02 13:36, Tao Liu wrote:
Hi Kazu,

On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio...@nec.com> wrote:
On 2025/07/01 16:59, Tao Liu wrote:
Hi Kazu,

Thanks for your comments!

On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio...@nec.com> wrote:
Hi Tao,

thank you for the patch.

On 2025/06/25 11:23, Tao Liu wrote:
A vmcore corruption issue has been noticed on the powerpc arch [1]. It can
be reproduced with upstream makedumpfile.

When analyzing the corrupted vmcore with crash, the following error
messages are output:

        crash: compressed kdump: uncompress failed: 0
        crash: read error: kernel virtual address: c0001e2d2fe48000  type:
        "hardirq thread_union"
        crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
        crash: compressed kdump: uncompress failed: 0

If the vmcore is generated without the --num-threads option, no such
errors are observed.

With --num-threads=N enabled, N sub-threads are created. All
sub-threads are producers responsible for mm page processing, e.g.
compression. The main thread is the consumer responsible for
writing the compressed data to the file. page_flag_buf->ready is used to
synchronize the main thread and the sub-threads. When a sub-thread finishes
processing a page, it sets the ready flag to FLAG_READY. Meanwhile, the main
thread loops over the ready flags of all threads, and breaks out of the loop
when it finds FLAG_READY.
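
To make the handshake concrete, here is a minimal sketch of the pattern
described above (not the actual makedumpfile code; the structure layout
and field names are simplified for illustration):

enum { FLAG_UNUSED, FLAG_READY };

struct page_flag_buf {
        int ready;               /* written by a sub-thread, polled by main */
        /* ... compressed page data, size, pfn, etc. ... */
};

/* producer (sub-thread): compress one page, then publish it */
void producer_publish(struct page_flag_buf *buf)
{
        /* ... fill buf with the compressed page ... */
        buf->ready = FLAG_READY;     /* plain store: no barrier, no atomicity */
}

/* consumer (main thread): loop over every sub-thread's buffer */
struct page_flag_buf *consumer_wait(struct page_flag_buf *bufs, int nthreads)
{
        for (;;) {
                for (int i = 0; i < nthreads; i++) {
                        if (bufs[i].ready == FLAG_READY)     /* plain load */
                                return &bufs[i];
                }
                /* busy-wait until some sub-thread finishes a page */
        }
}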
I've tried to reproduce the issue, but I couldn't on x86_64.
Yes, I cannot reproduce it on x86_64 either, but the issue is very
easily reproduced on the ppc64 arch, which is where our QE reported it.
Recently we enabled --num-threads=N in RHEL by default, with N ==
nr_cpus in the 2nd kernel, so QE noticed the issue.
I see, thank you for the information.

Do you have any possible scenario that breaks a vmcore?  I could not
think of it only by looking at the code.
I guess the issue has only been observed on ppc because of ppc's
memory model, multi-thread scheduling algorithm, etc. I'm not an expert
on those, so I cannot give a clear explanation, sorry...
ok, I also can't think of how to debug this well..

page_flag_buf->ready is an integer that is read and written by the main
thread and the sub-threads simultaneously. An assignment such as
page_flag_buf->ready = 1 may be composed of several assembly
instructions. Without atomic read/write (memory) protection, reads and
writes can race within those few instructions, which causes the data
inconsistency. Frankly, the ppc assembly consists of more instructions
than x86_64 for the same C code, which enlarges the window for the
data race.
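
For illustration only, one common way to close this kind of race is to
make the flag a C11 atomic and pair a release store in the producer with
an acquire load in the consumer, so everything written to the buffer
before the flag is guaranteed to be visible once the flag is observed.
This is just a sketch; the actual v2 patch may use a different mechanism
(e.g. a pthread mutex or explicit barriers):

#include <stdatomic.h>

enum { FLAG_UNUSED, FLAG_READY };

struct page_flag_buf {
        atomic_int ready;
        /* ... compressed page data ... */
};

/* producer (sub-thread) */
void publish(struct page_flag_buf *buf)
{
        /* ... fill buf with the compressed page ... */
        atomic_store_explicit(&buf->ready, FLAG_READY, memory_order_release);
}

/* consumer (main thread) */
int is_ready(struct page_flag_buf *buf)
{
        return atomic_load_explicit(&buf->ready, memory_order_acquire) == FLAG_READY;
}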

We can observe the issue without the help of crash: just take the
dumpfiles generated from the same vmcore, with and without the
--num-threads option, and compare them with "cmp vmcore1 vmcore2".
cmp reports that the 2 dumpfiles differ, which is unexpected.

and this is just out of curiosity, is the issue reproduced with
makedumpfile compiled with -O0 too?
Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
share my findings...
Thanks, we have to fix this anyway, I want a clue to think about a
possible scenario..
1) Compiled with -O2 flag:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
Copying data                                      : [100.0 %] /
     eta: 0s

The dumpfile is saved to /tmp/out1.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
31 -l ~/vmcore /tmp/out2
Copying data                                      : [100.0 %] |
     eta: 0s
Copying data                                      : [100.0 %] \
     eta: 0s

The dumpfile is saved to /tmp/out2.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
[root@ibm-p10-01-lp45 tmp]# cmp out1 out2
out1 out2 differ: byte 20786414, line 108064

2) Compiled with -O0 flag:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
Copying data                                      : [100.0 %] /
     eta: 0s

The dumpfile is saved to /tmp/out3.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
31 -l ~/vmcore /tmp/out4
Copying data                                      : [100.0 %] |
     eta: 0s
Copying data                                      : [100.0 %] \
     eta: 0s

The dumpfile is saved to /tmp/out4.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
[root@ibm-p10-01-lp45 tmp]# cmp out3 out4
out3 out4 differ: byte 23948282, line 151739

It looks to me like -O0/-O2 make no difference for this case. If there were
no problem, the /tmp/outX files generated by the single-thread and
multi-thread runs should be exactly the same; however, cmp reports
differences. With the v2 patch applied, there is no such difference:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
Copying data                                      : [100.0 %] /
     eta: 0s

The dumpfile is saved to /tmp/out5.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
31 -l ~/vmcore /tmp/out6
Copying data                                      : [100.0 %] |
     eta: 0s
Copying data                                      : [100.0 %] \
     eta: 0s

The dumpfile is saved to /tmp/out6.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
[root@ibm-p10-01-lp45 makedumpfile]#
thank you for testing!  sorry, one more thing:
does --num-threads=1 break the vmcore?

I was able to reproduce this issue with --num-threads=1. The reason is that
when --num-threads is specified, makedumpfile uses one producer thread and
one consumer thread, so even with --num-threads=1, multithreading is still
in effect.
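
Just to illustrate why --num-threads=1 still hits the multithreaded path,
a rough sketch (hypothetical names, not the actual makedumpfile code):

#include <pthread.h>

/* hypothetical producer body: in makedumpfile this would be the
 * per-thread page-compression loop */
static void *producer_thread(void *arg)
{
        (void)arg;
        /* ... compress pages and publish them via page_flag_buf ... */
        return NULL;
}

static int start_producers(pthread_t *tids, int num_threads)
{
        for (int i = 0; i < num_threads; i++) {
                /* one producer per --num-threads=N */
                if (pthread_create(&tids[i], NULL, producer_thread, NULL))
                        return -1;
        }
        /* the main thread continues as the consumer and writes the dumpfile,
         * so even N == 1 gives one producer plus one consumer sharing
         * page_flag_buf */
        return 0;
}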

Thanks,
Sourabh Jain
