Hi Kazu,

On Wed, Jul 2, 2025 at 4:52 PM HAGIO KAZUHITO(萩尾 一仁) <[email protected]> wrote:
>
> Hi Tao,
>
> On 2025/07/02 13:36, Tao Liu wrote:
> > Hi Kazu,
> >
> > On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
> > <[email protected]> wrote:
> >>
> >> On 2025/07/01 16:59, Tao Liu wrote:
> >>> Hi Kazu,
> >>>
> >>> Thanks for your comments!
> >>>
> >>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <[email protected]>
> >>> wrote:
> >>>>
> >>>> Hi Tao,
> >>>>
> >>>> thank you for the patch.
> >>>>
> >>>> On 2025/06/25 11:23, Tao Liu wrote:
> >>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> >>>>> reproduced with upstream makedumpfile.
> >>>>>
> >>>>> When analyzing the corrupt vmcore using crash, the following error
> >>>>> message will output:
> >>>>>
> >>>>> crash: compressed kdump: uncompress failed: 0
> >>>>> crash: read error: kernel virtual address: c0001e2d2fe48000
> >>>>> type:
> >>>>> "hardirq thread_union"
> >>>>> crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >>>>> crash: compressed kdump: uncompress failed: 0
> >>>>>
> >>>>> If the vmcore is generated without num-threads option, then no such
> >>>>> errors are noticed.
> >>>>>
> >>>>> With --num-threads=N enabled, there will be N sub-threads created. All
> >>>>> sub-threads are producers which responsible for mm page processing, e.g.
> >>>>> compression. The main thread is the consumer which responsible for
> >>>>> writing the compressed data into file. page_flag_buf->ready is used to
> >>>>> sync main and sub-threads. When a sub-thread finishes page processing,
> >>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
> >>>>> looply check all threads of the ready flags, and break the loop when
> >>>>> find FLAG_READY.
> >>>>
> >>>> I've tried to reproduce the issue, but I couldn't on x86_64.
> >>>
> >>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> >>> easily reproduced on ppc64 arch, which is where our QE reported.
> >>> Recently we have enabled --num-threads=N in rhel by default. N ==
> >>> nr_cpus in 2nd kernel, so QE noticed the issue.
> >>
> >> I see, thank you for the information.
> >>
> >>>
> >>>>
> >>>> Do you have any possible scenario that breaks a vmcore? I could not
> >>>> think of it only by looking at the code.
> >>>
> >>> I guess the issue only been observed on ppc might be due to ppc's
> >>> memory model, multi-thread scheduling algorithm etc. I'm not an expert
> >>> on those. So I cannot give a clear explanation, sorry...
> >>
> >> ok, I also don't think of how to debug this well..
> >>
> >>>
> >>> The page_flag_buf->ready is an integer that r/w by main and sub
> >>> threads simultaneously. And the assignment operation, like
> >>> page_flag_buf->ready = 1, might be composed of several assembly
> >>> instructions. Without atomic r/w (memory) protection, there might be
> >>> racing r/w just within the few instructions, which caused the data
> >>> inconsistency. Frankly the ppc assembly consists of more instructions
> >>> than x86_64 for the same c code, which enlarged the possibility of
> >>> data racing.
> >>>
> >>> We can observe the issue without the help of crash, just compare the
> >>> binary output of vmcore generated from the same core file, and
> >>> compress it with or without --num-threads option. Then compare it with
> >>> "cmp vmcore1 vmcore2" cmdline, and cmp will output bytes differ for
> >>> the 2 vmcores, and this is unexpected.
> >>>
> >>>>
> >>>> and this is just out of curiosity, is the issue reproduced with
> >>>> makedumpfile compiled with -O0 too?
> >>>
> >>> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
> >>> share my findings...
> >>
> >> Thanks, we have to fix this anyway, I want a clue to think about a
> >> possible scenario..
> >
> > 1) Compiled with -O2 flag:
> >
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore
> > /tmp/out1
> > Copying data : [100.0 %] /
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out1.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> > 31 -l ~/vmcore /tmp/out2
> > Copying data : [100.0 %] |
> > eta: 0s
> > Copying data : [100.0 %] \
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out2.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
> > [root@ibm-p10-01-lp45 tmp]# cmp out1 out2
> > out1 out2 differ: byte 20786414, line 108064
> >
> > 2) Compiled with -O0 flag:
> >
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore
> > /tmp/out3
> > Copying data : [100.0 %] /
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out3.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> > 31 -l ~/vmcore /tmp/out4
> > Copying data : [100.0 %] |
> > eta: 0s
> > Copying data : [100.0 %] \
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out4.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
> > [root@ibm-p10-01-lp45 tmp]# cmp out3 out4
> > out3 out4 differ: byte 23948282, line 151739
> >
> > Looks to me the O0/O2 have no difference for this case. If no problem,
> > the /tmp/outX generated from both single/multi thread should be
> > exactly the same, however the cmp reports there are differences. With
> > the v2 patch applied, there is no such difference:
> >
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore
> > /tmp/out5
> > Copying data : [100.0 %] /
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out5.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> > 31 -l ~/vmcore /tmp/out6
> > Copying data : [100.0 %] |
> > eta: 0s
> > Copying data : [100.0 %] \
> > eta: 0s
> >
> > The dumpfile is saved to /tmp/out6.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
> > [root@ibm-p10-01-lp45 makedumpfile]#
>
> thank you for testing! sorry one more thing,
> does --num-threads=1 break the vmcore?

Yes, --num-threads=1 also breaks the vmcore:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out7
Copying data : [100.0 %] / eta: 0s

The dumpfile is saved to /tmp/out7.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=1 -d 31 -l ~/vmcore /tmp/out8
Copying data : [100.0 %] - eta: 0s
Copying data : [100.0 %] / eta: 0s

The dumpfile is saved to /tmp/out8.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out7 /tmp/out8
/tmp/out7 /tmp/out8 differ: byte 11119019, line 49418

>
> Thanks,
> Kazu
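
To make the ready-flag handoff discussed above concrete, here is a minimal
standalone sketch of one producer thread and one consumer synchronizing on a
single flag with C11 acquire/release atomics. It is not the v2 patch and not
makedumpfile code: struct slot, PAGE_BYTES, producer() and run_handoff_demo()
are hypothetical names made up for illustration, and FLAG_UNUSED / FLAG_READY
are given illustrative values here. The assumption being illustrated is that
the page data must be published before the flag becomes visible, which a
plain int store does not guarantee on a weakly ordered CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define FLAG_UNUSED 0   /* illustrative value: slot may be filled */
#define FLAG_READY  1   /* illustrative value: slot holds a finished page */
#define PAGE_BYTES  64  /* hypothetical, much smaller than a real page */

/* Hypothetical stand-in for one page buffer plus its ready flag. */
struct slot {
    atomic_int ready;
    unsigned char data[PAGE_BYTES];
};

struct producer_arg {
    struct slot *slot;
    int n_pages;
};

/* Producer (sub-thread): fill the buffer, then publish it with a release store. */
static void *producer(void *p)
{
    struct producer_arg *arg = p;
    struct slot *s = arg->slot;

    for (int i = 0; i < arg->n_pages; i++) {
        /* Wait until the consumer has handed the slot back. */
        while (atomic_load_explicit(&s->ready, memory_order_acquire) != FLAG_UNUSED)
            sched_yield();
        memset(s->data, i & 0xff, PAGE_BYTES);
        /* Release: the data writes above become visible before the flag. */
        atomic_store_explicit(&s->ready, FLAG_READY, memory_order_release);
    }
    return NULL;
}

/* Consumer (main thread): returns the number of bytes that did not match,
 * or -1 if the producer thread could not be created. */
int run_handoff_demo(int n_pages)
{
    struct slot s;
    struct producer_arg arg = { &s, n_pages };
    pthread_t tid;
    int bad = 0;

    assert(n_pages > 0);
    atomic_init(&s.ready, FLAG_UNUSED);
    if (pthread_create(&tid, NULL, producer, &arg) != 0)
        return -1;

    for (int i = 0; i < n_pages; i++) {
        /* Acquire: pairs with the producer's release store. */
        while (atomic_load_explicit(&s.ready, memory_order_acquire) != FLAG_READY)
            sched_yield();
        for (int j = 0; j < PAGE_BYTES; j++)
            if (s.data[j] != (unsigned char)(i & 0xff))
                bad++;
        /* Hand the slot back to the producer. */
        atomic_store_explicit(&s.ready, FLAG_UNUSED, memory_order_release);
    }
    pthread_join(tid, NULL);
    return bad;
}
```

Calling run_handoff_demo(1000) from a small main() and checking that it
returns 0 is enough to exercise it. Whether the missing ordering is really
what corrupts the vmcore on ppc64 is still only a guess on my side.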
