Hi Petr,

On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <[email protected]> wrote:
>
> On Tue, 1 Jul 2025 19:59:53 +1200
> Tao Liu <[email protected]> wrote:
>
> > Hi Kazu,
> >
> > Thanks for your comments!
> >
> > On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <[email protected]> wrote:
> > >
> > > Hi Tao,
> > >
> > > thank you for the patch.
> > >
> > > On 2025/06/25 11:23, Tao Liu wrote:
> > > > A vmcore corruption issue has been noticed on the powerpc arch [1]. It
> > > > can be reproduced with upstream makedumpfile.
> > > >
> > > > When analyzing the corrupted vmcore using crash, the following error
> > > > messages are output:
> > > >
> > > >   crash: compressed kdump: uncompress failed: 0
> > > >   crash: read error: kernel virtual address: c0001e2d2fe48000 type: "hardirq thread_union"
> > > >   crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> > > >   crash: compressed kdump: uncompress failed: 0
> > > >
> > > > If the vmcore is generated without the num-threads option, no such
> > > > errors are observed.
> > > >
> > > > With --num-threads=N enabled, N sub-threads are created. All
> > > > sub-threads are producers responsible for mm page processing, e.g.
> > > > compression. The main thread is the consumer responsible for writing
> > > > the compressed data to the file. page_flag_buf->ready is used to
> > > > synchronize the main thread and the sub-threads. When a sub-thread
> > > > finishes page processing, it sets the ready flag to FLAG_READY. In the
> > > > meantime, the main thread repeatedly checks the ready flag of every
> > > > thread and breaks out of the loop when it finds FLAG_READY.
> > >
> > > I've tried to reproduce the issue, but I couldn't on x86_64.
> >
> > Yes, I cannot reproduce it on x86_64 either, but the issue is very
> > easily reproduced on the ppc64 arch, which is where our QE reported it.
>
> Yes, this is expected.
> X86 implements a strongly ordered memory model,
> so a "store-to-memory" instruction ensures that the new value is
> immediately observed by other CPUs.
>
> FWIW the current code is wrong even on X86, because it does nothing to
> prevent compiler optimizations. The compiler is then allowed to reorder
> instructions so that the write to page_flag_buf->ready happens after
> other writes; with a bit of bad scheduling luck, the consumer thread
> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
> Note that thanks to how compilers are designed (today), this issue is
> more or less hypothetical. Nevertheless, the use of atomics fixes it,
> because they also serve as memory barriers.
Thanks a lot for your detailed explanation, it's very helpful! I hadn't
considered the possibility of instruction reordering, or that atomic_rw
prevents the reordering.

Thanks,
Tao Liu

>
> Petr T
>
