Hi Sourabh,

On Sat, Jun 21, 2025 at 12:25 AM Sourabh Jain <sourabhj...@linux.ibm.com> wrote:
>
> Hello Tao,
>
> On 19/06/25 05:26, Tao Liu wrote:
> > A vmcore corruption issue has been noticed on the powerpc arch [1].
> > Although the original issue was closed, it can still be reproduced
> > with upstream makedumpfile.
> >
> > With --num-threads=N enabled, N sub-threads are created. All
> > sub-threads are producers responsible for mm page processing, e.g.
> > compression. The main thread is the consumer responsible for writing
> > the compressed data into the file. page_flag_buf->ready is used to
> > sync the main thread and the sub-threads. When a sub-thread finishes
> > page processing, it sets the ready flag to FLAG_READY. Meanwhile, the
> > main thread loops over the ready flags of all threads, and breaks out
> > of the loop when it finds FLAG_READY.
> >
> > page_flag_buf->ready is read/written by the main thread and
> > sub-threads simultaneously, but it is unprotected and thus unsafe. I
> > have tested that both a mutex and atomic read/write fix this issue.
> > This patch takes the atomic read/write approach for its simplicity.
> >
> > [1]: https://github.com/makedumpfile/makedumpfile/issues/15
> >
> > Signed-off-by: Tao Liu <l...@redhat.com>
> > ---
> >  makedumpfile.c | 21 ++++++++++++++-------
> >  1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index 2d3b08b..bac45c2 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -8621,7 +8621,8 @@ kdump_thread_function_cyclic(void *arg) {
> >
> >      while (buf_ready == FALSE) {
> >          pthread_testcancel();
> > -        if (page_flag_buf->ready == FLAG_READY)
> > +        if (__atomic_load_n(&page_flag_buf->ready,
> > +                __ATOMIC_SEQ_CST) == FLAG_READY)
> >              continue;
> >
> >          /* get next dumpable pfn */
> > @@ -8637,7 +8638,8 @@ kdump_thread_function_cyclic(void *arg) {
> >          info->current_pfn = pfn + 1;
> >
> >          page_flag_buf->pfn = pfn;
> > -        page_flag_buf->ready = FLAG_FILLING;
> > +        __atomic_store_n(&page_flag_buf->ready, FLAG_FILLING,
> > +                __ATOMIC_SEQ_CST);
> >          pthread_mutex_unlock(&info->current_pfn_mutex);
> >          sem_post(&info->page_flag_buf_sem);
> >
> > @@ -8726,7 +8728,8 @@ kdump_thread_function_cyclic(void *arg) {
> >          page_flag_buf->index = index;
> >          buf_ready = TRUE;
> > next:
> > -        page_flag_buf->ready = FLAG_READY;
> > +        __atomic_store_n(&page_flag_buf->ready, FLAG_READY,
> > +                __ATOMIC_SEQ_CST);
> >          page_flag_buf = page_flag_buf->next;
> >
> >      }
> > @@ -8855,7 +8858,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >           * current_pfn is used for recording the value of pfn when checking the pfn.
> >           */
> >          for (i = 0; i < info->num_threads; i++) {
> > -            if (info->page_flag_buf[i]->ready == FLAG_UNUSED)
> > +            if (__atomic_load_n(&info->page_flag_buf[i]->ready,
> > +                    __ATOMIC_SEQ_CST) == FLAG_UNUSED)
> >                  continue;
> >              temp_pfn = info->page_flag_buf[i]->pfn;
> >
> > @@ -8863,7 +8867,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >               * count how many threads have reached the end.
> >               */
> >              if (temp_pfn >= end_pfn) {
> > -                info->page_flag_buf[i]->ready = FLAG_UNUSED;
> > +                __atomic_store_n(&info->page_flag_buf[i]->ready,
> > +                        FLAG_UNUSED, __ATOMIC_SEQ_CST);
> >                  end_count++;
> >                  continue;
> >              }
> > @@ -8885,7 +8890,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >           * If the page_flag_buf is not ready, the pfn recorded may be changed.
> >           * So we should recheck.
> >           */
> > -        if (info->page_flag_buf[consuming]->ready != FLAG_READY) {
> > +        if (__atomic_load_n(&info->page_flag_buf[consuming]->ready,
> > +                __ATOMIC_SEQ_CST) != FLAG_READY) {
> >              clock_gettime(CLOCK_MONOTONIC, &new);
> >              if (new.tv_sec - last.tv_sec > WAIT_TIME) {
> >                  ERRMSG("Can't get data of pfn.\n");
> > @@ -8927,7 +8933,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                  goto out;
> >              page_data_buf[index].used = FALSE;
> >          }
> > -        info->page_flag_buf[consuming]->ready = FLAG_UNUSED;
> > +        __atomic_store_n(&info->page_flag_buf[consuming]->ready,
> > +                FLAG_UNUSED, __ATOMIC_SEQ_CST);
> >          info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next;
> >      }
> > finish:
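As a side note for reviewers: stripped of the makedumpfile specifics, the handshake this patch touches reduces to the producer/consumer pattern sketched below. The names and structure are made up for illustration; only the __atomic_load_n()/__atomic_store_n() usage mirrors the patch.

```
#include <pthread.h>
#include <stdio.h>

enum { FLAG_UNUSED, FLAG_READY };

struct slot {
    int ready;  /* written and read by both threads, hence the atomics */
    long data;
};

static struct slot slot = { FLAG_UNUSED, 0 };

/* Producer: fill the slot, then publish it by storing FLAG_READY. */
static void *producer(void *arg)
{
    for (long i = 0; i < 5; i++) {
        /* spin until the consumer has drained the previous item */
        while (__atomic_load_n(&slot.ready, __ATOMIC_SEQ_CST) != FLAG_UNUSED)
            ;
        slot.data = i;
        /* the SEQ_CST store orders the data write before the flag update */
        __atomic_store_n(&slot.ready, FLAG_READY, __ATOMIC_SEQ_CST);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, producer, NULL);

    /* Consumer: poll the flag, take the item, release the slot. */
    for (int n = 0; n < 5; n++) {
        while (__atomic_load_n(&slot.ready, __ATOMIC_SEQ_CST) != FLAG_READY)
            ;
        printf("consumed %ld\n", slot.data);
        __atomic_store_n(&slot.ready, FLAG_UNUSED, __ATOMIC_SEQ_CST);
    }

    pthread_join(tid, NULL);
    return 0;
}
```

With plain loads and stores in place of the atomics, this is a data race: the compiler and CPU may reorder the data write relative to the flag update, so the consumer can see FLAG_READY while still reading stale data -- the kind of corruption described above.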
>
> I also observed this issue, and it is easily reproducible on Power
> systems with large memory configurations, around 64TB.
> In my case the crash tool printed the following error messages when it
> was run to debug the corrupted vmcore:
>
> ```
> crash: compressed kdump: uncompress failed: 0
> crash: read error: kernel virtual address: c0001e2d2fe48000 type: "hardirq thread_union"
> crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> crash: compressed kdump: uncompress failed: 0
> ```
>
> Should we include the above crash logs, or the ones you observed, in
> this commit message for easy reference?

In fact I didn't test the vmcore with crash. I generated 2 vmcores from
one single vmcore: one with "--num-threads=N" and the other without. If
multi-threaded makedumpfile works correctly, the 2 vmcores should be
byte-for-byte identical. However, cmp reports that some bytes differ.
Either way, cmp and crash lead to the same conclusion: multi-threaded
makedumpfile does have a bug.
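For completeness, the check itself is just a byte-for-byte comparison, which plain cmp already provides; purely as an illustration (not part of the patch), it amounts to something like:

```
#include <stdio.h>

/* Illustration only: report the first offset at which two dumps differ. */
int main(int argc, char **argv)
{
    FILE *a, *b;
    long off;
    int ca, cb;

    if (argc != 3) {
        fprintf(stderr, "usage: %s dump1 dump2\n", argv[0]);
        return 2;
    }
    a = fopen(argv[1], "rb");
    b = fopen(argv[2], "rb");
    if (!a || !b) {
        perror("fopen");
        return 2;
    }
    for (off = 0; ; off++) {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb) {
            printf("dumps differ at offset %ld\n", off);
            return 1;
        }
        if (ca == EOF)
            break;
    }
    printf("dumps are identical (%ld bytes)\n", off);
    return 0;
}
```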
I will add the crash error messages to the v2 commit log for easy
reference.

> I tested this fix on high-end Power systems with 64TB of RAM as well
> as on systems with less memory, and everything is working fine in
> both cases.
>
> Thanks for the fix.
> Tested-by: Sourabh Jain <sourabhj...@linux.ibm.com>

Thanks a lot for your testing!

Thanks,
Tao Liu

>
> - Sourabh Jain
>