On Mon, Feb 05, 2024 at 04:49:29PM -0300, Fabiano Rosas wrote:
> It is possible that one of the multifd channels fails to be created at
> multifd_new_send_channel_async() while the rest of the channel
> creation tasks are still in flight.
> 
> This could lead to multifd_save_cleanup() executing the
> qemu_thread_join() loop too early and not waiting for the threads
> which haven't been created yet, leading to the freeing of resources
> that the newly created threads will try to access and crash.
> 
> Add a synchronization point after which there will be no attempts at
> thread creation and therefore calling multifd_save_cleanup() past that
> point will ensure it properly waits for the threads.
> 
> A note about performance: Prior to this patch, if a channel took too
> long to be established, other channels could finish connecting first
> and already start taking load. Now we're bounded by the
> slowest-connecting channel.
> 
> Reported-by: Avihai Horon <avih...@nvidia.com>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>

[...]

> @@ -934,7 +936,6 @@ static void multifd_new_send_channel_async(QIOTask *task, 
> gpointer opaque)
>      MultiFDSendParams *p = opaque;
>      QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>      Error *local_err = NULL;
> -

This line removal should belong to the previous patch.  I can touch that
up.

>      bool ret;
>  
>      trace_multifd_new_send_channel_async(p->id);

Reviewed-by: Peter Xu <pet...@redhat.com>

-- 
Peter Xu


Reply via email to