On Mon, Feb 05, 2024 at 04:49:29PM -0300, Fabiano Rosas wrote: > It is possible that one of the multifd channels fails to be created at > multifd_new_send_channel_async() while the rest of the channel > creation tasks are still in flight. > > This could lead to multifd_save_cleanup() executing the > qemu_thread_join() loop too early and not waiting for the threads > which haven't been created yet, leading to the freeing of resources > that the newly created threads will try to access and crash. > > Add a synchronization point after which there will be no attempts at > thread creation and therefore calling multifd_save_cleanup() past that > point will ensure it properly waits for the threads. > > A note about performance: Prior to this patch, if a channel took too > long to be established, other channels could finish connecting first > and already start taking load. Now we're bounded by the > slowest-connecting channel. > > Reported-by: Avihai Horon <avih...@nvidia.com> > Signed-off-by: Fabiano Rosas <faro...@suse.de>
[...] > @@ -934,7 +936,6 @@ static void multifd_new_send_channel_async(QIOTask *task, > gpointer opaque) > MultiFDSendParams *p = opaque; > QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task)); > Error *local_err = NULL; > - This line removal should belong to the previous patch. I can touch that up. > bool ret; > > trace_multifd_new_send_channel_async(p->id); Reviewed-by: Peter Xu <pet...@redhat.com> -- Peter Xu