On Fri, Feb 02, 2024 at 04:11:27PM -0300, Fabiano Rosas wrote:
> We currently have an unfavorable situation around multifd channels
> creation and the migration thread execution.
>
> We create the multifd channels with qio_channel_socket_connect_async
> -> qio_task_run_in_thread, but only connect them at the
> multifd_new_send_channel_async callback, called from
> qio_task_complete, which is registered as a glib event.
>
> So at multifd_save_setup() we create the channels, but they will only
> be actually usable after the whole multifd_save_setup() calling stack
> returns back to the main loop. Which means that the migration thread
> is already up and running without any possibility for the multifd
> channels to be ready on time.
>
> We currently rely on the channels-ready semaphore blocking
> multifd_send_sync_main() until channels start to come up and release
> it. However there have been bugs recently found when a channel's
> creation fails and multifd_save_cleanup() is allowed to run while
> other channels are still being created.
>
> Let's start to organize this situation by moving the
> multifd_save_setup() call into the migration thread. That way we
> unblock the main-loop to dispatch the completion callbacks and
> actually have a chance of getting the multifd channels ready for when
> the migration thread needs them.
>
> The next patches will deal with the synchronization aspects.
>
> Note that this takes multifd_save_setup() out of the BQL.
>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>

Reviewed-by: Peter Xu <pet...@redhat.com>

--
Peter Xu