Cédric Le Goater <c...@redhat.com> writes:

> On 3/8/24 15:36, Fabiano Rosas wrote:
>> Cédric Le Goater <c...@redhat.com> writes:
>>
>>> This prepares ground for the changes coming next which add an Error**
>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>> now handle the error and fail earlier setting the migration state from
>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>
>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>> reported by .save_setup() handlers.
>>>
>>> Since the previous behavior was to ignore errors at this step of
>>> migration, this change should be examined closely to check that
>>> cleanups are still correctly done.
>>>
>>> Signed-off-by: Cédric Le Goater <c...@redhat.com>
>>> ---
>>>
>>> Changes in v4:
>>>
>>> - Merged cleanup change in qemu_savevm_state()
>>>
>>> Changes in v3:
>>>
>>> - Set migration state to MIGRATION_STATUS_FAILED
>>> - Fixed error handling to be done under lock in bg_migration_thread()
>>> - Made sure an error is always set in case of failure in
>>>   qemu_savevm_state_setup()
>>>
>>> migration/savevm.h    |  2 +-
>>> migration/migration.c | 27 ++++++++++++++++++++++++---
>>> migration/savevm.c    | 26 +++++++++++++++-----------
>>> 3 files changed, 40 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>> index
>>> 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328
>>> 100644
>>> --- a/migration/savevm.h
>>> +++ b/migration/savevm.h
>>> @@ -32,7 +32,7 @@
>>>  bool qemu_savevm_state_blocked(Error **errp);
>>>  void qemu_savevm_non_migratable_list(strList **reasons);
>>>  int qemu_savevm_state_prepare(Error **errp);
>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>  bool qemu_savevm_state_guest_unplug_pending(void);
>>>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>  void qemu_savevm_state_header(QEMUFile *f);
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index
>>> a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581
>>> 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>      MigThrError thr_error;
>>>      bool urgent = false;
>>> +    Error *local_err = NULL;
>>> +    int ret;
>>>
>>>      thread = migration_threads_add("live_migration",
>>> qemu_get_thread_id());
>>>
>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>      }
>>>
>>>      bql_lock();
>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>      bql_unlock();
>>>
>>> +    if (ret) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>> +                          MIGRATION_STATUS_FAILED);
>>> +        goto out;
>>> +    }
>>> +
>>>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>                                 MIGRATION_STATUS_ACTIVE);
>>
>> This^ should be before the new block it seems:
>>
>> GOOD:
>> migrate_set_state new state setup
>> migrate_set_state new state wait-unplug
>> migrate_fd_cancel
>> migrate_set_state new state cancelling
>> migrate_fd_cleanup
>> migrate_set_state new state cancelled
>> migrate_fd_cancel
>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>
>> BAD:
>> migrate_set_state new state setup
>> migrate_fd_cancel
>> migrate_set_state new state cancelling
>> migrate_fd_cleanup
>> migrate_set_state new state cancelled
>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>> **
>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>
>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>> that will run concurrently with migrate_fd_cancel() issued by the test
>> and bad things happens.
>
> This hack makes things work :
>
> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>          qemu_savevm_send_colo_enable(s->to_dst_file);
>      }
>
> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> +                            MIGRATION_STATUS_SETUP);
> +
Why move it all the way up here? Has moving the wait_unplug before the
'if (ret)' block not worked for you?

>      bql_lock();
>      ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>      bql_unlock();
>
> We should fix the test instead :) Unless waiting for failover devices
> to unplug before the save_setup handlers and not after is ok.
>
> commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
> is not clear about the justification.:
>
>     This patch adds a new migration state called wait-unplug. It is entered
>     after the SETUP state if failover devices are present. It will transition
>     into ACTIVE once all devices were succesfully unplugged from the guest.

This is not clear indeed, but to me it seems having the wait-unplug after
setup was important.

>
>
>> =====
>> PS: I guess the next level in our Freestyle Concurrency video-game is to
>> make migrate_fd_cancel() stop setting state and poking files and only
>> set a flag that's tested in the other parts of the code.
>
> Is that a new item on the TODO list?

Yep, I'll add it to the wiki.
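
To make the PS a bit more concrete, here is a rough, standalone sketch of
what "cancel only sets a flag" could look like. None of this is QEMU code:
SketchMigration, the SKETCH_* states, sketch_cancel(), sketch_advance() and
sketch_demo() are hypothetical names made up for illustration, and the only
thing assumed is C11 atomics. The point is that the cancel path never writes
the state; the migration thread stays the sole state writer and checks the
flag at each step boundary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified states; NOT QEMU's MigrationStatus enum. */
typedef enum {
    SKETCH_SETUP,
    SKETCH_WAIT_UNPLUG,
    SKETCH_ACTIVE,
    SKETCH_CANCELLED,
} SketchStatus;

/* Hypothetical stand-in for the migration state object. */
typedef struct {
    SketchStatus status;          /* written only by the migration thread */
    atomic_bool cancel_requested; /* the only field the cancel path touches */
} SketchMigration;

/* Cancel path: record the request, do not set state or poke files. */
static void sketch_cancel(SketchMigration *m)
{
    atomic_store(&m->cancel_requested, true);
}

/*
 * Migration thread: called at each step boundary. If a cancel was
 * requested, move to CANCELLED and report false; otherwise move to
 * 'next' and report true.
 */
static bool sketch_advance(SketchMigration *m, SketchStatus next)
{
    if (atomic_load(&m->cancel_requested)) {
        m->status = SKETCH_CANCELLED;
        return false;
    }
    m->status = next;
    return true;
}

/* Demo: the cancel request arrives while waiting for unplug. */
static SketchStatus sketch_demo(void)
{
    SketchMigration m;

    m.status = SKETCH_SETUP;
    atomic_init(&m.cancel_requested, false);

    sketch_advance(&m, SKETCH_WAIT_UNPLUG);
    sketch_cancel(&m);
    /* The cancel path left the state alone. */
    assert(m.status == SKETCH_WAIT_UNPLUG);

    /* The next step boundary notices the flag and stops. */
    sketch_advance(&m, SKETCH_ACTIVE);
    return m.status;
}
```

In this toy version the state only changes from inside sketch_advance(), so
there is no second writer to race with. A real change would still have to
deal with the cleanup BH and with waking up a thread that is blocked on I/O,
which the sketch does not attempt.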