Cédric Le Goater <c...@redhat.com> writes:

> On 3/8/24 15:36, Fabiano Rosas wrote:
>> Cédric Le Goater <c...@redhat.com> writes:
>> 
>>> This prepares ground for the changes coming next which add an Error**
>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>> now handle the error and fail earlier setting the migration state from
>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>
>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>> reported by .save_setup() handlers.
>>>
>>> Since the previous behavior was to ignore errors at this step of
>>> migration, this change should be examined closely to check that
>>> cleanups are still correctly done.
>>>
>>> Signed-off-by: Cédric Le Goater <c...@redhat.com>
>>> ---
>>>
>>>   Changes in v4:
>>>   
>>>   - Merged cleanup change in qemu_savevm_state()
>>>     
>>>   Changes in v3:
>>>   
>>>   - Set migration state to MIGRATION_STATUS_FAILED
>>>   - Fixed error handling to be done under lock in bg_migration_thread()
>>>   - Made sure an error is always set in case of failure in
>>>     qemu_savevm_state_setup()
>>>     
>>>   migration/savevm.h    |  2 +-
>>>   migration/migration.c | 27 ++++++++++++++++++++++++---
>>>   migration/savevm.c    | 26 +++++++++++++++-----------
>>>   3 files changed, 40 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>> --- a/migration/savevm.h
>>> +++ b/migration/savevm.h
>>> @@ -32,7 +32,7 @@
>>>   bool qemu_savevm_state_blocked(Error **errp);
>>>   void qemu_savevm_non_migratable_list(strList **reasons);
>>>   int qemu_savevm_state_prepare(Error **errp);
>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>   bool qemu_savevm_state_guest_unplug_pending(void);
>>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>   void qemu_savevm_state_header(QEMUFile *f);
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>       MigThrError thr_error;
>>>       bool urgent = false;
>>> +    Error *local_err = NULL;
>>> +    int ret;
>>>   
>>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>   
>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>       }
>>>   
>>>       bql_lock();
>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>       bql_unlock();
>>>   
>>> +    if (ret) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>> +                          MIGRATION_STATUS_FAILED);
>>> +        goto out;
>>> +    }
>>> +
>>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>                                  MIGRATION_STATUS_ACTIVE);
>> 
>> This^ should be before the new block it seems:
>> 
>> GOOD:
>> migrate_set_state new state setup
>> migrate_set_state new state wait-unplug
>> migrate_fd_cancel
>> migrate_set_state new state cancelling
>> migrate_fd_cleanup
>> migrate_set_state new state cancelled
>> migrate_fd_cancel
>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>> 
>> BAD:
>> migrate_set_state new state setup
>> migrate_fd_cancel
>> migrate_set_state new state cancelling
>> migrate_fd_cleanup
>> migrate_set_state new state cancelled
>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>> **
>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>> 
>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>> that will run concurrently with migrate_fd_cancel() issued by the test
>> and bad things happen.
>
> This hack makes things work :
>
> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>           qemu_savevm_send_colo_enable(s->to_dst_file);
>       }
>   
> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> +                            MIGRATION_STATUS_SETUP);
> +

Why move it all the way up here? Has moving the wait_unplug before the
'if (ret)' block not worked for you?
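
To make the ordering I mean concrete, here is a standalone toy model. It is
not QEMU code: the ToyState enum, toy_set_state(), toy_wait_unplug() and
toy_migration_thread() are made-up stand-ins, and the ACTIVE -> FAILED
transition in the error path is my assumption about what the 'if (ret)'
block would need once the wait comes first. It only shows that the
wait-unplug transitions are recorded before the setup error is acted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy states, loosely named after the ones discussed in this thread. */
typedef enum {
    TOY_SETUP,
    TOY_WAIT_UNPLUG,
    TOY_ACTIVE,
    TOY_FAILED,
} ToyState;

#define TOY_TRACE_MAX 8

typedef struct {
    ToyState state;
    ToyState trace[TOY_TRACE_MAX]; /* every state entered, in order */
    size_t ntrace;
} ToyMigration;

/* Transition only if we are in the expected old state (hypothetical helper). */
static void toy_set_state(ToyMigration *m, ToyState old_state,
                          ToyState new_state)
{
    assert(m != NULL);
    if (m->state != old_state) {
        return;
    }
    m->state = new_state;
    if (m->ntrace < TOY_TRACE_MAX) {
        m->trace[m->ntrace++] = new_state;
    }
}

/* Stand-in for the wait-unplug step: pass through WAIT_UNPLUG if needed. */
static void toy_wait_unplug(ToyMigration *m, bool failover_present)
{
    if (failover_present) {
        toy_set_state(m, TOY_SETUP, TOY_WAIT_UNPLUG);
        /* A real implementation would wait for the guest here. */
        toy_set_state(m, TOY_WAIT_UNPLUG, TOY_ACTIVE);
    } else {
        toy_set_state(m, TOY_SETUP, TOY_ACTIVE);
    }
}

/* setup_ret stands in for the value returned by the setup step. */
static int toy_migration_thread(ToyMigration *m, bool failover_present,
                                int setup_ret)
{
    int ret = setup_ret;

    /* Wait for unplug first, so the state transitions are preserved. */
    toy_wait_unplug(m, failover_present);

    /* Only then act on the setup failure (assumed ACTIVE -> FAILED). */
    if (ret) {
        toy_set_state(m, TOY_ACTIVE, TOY_FAILED);
        return ret;
    }
    return 0;
}
```

With a failover device present and a failing setup, the trace in this model
is wait-unplug, active, failed, so anything polling the state still gets to
observe wait-unplug.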

>       bql_lock();
>       ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>       bql_unlock();
>
> We should fix the test instead :) Unless waiting for failover devices
> to unplug before the save_setup handlers and not after is ok.
>
> commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
> is not clear about the justification:
>
>      This patch adds a new migration state called wait-unplug.  It is entered
>      after the SETUP state if failover devices are present. It will transition
>      into ACTIVE once all devices were succesfully unplugged from the guest.

This is not clear indeed, but to me it seems having the wait-unplug
after setup was important.

>
>
>> =====
>> PS: I guess the next level in our Freestyle Concurrency video-game is to
>> make migrate_fd_cancel() stop setting state and poking files and only
>> set a flag that's tested in the other parts of the code.
>
> Is that a new item on the TODO list?

Yep, I'll add it to the wiki.
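
Roughly what I have in mind, as a minimal standalone sketch using C11
atomics. None of these names exist in QEMU; ToyCancelCtx,
toy_request_cancel() and toy_run() are hypothetical and only illustrate
the "set a flag, test it elsewhere" idea:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context shared between the requester and the worker. */
typedef struct {
    atomic_bool cancel_requested;
    int iterations_done;
} ToyCancelCtx;

/* Requester side: only raises the flag. No state change, no file access. */
static void toy_request_cancel(ToyCancelCtx *c)
{
    assert(c != NULL);
    atomic_store(&c->cancel_requested, true);
}

/*
 * Worker side: checks the flag at one well-defined point per iteration and
 * performs its own teardown. Returns true if it stopped due to a cancel.
 * cancel_at simulates a cancel request arriving before that iteration;
 * pass a negative value for "never".
 */
static bool toy_run(ToyCancelCtx *c, int max_iterations, int cancel_at)
{
    assert(c != NULL);
    for (int i = 0; i < max_iterations; i++) {
        if (i == cancel_at) {
            toy_request_cancel(c); /* simulated external request */
        }
        if (atomic_load(&c->cancel_requested)) {
            return true; /* the worker alone decides how to clean up */
        }
        c->iterations_done++;
    }
    return false;
}
```

The point of the design is that only one thread ever changes state or
touches the files, so the cancel path cannot race with the cleanup.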

