On 3/8/24 14:39, Cédric Le Goater wrote:
On 3/8/24 14:14, Cédric Le Goater wrote:
On 3/8/24 13:56, Peter Xu wrote:
On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
This prepares ground for the changes coming next which add an Error**
argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
now handle the error and fail earlier setting the migration state from
MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.

In qemu_savevm_state(), move the cleanup to preserve the error
reported by .save_setup() handlers.

Since the previous behavior was to ignore errors at this step of
migration, this change should be examined closely to check that
cleanups are still correctly done.
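
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what the commit message describes: a setup step that returns an int and fills an Error** argument, and a caller that moves from a SETUP state to a FAILED state when the step fails. Everything in it is a simplified stand-in and an assumption for illustration only (the Error struct, set_error(), save_setup(), run_setup() and the Status enum are hypothetical names), not QEMU's actual implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an error object. */
typedef struct Error {
    char msg[128];
} Error;

/* Hypothetical stand-in for the migration states named in the commit message. */
typedef enum {
    STATUS_SETUP,
    STATUS_ACTIVE,
    STATUS_FAILED,
} Status;

/* Record an error only if the caller passed a location and none is set yet. */
static void set_error(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Stand-in for a setup handler: returns 0 on success, -1 and an error on failure. */
static int save_setup(int should_fail, Error **errp)
{
    if (should_fail) {
        set_error(errp, "save_setup failed");
        return -1;
    }
    return 0;
}

/*
 * Stand-in for a caller: on failure, copy the message out, free the
 * error, and leave the SETUP state for FAILED instead of carrying on.
 */
static Status run_setup(int should_fail, char *out, size_t outlen)
{
    Error *local_err = NULL;

    if (save_setup(should_fail, &local_err) < 0) {
        snprintf(out, outlen, "%s",
                 local_err ? local_err->msg : "unknown error");
        free(local_err);
        return STATUS_FAILED;
    }
    return STATUS_ACTIVE;
}
```

In this sketch, run_setup(1, buf, sizeof(buf)) returns STATUS_FAILED and leaves the handler's message in buf, while run_setup(0, ...) returns STATUS_ACTIVE and does not touch buf.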

Signed-off-by: Cédric Le Goater <c...@redhat.com>
---

  Changes in v4:
  - Merged cleanup change in qemu_savevm_state()
  Changes in v3:
  - Set migration state to MIGRATION_STATUS_FAILED
  - Fixed error handling to be done under lock in bg_migration_thread()
  - Made sure an error is always set in case of failure in
    qemu_savevm_state_setup()
  migration/savevm.h    |  2 +-
  migration/migration.c | 27 ++++++++++++++++++++++++---
  migration/savevm.c    | 26 +++++++++++++++-----------
  3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.h b/migration/savevm.h
index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -32,7 +32,7 @@
  bool qemu_savevm_state_blocked(Error **errp);
  void qemu_savevm_non_migratable_list(strList **reasons);
  int qemu_savevm_state_prepare(Error **errp);
-void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
  bool qemu_savevm_state_guest_unplug_pending(void);
  int qemu_savevm_state_resume_prepare(MigrationState *s);
  void qemu_savevm_state_header(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
      MigThrError thr_error;
      bool urgent = false;
+    Error *local_err = NULL;
+    int ret;
      thread = migration_threads_add("live_migration", qemu_get_thread_id());
@@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
      }
      bql_lock();
-    qemu_savevm_state_setup(s->to_dst_file);
+    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
      bql_unlock();
+    if (ret) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_FAILED);
+        goto out;
+     }

There's a small indent issue, I can fix it.

checkpatch did not report anything.


The bigger problem is I _think_ this will trigger a ci failure in the
virtio-net-failover test:

▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 
G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh 
MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
QTEST_QEMU_BINARY=./qemu-system-x86_64 
/builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: ram_save_setup failed: Input/output error
**
ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I am not familiar enough with the failover code and may not have time
today to follow this up, so I am copying Laurent.  Cedric, if you have
time, please have a look.


Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
aarch64. Let me check again.

I see one timeout error on s390x, but not always. See below. It occurs with
or without this patchset. The other arches, x86_64 and ppc64, run fine (apart
from one io test failing from time to time).

Ah! I got this once on aarch64:

161/486 ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL ERROR
161/486 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover              ERROR            5.98s   killed by signal 6 SIGABRT
G_TEST_DBUS_DAEMON=/home/legoater/work/qemu/qemu.git/tests/dbus-vmstate-daemon.sh
 MALLOC_PERTURB_=119 QTEST_QEMU_BINARY=./qemu-system-x86_64 
QTEST_QEMU_IMG=./qemu-img 
PYTHON=/home/legoater/work/qemu/qemu.git/build/pyvenv/bin/python3 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
/home/legoater/work/qemu/qemu.git/build/tests/qtest/virtio-net-failover --tap -k
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: ram_save_setup failed: Input/output error
**
ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL

(test program exited with status code -6)
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I couldn't reproduce it yet :/

Thanks,

C.




