The following pull request was submitted through Github. It can be accessed and reviewed at: https://github.com/lxc/lxd/pull/2512
This e-mail was sent by the LXC bot, direct replies will not reach the author unless they happen to be subscribed to this list. === Description (from pull-request) === The problem here is that with the new action script method for waiting until the restore is successful to finish the dump. The issue is that the action script waits until the operation is done, but the operation deleted the logs. Since the Migrate function is the one that actually collects the logs, the logs got deleted before the Migrate function had a chance to collect them. Instead, let's make the goroutine that is calling Migrate responsible for cleaning up the checkpoint directory, so that the logs are always guaranteed to be collected. This means we have to be more explicit elsewhere about deleting the logs in error cases, but we avoid the race. In the legacy case, we can still just use the defer, since the Migrate call returns immediately in the function itself. Fixes a race condition seen in #2505 Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
From 26dd07627ddfaf22429c95140f6c7fa9b38aec2b Mon Sep 17 00:00:00 2001 From: Tycho Andersen <tycho.ander...@canonical.com> Date: Mon, 17 Oct 2016 11:00:50 -0600 Subject: [PATCH] migration: fix a race for collecting logs The problem here is that with the new action script method for waiting until the restore is successful to finish the dump. The issue is that the action script waits until the operation is done, but the operation deleted the logs. Since the Migrate function is the one that actually collects the logs, the logs got deleted before the Migrate function had a chance to collect them. Instead, let's make the goroutine that is calling Migrate responsible for cleaning up the checkpoint directory, so that the logs are always guaranteed to be collected. This means we have to be more explicit elsewhere about deleting the logs in error cases, but we avoid the race. In the legacy case, we can still just use the defer, since the Migrate call returns immediately in the function itself. Fixes a race condition seen in #2505 Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com> --- lxd/migrate.go | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lxd/migrate.go b/lxd/migrate.go index 2a10261..b0b3d68 100644 --- a/lxd/migrate.go +++ b/lxd/migrate.go @@ -390,7 +390,6 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error { if err != nil { return abort(err) } - defer os.RemoveAll(checkpointDir) if lxc.VersionAtLeast(2, 0, 4) { /* What happens below is slightly convoluted. Due to various @@ -412,6 +411,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error { dumpDone := make(chan bool, 1) actionScriptOpSecret, err := shared.RandomCryptoString() if err != nil { + os.RemoveAll(checkpointDir) return abort(err) } @@ -453,21 +453,25 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error { }, ) if err != nil { + os.RemoveAll(checkpointDir) return abort(err) } if err := writeActionScript(checkpointDir, actionScriptOp.url, actionScriptOpSecret); err != nil { + os.RemoveAll(checkpointDir) return abort(err) } _, err = actionScriptOp.Run() if err != nil { + os.RemoveAll(checkpointDir) return abort(err) } migrateDone := make(chan error, 1) go func() { migrateDone <- s.container.Migrate(lxc.MIGRATE_DUMP, checkpointDir, "migration", true, true) + os.RemoveAll(checkpointDir) }() select { @@ -479,6 +483,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error { shared.LogDebugf("Dump finished, continuing with restore...") } } else { + defer os.RemoveAll(checkpointDir) if err := s.container.Migrate(lxc.MIGRATE_DUMP, checkpointDir, "migration", true, false); err != nil { return abort(err) }
_______________________________________________ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel