The following pull request was submitted through Github.
It can be accessed and reviewed at: https://github.com/lxc/lxd/pull/2512

This e-mail was sent by the LXC bot, direct replies will not reach the author
unless they happen to be subscribed to this list.

=== Description (from pull-request) ===
The problem here is that with the new action script method for waiting
until the restore is successful to finish the dump. The issue is that the
action script waits until the operation is done, but the operation deleted
the logs. Since the Migrate function is the one that actually collects the
logs, the logs got deleted before the Migrate function had a chance to
collect them.

Instead, let's make the goroutine that is calling Migrate responsible for
cleaning up the checkpoint directory, so that the logs are always
guaranteed to be collected. This means we have to be more explicit
elsewhere about deleting the logs in error cases, but we avoid the race.
In the legacy case, we can still just use the defer, since the Migrate
call returns immediately in the function itself.

Fixes a race condition seen in #2505

Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
From 26dd07627ddfaf22429c95140f6c7fa9b38aec2b Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.ander...@canonical.com>
Date: Mon, 17 Oct 2016 11:00:50 -0600
Subject: [PATCH] migration: fix a race for collecting logs

The problem here is that with the new action script method for waiting
until the restore is successful to finish the dump. The issue is that the
action script waits until the operation is done, but the operation deleted
the logs. Since the Migrate function is the one that actually collects the
logs, the logs got deleted before the Migrate function had a chance to
collect them.

Instead, let's make the goroutine that is calling Migrate responsible for
cleaning up the checkpoint directory, so that the logs are always
guaranteed to be collected. This means we have to be more explicit
elsewhere about deleting the logs in error cases, but we avoid the race.
In the legacy case, we can still just use the defer, since the Migrate
call returns immediately in the function itself.

Fixes a race condition seen in #2505

Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
---
 lxd/migrate.go | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lxd/migrate.go b/lxd/migrate.go
index 2a10261..b0b3d68 100644
--- a/lxd/migrate.go
+++ b/lxd/migrate.go
@@ -390,7 +390,6 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
                if err != nil {
                        return abort(err)
                }
-               defer os.RemoveAll(checkpointDir)
 
                if lxc.VersionAtLeast(2, 0, 4) {
                        /* What happens below is slightly convoluted. Due to 
various
@@ -412,6 +411,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
                        dumpDone := make(chan bool, 1)
                        actionScriptOpSecret, err := shared.RandomCryptoString()
                        if err != nil {
+                               os.RemoveAll(checkpointDir)
                                return abort(err)
                        }
 
@@ -453,21 +453,25 @@ func (s *migrationSourceWs) Do(migrateOp *operation) 
error {
                                },
                        )
                        if err != nil {
+                               os.RemoveAll(checkpointDir)
                                return abort(err)
                        }
 
                        if err := writeActionScript(checkpointDir, 
actionScriptOp.url, actionScriptOpSecret); err != nil {
+                               os.RemoveAll(checkpointDir)
                                return abort(err)
                        }
 
                        _, err = actionScriptOp.Run()
                        if err != nil {
+                               os.RemoveAll(checkpointDir)
                                return abort(err)
                        }
 
                        migrateDone := make(chan error, 1)
                        go func() {
                                migrateDone <- 
s.container.Migrate(lxc.MIGRATE_DUMP, checkpointDir, "migration", true, true)
+                               os.RemoveAll(checkpointDir)
                        }()
 
                        select {
@@ -479,6 +483,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
                                shared.LogDebugf("Dump finished, continuing 
with restore...")
                        }
                } else {
+                       defer os.RemoveAll(checkpointDir)
                        if err := s.container.Migrate(lxc.MIGRATE_DUMP, 
checkpointDir, "migration", true, false); err != nil {
                                return abort(err)
                        }
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

Reply via email to