The following pull request was submitted through Github.
It can be accessed and reviewed at: https://github.com/lxc/lxd/pull/2076

This e-mail was sent by the LXC bot, direct replies will not reach the author
unless they happen to be subscribed to this list.

=== Description (from pull-request) ===
Consider,

t=2016-05-31T20:49:07+0000 lvl=dbug msg=MigrationSink driver=storage/dir live=false container=nonlive objects=[nonlive/snap0]
t=2016-05-31T20:49:07+0000 lvl=dbug msg="sending write barrier"
t=2016-05-31T20:49:07+0000 lvl=dbug msg="Got message barrier, resetting stream"
t=2016-05-31T20:49:07+0000 lvl=dbug msg="sending write barrier"
t=2016-05-31T20:49:07+0000 lvl=dbug msg="Got message barrier, resetting stream"
panic: concurrent write to websocket connection

goroutine 560 [running]:
panic(0x9b9da0, 0xc42012a000)
	/lxd/build/tmp.xTgWRy05x1/go/golang/src/runtime/panic.go:500 +0x1a1
github.com/gorilla/websocket.(*Conn).flushFrame(0xc42000bb00, 0xc42034e601, 0x0, 0x0, 0x0, 0xc42034e778, 0xc42046c001)
	/lxd/build/tmp.xTgWRy05x1/go/src/github.com/gorilla/websocket/conn.go:481 +0x414
github.com/gorilla/websocket.(*Conn).NextWriter(0xc42000bb00, 0x2, 0xc42034e778, 0x1044e01, 0xc42028aa20, 0x0)
	/lxd/build/tmp.xTgWRy05x1/go/src/github.com/gorilla/websocket/conn.go:407 +0x1a1
github.com/lxc/lxd/shared.WebsocketMirror.func2(0xc420147180, 0xc42000bb00, 0x10482e0, 0xc42028aa20)
	/lxd/build/tmp.xTgWRy05x1/go/src/github.com/lxc/lxd/shared/network.go:257 +0x125
created by github.com/lxc/lxd/shared.WebsocketMirror
	/lxd/build/tmp.xTgWRy05x1/go/src/github.com/lxc/lxd/shared/network.go:274 +0xe2
error: Get https://127.0.0.1:22441/1.0/operations/eb592bbb-aa8c-4a0f-a57f-cac9285fedba/wait: Unable to connect to: 127.0.0.1:22441

One way this could happen is that the goroute that is sending the snapshot
signals readDone, and then the storage driver starts sending the next bits over
the websocket before the message barrier is actually sent. This seems really
unlikley to happen since the goroutine sending the message needs to be starved
for a long time, but given that we've only seen this stack trace once in the
last few months it seems quite rare.

There may be other ways for this to occur, but this is the only one I can see
right now.

Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
From 38c7a38f12e0f2928d7bcb74a20410d09d23cd67 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.ander...@canonical.com>
Date: Wed, 1 Jun 2016 21:26:02 +0000
Subject: [PATCH] probably fix a concurrent websocket write

Consider,

t=2016-05-31T20:49:07+0000 lvl=dbug msg=MigrationSink driver=storage/dir 
live=false container=nonlive objects=[nonlive/snap0]
t=2016-05-31T20:49:07+0000 lvl=dbug msg="sending write barrier"
t=2016-05-31T20:49:07+0000 lvl=dbug msg="Got message barrier, resetting stream"
t=2016-05-31T20:49:07+0000 lvl=dbug msg="sending write barrier"
t=2016-05-31T20:49:07+0000 lvl=dbug msg="Got message barrier, resetting stream"
panic: concurrent write to websocket connection

goroutine 560 [running]:
panic(0x9b9da0, 0xc42012a000)
        /lxd/build/tmp.xTgWRy05x1/go/golang/src/runtime/panic.go:500 +0x1a1
github.com/gorilla/websocket.(*Conn).flushFrame(0xc42000bb00, 0xc42034e601, 
0x0, 0x0, 0x0, 0xc42034e778, 0xc42046c001)
        
/lxd/build/tmp.xTgWRy05x1/go/src/github.com/gorilla/websocket/conn.go:481 +0x414
github.com/gorilla/websocket.(*Conn).NextWriter(0xc42000bb00, 0x2, 
0xc42034e778, 0x1044e01, 0xc42028aa20, 0x0)
        
/lxd/build/tmp.xTgWRy05x1/go/src/github.com/gorilla/websocket/conn.go:407 +0x1a1
github.com/lxc/lxd/shared.WebsocketMirror.func2(0xc420147180, 0xc42000bb00, 
0x10482e0, 0xc42028aa20)
        
/lxd/build/tmp.xTgWRy05x1/go/src/github.com/lxc/lxd/shared/network.go:257 +0x125
created by github.com/lxc/lxd/shared.WebsocketMirror
        
/lxd/build/tmp.xTgWRy05x1/go/src/github.com/lxc/lxd/shared/network.go:274 +0xe2
error: Get 
https://127.0.0.1:22441/1.0/operations/eb592bbb-aa8c-4a0f-a57f-cac9285fedba/wait:
 Unable to connect to: 127.0.0.1:22441

One way this could happen is that the goroute that is sending the snapshot
signals readDone, and then the storage driver starts sending the next bits over
the websocket before the message barrier is actually sent. This seems really
unlikley to happen since the goroutine sending the message needs to be starved
for a long time, but given that we've only seen this stack trace once in the
last few months it seems quite rare.

There may be other ways for this to occur, but this is the only one I can see
right now.

Signed-off-by: Tycho Andersen <tycho.ander...@canonical.com>
---
 shared/network.go | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/shared/network.go b/shared/network.go
index cfd2017..9ad35e4 100644
--- a/shared/network.go
+++ b/shared/network.go
@@ -275,10 +275,10 @@ func WebsocketMirror(conn *websocket.Conn, w 
io.WriteCloser, r io.ReadCloser) (c
                for {
                        buf, ok := <-in
                        if !ok {
-                               readDone <- true
                                r.Close()
                                Debugf("sending write barrier")
                                conn.WriteMessage(websocket.TextMessage, 
[]byte{})
+                               readDone <- true
                                return
                        }
                        w, err := conn.NextWriter(websocket.BinaryMessage)
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

Reply via email to