fishy commented on a change in pull request #2497:
URL: https://github.com/apache/thrift/pull/2497#discussion_r791929108
##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example,
1ms), it might cause
excessive cpu overhead.
This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
+
+The server stop will wait for all client connections to be closed after the
last received or to be received request to be hanled,
Review comment:
I would prefer to `s/The server stop/TSimpleServer.Stop/`, with
optionally link to the go doc page
(https://pkg.go.dev/github.com/apache/thrift/lib/go/thrift#TSimpleServer.Stop)
##########
File path: lib/go/thrift/simple_server.go
##########
@@ -48,15 +49,26 @@ var ErrAbandonRequest = errors.New("request abandoned")
// If it's changed to <=0, the feature will be disabled.
var ServerConnectivityCheckInterval = time.Millisecond * 5
+// ServerCloseTimeout defines max close wait duration used by
+// server stop to avoid hanging too long to wait for all client connections to
be closed gracefully.
+//
+// It's defined as a variable instead of constant, so that thrift server
+// implementations can change its value to control the behavior.
+//
+// If it's changed to <=0, the feature will be disabled, and the server will
wait for
+// for all the client connections to be closed gracefully.
+var ServerCloseTimeout = time.Duration(0)
Review comment:
On second thought a better name is probably `ServerStopTimeout`, since
the function using it is `Stop`, not `Close`.
##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example,
1ms), it might cause
excessive cpu overhead.
This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
+
+The server stop will wait for all client connections to be closed after the
last received or to be received request to be hanled,
+so the time spent by stop may be too long in some situation:
+* when socket timeout is not set,server will be hanged before all active
clients to finish handling the last received or to be received request.
+* when the socket timeout is too long saying one hour, server will be hanged
for one hour before all active clients to finish handling the last received or
to be received request.
+
+To avoid the server stop to be hanged for too long, you can take advantage of
thrift.ServerCloseTimeout in your main or init function:
+
+ thrift.ServerCloseTimeout = <max_duration_to_stop>
+
+If it's changed to <=0, the feature will be disabled, and server will wait for
all the client connections to be closed gracefully with zero err
time.Otherwise, the stop will wait for all the client connections to be closed
gracefully util thrift.ServerCloseTimeout is reached, and the client
connections that are not closed after thrift.ServerCloseTimeout will be closed
abruptly which may cause some client err rate.
Review comment:
also in markdown only a blank line is actual line break (paragraph
separate), so you don't have to write the whole paragraph in a single line. I
usually prefer to add line breaks after comma or period to keep each lines
reasonably short.
##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
runtime.Gosched()
serv.Stop()
}
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
+
+ proc := &mockProcessor{
+ ProcessFunc: func(in, out TProtocol) (bool, TException) {
+ in.ReadMessageBegin(context.Background())
+ return false, nil
+ },
+ }
+
+ trans := &mockServerTransport{
+ ListenFunc: func() error {
+ return nil
+ },
+ AcceptFunc: func() (TTransport, error) {
+ conn, err := ln.Accept()
+ if err != nil {
+ return nil, err
+ }
+
+ return NewTSocketFromConnConf(conn, nil), nil
+ },
+ CloseFunc: func() error {
+ return nil
+ },
+ InterruptFunc: func() error {
+ return ln.Close()
+ },
+ }
+
+ serv := NewTSimpleServer2(proc, trans)
+ go serv.Serve()
+ time.Sleep(10 * time.Millisecond)
+
+ netConn, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil || netConn == nil {
+ t.Fatal("error when dial server")
+ }
+ time.Sleep(10 * time.Millisecond)
+
+ st := time.Now()
+ ServerCloseTimeout = 50 * time.Millisecond
+ err = serv.Stop()
+ if err != nil {
+ t.Logf("error when stop server:%v", err)
+ t.FailNow()
+ }
+ if time.Since(st) < 50*time.Millisecond {
+ t.Logf("stop cost less time than socket timeout, server close
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))
Review comment:
also we should define `50*time.Millisecond` as a function scoped
constant as we are using it several times inside this function.
##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
runtime.Gosched()
serv.Stop()
}
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
+
+ proc := &mockProcessor{
+ ProcessFunc: func(in, out TProtocol) (bool, TException) {
+ in.ReadMessageBegin(context.Background())
+ return false, nil
+ },
+ }
+
+ trans := &mockServerTransport{
+ ListenFunc: func() error {
+ return nil
+ },
+ AcceptFunc: func() (TTransport, error) {
+ conn, err := ln.Accept()
+ if err != nil {
+ return nil, err
+ }
+
+ return NewTSocketFromConnConf(conn, nil), nil
+ },
+ CloseFunc: func() error {
+ return nil
+ },
+ InterruptFunc: func() error {
+ return ln.Close()
+ },
+ }
+
+ serv := NewTSimpleServer2(proc, trans)
+ go serv.Serve()
+ time.Sleep(10 * time.Millisecond)
+
+ netConn, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil || netConn == nil {
+ t.Fatal("error when dial server")
+ }
+ time.Sleep(10 * time.Millisecond)
+
+ st := time.Now()
+ ServerCloseTimeout = 50 * time.Millisecond
+ err = serv.Stop()
+ if err != nil {
+ t.Logf("error when stop server:%v", err)
+ t.FailNow()
+ }
+ if time.Since(st) < 50*time.Millisecond {
+ t.Logf("stop cost less time than socket timeout, server close
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))
Review comment:
also calling `time.Since` again will give you a different value and it
might be more than 50ms now and that will make the error message confusing. so
do something like this instead:
```
if elapsed := time.Since(st); elapsed < 50*time.Millisecond {
t.Errorf("...", ServerCloseTimeout, elapsed)
}
```
(also there's no reason to use `t.Logf` then `t.FailNow()`, just use
`t.Errorf`)
##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example,
1ms), it might cause
excessive cpu overhead.
This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
Review comment:
```suggestion
========================================
```
Using this way of making headings requires you have the same number of `=`s
as the heading line (also github is more lenient on that one and still render
it as a heading even if the number doesn't match)
##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
runtime.Gosched()
serv.Stop()
}
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
+
+ proc := &mockProcessor{
+ ProcessFunc: func(in, out TProtocol) (bool, TException) {
+ in.ReadMessageBegin(context.Background())
+ return false, nil
+ },
+ }
+
+ trans := &mockServerTransport{
+ ListenFunc: func() error {
+ return nil
+ },
+ AcceptFunc: func() (TTransport, error) {
+ conn, err := ln.Accept()
+ if err != nil {
+ return nil, err
+ }
+
+ return NewTSocketFromConnConf(conn, nil), nil
+ },
+ CloseFunc: func() error {
+ return nil
+ },
+ InterruptFunc: func() error {
+ return ln.Close()
+ },
+ }
+
+ serv := NewTSimpleServer2(proc, trans)
+ go serv.Serve()
+ time.Sleep(10 * time.Millisecond)
+
+ netConn, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil || netConn == nil {
+ t.Fatal("error when dial server")
+ }
+ time.Sleep(10 * time.Millisecond)
+
+ st := time.Now()
+ ServerCloseTimeout = 50 * time.Millisecond
Review comment:
nit: when a test involves changing the value of a global variable, I'd
prefer to change its value back so we don't have tests relying on the default
value becomes flaky because of the order of test executions, so this should be
something like:
```suggestion
backupServerCloseTimeout = ServerCloseTimeout
t.Cleanup(func() {
ServerCloseTimeout = backupServerCloseTimeout
})
ServerCloseTimeout = 50 * time.Millisecond
```
same for the other test.
##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
runtime.Gosched()
serv.Stop()
}
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
+
+ proc := &mockProcessor{
+ ProcessFunc: func(in, out TProtocol) (bool, TException) {
+ in.ReadMessageBegin(context.Background())
+ return false, nil
+ },
+ }
+
+ trans := &mockServerTransport{
+ ListenFunc: func() error {
+ return nil
+ },
+ AcceptFunc: func() (TTransport, error) {
+ conn, err := ln.Accept()
+ if err != nil {
+ return nil, err
+ }
+
+ return NewTSocketFromConnConf(conn, nil), nil
+ },
+ CloseFunc: func() error {
+ return nil
+ },
+ InterruptFunc: func() error {
+ return ln.Close()
+ },
+ }
+
+ serv := NewTSimpleServer2(proc, trans)
+ go serv.Serve()
+ time.Sleep(10 * time.Millisecond)
+
+ netConn, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil || netConn == nil {
+ t.Fatal("error when dial server")
+ }
+ time.Sleep(10 * time.Millisecond)
+
+ st := time.Now()
+ ServerCloseTimeout = 50 * time.Millisecond
+ err = serv.Stop()
+ if err != nil {
+ t.Logf("error when stop server:%v", err)
+ t.FailNow()
+ }
+ if time.Since(st) < 50*time.Millisecond {
+ t.Logf("stop cost less time than socket timeout, server close
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))
Review comment:
>less time than socket timeout
this is actually "stop/close timeout", not "socket timeout"
##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
runtime.Gosched()
serv.Stop()
}
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
Review comment:
don't discard errors, even in a test, do something like this instead:
```suggestion
ln, err := net.Listen("tcp", "localhost:0")
if err != nil {
t.Fatalf("Failed to listen: %v", err)
}
```
same for the other test function.
##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
runtime.Gosched()
serv.Stop()
}
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
+
+ proc := &mockProcessor{
+ ProcessFunc: func(in, out TProtocol) (bool, TException) {
+ in.ReadMessageBegin(context.Background())
+ return false, nil
+ },
+ }
+
+ trans := &mockServerTransport{
+ ListenFunc: func() error {
+ return nil
+ },
+ AcceptFunc: func() (TTransport, error) {
+ conn, err := ln.Accept()
+ if err != nil {
+ return nil, err
+ }
+
+ return NewTSocketFromConnConf(conn, nil), nil
+ },
+ CloseFunc: func() error {
+ return nil
+ },
+ InterruptFunc: func() error {
+ return ln.Close()
+ },
+ }
+
+ serv := NewTSimpleServer2(proc, trans)
+ go serv.Serve()
+ time.Sleep(10 * time.Millisecond)
+
+ netConn, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil || netConn == nil {
+ t.Fatal("error when dial server")
+ }
+ time.Sleep(10 * time.Millisecond)
+
+ st := time.Now()
+ ServerCloseTimeout = 50 * time.Millisecond
+ err = serv.Stop()
+ if err != nil {
+ t.Logf("error when stop server:%v", err)
+ t.FailNow()
+ }
+ if time.Since(st) < 50*time.Millisecond {
+ t.Logf("stop cost less time than socket timeout, server close
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))
+ t.FailNow()
+ }
+}
+
+func TestStopTimeoutWithSocketTimeout(t *testing.T) {
+ ln, _ := net.Listen("tcp", "localhost:0")
+
+ proc := &mockProcessor{
+ ProcessFunc: func(in, out TProtocol) (bool, TException) {
+ in.ReadMessageBegin(context.Background())
+ return false, nil
+ },
+ }
+
+ conf := &TConfiguration{SocketTimeout: 5 * time.Millisecond}
+ wg := &sync.WaitGroup{}
+ trans := &mockServerTransport{
+ ListenFunc: func() error {
+ return nil
+ },
+ AcceptFunc: func() (TTransport, error) {
+ conn, err := ln.Accept()
+ if err != nil {
+ return nil, err
+ }
+ defer wg.Done()
+ return NewTSocketFromConnConf(conn, conf), nil
+ },
+ CloseFunc: func() error {
+ return nil
+ },
+ InterruptFunc: func() error {
+ return ln.Close()
+ },
+ }
+
+ serv := NewTSimpleServer2(proc, trans)
+ go serv.Serve()
+ time.Sleep(10 * time.Millisecond)
+
+ wg.Add(1)
+ netConn, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil || netConn == nil {
+ t.Fatal("error when dial server")
+ }
+ wg.Wait()
+
+ st := time.Now()
+ ServerCloseTimeout = 50 * time.Millisecond
+ err = serv.Stop()
+ if time.Since(st) >= 10*time.Millisecond {
Review comment:
same comments as the previous function, plus:
this check of `>=10ms` can make this test really flaky, because when the
test is running on CI, the system can be resource constrained and it can
actually take more than 10ms for `serv.Stop()` to return.
I would suggest to set `ServerCloseTimeout` to a much longer duration (say,
1 second), and fail the test if it takes more than half of that (so that would
be 500ms, but just use `ServerCloseTimeout/2` here).
##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example,
1ms), it might cause
excessive cpu overhead.
This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
+
+The server stop will wait for all client connections to be closed after the
last received or to be received request to be hanled,
+so the time spent by stop may be too long in some situation:
+* when socket timeout is not set,server will be hanged before all active
clients to finish handling the last received or to be received request.
+* when the socket timeout is too long saying one hour, server will be hanged
for one hour before all active clients to finish handling the last received or
to be received request.
+
+To avoid the server stop to be hanged for too long, you can take advantage of
thrift.ServerCloseTimeout in your main or init function:
+
+ thrift.ServerCloseTimeout = <max_duration_to_stop>
+
+If it's changed to <=0, the feature will be disabled, and server will wait for
all the client connections to be closed gracefully with zero err
time.Otherwise, the stop will wait for all the client connections to be closed
gracefully util thrift.ServerCloseTimeout is reached, and the client
connections that are not closed after thrift.ServerCloseTimeout will be closed
abruptly which may cause some client err rate.
Review comment:
>If it's changed to <=0
It's 0 by default so "if it's changed to <=0" doesn't make too much sense :)
I'd prefer to word it like "When it's set to <=0" and also mention it's
disabled by default.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]