fishy commented on a change in pull request #2497:
URL: https://github.com/apache/thrift/pull/2497#discussion_r791929108



##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example, 
1ms), it might cause
 excessive cpu overhead.
 
 This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
+
+The server stop will wait for all client connections to be closed after the 
last received or to be received request to be hanled, 

Review comment:
       I would prefer to `s/The server stop/TSimpleServer.Stop/`, with 
optionally link to the go doc page 
(https://pkg.go.dev/github.com/apache/thrift/lib/go/thrift#TSimpleServer.Stop)

##########
File path: lib/go/thrift/simple_server.go
##########
@@ -48,15 +49,26 @@ var ErrAbandonRequest = errors.New("request abandoned")
 // If it's changed to <=0, the feature will be disabled.
 var ServerConnectivityCheckInterval = time.Millisecond * 5
 
+// ServerCloseTimeout defines max close wait duration used by
+// server stop to avoid hanging too long to wait for all client connections to 
be closed gracefully.
+//
+// It's defined as a variable instead of constant, so that thrift server
+// implementations can change its value to control the behavior.
+//
+// If it's changed to <=0, the feature will be disabled, and the server will 
wait for
+// for all the client connections to be closed gracefully.
+var ServerCloseTimeout = time.Duration(0)

Review comment:
       On second thought a better name is probably `ServerStopTimeout`, since 
the function using it is `Stop`, not `Close`.

##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example, 
1ms), it might cause
 excessive cpu overhead.
 
 This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
+
+The server stop will wait for all client connections to be closed after the 
last received or to be received request to be hanled, 
+so the time spent by stop may be too long in some situation:
+* when socket timeout is not set,server will be hanged before all active 
clients to finish handling the last received or to be received request.
+* when the socket timeout is too long saying one hour, server will be hanged 
for one hour before all active clients to finish handling the last received or 
to be received request.
+
+To avoid the server stop to be hanged for too long, you can take advantage of 
thrift.ServerCloseTimeout in your main or init function:
+
+    thrift.ServerCloseTimeout = <max_duration_to_stop>
+
+If it's changed to <=0, the feature will be disabled, and server will wait for 
all the client connections to be closed gracefully with zero err 
time.Otherwise, the stop will wait for all the client connections to be closed 
gracefully util thrift.ServerCloseTimeout is reached, and the client 
connections that are not closed after thrift.ServerCloseTimeout will be closed 
abruptly which may cause some client err rate.

Review comment:
       also in markdown only a blank line is actual line break (paragraph 
separate), so you don't have to write the whole paragraph in a single line. I 
usually prefer to add line breaks after comma or period to keep each lines 
reasonably short.

##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func 
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
        runtime.Gosched()
        serv.Stop()
 }
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")
+
+       proc := &mockProcessor{
+               ProcessFunc: func(in, out TProtocol) (bool, TException) {
+                       in.ReadMessageBegin(context.Background())
+                       return false, nil
+               },
+       }
+
+       trans := &mockServerTransport{
+               ListenFunc: func() error {
+                       return nil
+               },
+               AcceptFunc: func() (TTransport, error) {
+                       conn, err := ln.Accept()
+                       if err != nil {
+                               return nil, err
+                       }
+
+                       return NewTSocketFromConnConf(conn, nil), nil
+               },
+               CloseFunc: func() error {
+                       return nil
+               },
+               InterruptFunc: func() error {
+                       return ln.Close()
+               },
+       }
+
+       serv := NewTSimpleServer2(proc, trans)
+       go serv.Serve()
+       time.Sleep(10 * time.Millisecond)
+
+       netConn, err := net.Dial("tcp", ln.Addr().String())
+       if err != nil || netConn == nil {
+               t.Fatal("error when dial server")
+       }
+       time.Sleep(10 * time.Millisecond)
+
+       st := time.Now()
+       ServerCloseTimeout = 50 * time.Millisecond
+       err = serv.Stop()
+       if err != nil {
+               t.Logf("error when stop server:%v", err)
+               t.FailNow()
+       }
+       if time.Since(st) < 50*time.Millisecond {
+               t.Logf("stop cost less time than socket timeout, server close 
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))

Review comment:
       also we should define `50*time.Millisecond` as a function scoped 
constant as we are using it several times inside this function.

##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func 
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
        runtime.Gosched()
        serv.Stop()
 }
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")
+
+       proc := &mockProcessor{
+               ProcessFunc: func(in, out TProtocol) (bool, TException) {
+                       in.ReadMessageBegin(context.Background())
+                       return false, nil
+               },
+       }
+
+       trans := &mockServerTransport{
+               ListenFunc: func() error {
+                       return nil
+               },
+               AcceptFunc: func() (TTransport, error) {
+                       conn, err := ln.Accept()
+                       if err != nil {
+                               return nil, err
+                       }
+
+                       return NewTSocketFromConnConf(conn, nil), nil
+               },
+               CloseFunc: func() error {
+                       return nil
+               },
+               InterruptFunc: func() error {
+                       return ln.Close()
+               },
+       }
+
+       serv := NewTSimpleServer2(proc, trans)
+       go serv.Serve()
+       time.Sleep(10 * time.Millisecond)
+
+       netConn, err := net.Dial("tcp", ln.Addr().String())
+       if err != nil || netConn == nil {
+               t.Fatal("error when dial server")
+       }
+       time.Sleep(10 * time.Millisecond)
+
+       st := time.Now()
+       ServerCloseTimeout = 50 * time.Millisecond
+       err = serv.Stop()
+       if err != nil {
+               t.Logf("error when stop server:%v", err)
+               t.FailNow()
+       }
+       if time.Since(st) < 50*time.Millisecond {
+               t.Logf("stop cost less time than socket timeout, server close 
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))

Review comment:
       also calling `time.Since` again will give you a different value and it 
might be more than 50ms now and that will make the error message confusing. so 
do something like this instead:
   ```
   if elapsed := time.Since(st); elapsed < 50*time.Millisecond {
     t.Errorf("...", ServerCloseTimeout, elapsed)
   }
   ```
   
   (also there's no reason to use `t.Logf` then `t.FailNow()`, just use 
`t.Errorf`)

##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example, 
1ms), it might cause
 excessive cpu overhead.
 
 This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================

Review comment:
       ```suggestion
   ========================================
   ```
   
   Using this way of making headings requires you have the same number of `=`s 
as the heading line (also github is more lenient on that one and still render 
it as a heading even if the number doesn't match)

##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func 
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
        runtime.Gosched()
        serv.Stop()
 }
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")
+
+       proc := &mockProcessor{
+               ProcessFunc: func(in, out TProtocol) (bool, TException) {
+                       in.ReadMessageBegin(context.Background())
+                       return false, nil
+               },
+       }
+
+       trans := &mockServerTransport{
+               ListenFunc: func() error {
+                       return nil
+               },
+               AcceptFunc: func() (TTransport, error) {
+                       conn, err := ln.Accept()
+                       if err != nil {
+                               return nil, err
+                       }
+
+                       return NewTSocketFromConnConf(conn, nil), nil
+               },
+               CloseFunc: func() error {
+                       return nil
+               },
+               InterruptFunc: func() error {
+                       return ln.Close()
+               },
+       }
+
+       serv := NewTSimpleServer2(proc, trans)
+       go serv.Serve()
+       time.Sleep(10 * time.Millisecond)
+
+       netConn, err := net.Dial("tcp", ln.Addr().String())
+       if err != nil || netConn == nil {
+               t.Fatal("error when dial server")
+       }
+       time.Sleep(10 * time.Millisecond)
+
+       st := time.Now()
+       ServerCloseTimeout = 50 * time.Millisecond

Review comment:
       nit: when a test involves changing the value of a global variable, I'd 
prefer to change its value back so we don't have tests relying on the default 
value becomes flaky because of the order of test executions, so this should be 
something like:
   ```suggestion
        backupServerCloseTimeout = ServerCloseTimeout
        t.Cleanup(func() {
                ServerCloseTimeout = backupServerCloseTimeout
        })
        ServerCloseTimeout = 50 * time.Millisecond
   ```
   
   same for the other test.

##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func 
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
        runtime.Gosched()
        serv.Stop()
 }
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")
+
+       proc := &mockProcessor{
+               ProcessFunc: func(in, out TProtocol) (bool, TException) {
+                       in.ReadMessageBegin(context.Background())
+                       return false, nil
+               },
+       }
+
+       trans := &mockServerTransport{
+               ListenFunc: func() error {
+                       return nil
+               },
+               AcceptFunc: func() (TTransport, error) {
+                       conn, err := ln.Accept()
+                       if err != nil {
+                               return nil, err
+                       }
+
+                       return NewTSocketFromConnConf(conn, nil), nil
+               },
+               CloseFunc: func() error {
+                       return nil
+               },
+               InterruptFunc: func() error {
+                       return ln.Close()
+               },
+       }
+
+       serv := NewTSimpleServer2(proc, trans)
+       go serv.Serve()
+       time.Sleep(10 * time.Millisecond)
+
+       netConn, err := net.Dial("tcp", ln.Addr().String())
+       if err != nil || netConn == nil {
+               t.Fatal("error when dial server")
+       }
+       time.Sleep(10 * time.Millisecond)
+
+       st := time.Now()
+       ServerCloseTimeout = 50 * time.Millisecond
+       err = serv.Stop()
+       if err != nil {
+               t.Logf("error when stop server:%v", err)
+               t.FailNow()
+       }
+       if time.Since(st) < 50*time.Millisecond {
+               t.Logf("stop cost less time than socket timeout, server close 
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))

Review comment:
       >less time than socket timeout
   
   this is actually "stop/close timeout", not "socket timeout"

##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func 
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
        runtime.Gosched()
        serv.Stop()
 }
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")

Review comment:
       don't discard errors, even in a test, do something like this instead:
   ```suggestion
        ln, err := net.Listen("tcp", "localhost:0")
        if err != nil {
                t.Fatalf("Failed to listen: %v", err)
        }
   ```
   
   same for the other test function.

##########
File path: lib/go/thrift/simple_server_test.go
##########
@@ -154,3 +158,112 @@ func 
TestNoHangDuringStopFromDanglingLockAcquireDuringAcceptLoop(t *testing.T) {
        runtime.Gosched()
        serv.Stop()
 }
+
+func TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")
+
+       proc := &mockProcessor{
+               ProcessFunc: func(in, out TProtocol) (bool, TException) {
+                       in.ReadMessageBegin(context.Background())
+                       return false, nil
+               },
+       }
+
+       trans := &mockServerTransport{
+               ListenFunc: func() error {
+                       return nil
+               },
+               AcceptFunc: func() (TTransport, error) {
+                       conn, err := ln.Accept()
+                       if err != nil {
+                               return nil, err
+                       }
+
+                       return NewTSocketFromConnConf(conn, nil), nil
+               },
+               CloseFunc: func() error {
+                       return nil
+               },
+               InterruptFunc: func() error {
+                       return ln.Close()
+               },
+       }
+
+       serv := NewTSimpleServer2(proc, trans)
+       go serv.Serve()
+       time.Sleep(10 * time.Millisecond)
+
+       netConn, err := net.Dial("tcp", ln.Addr().String())
+       if err != nil || netConn == nil {
+               t.Fatal("error when dial server")
+       }
+       time.Sleep(10 * time.Millisecond)
+
+       st := time.Now()
+       ServerCloseTimeout = 50 * time.Millisecond
+       err = serv.Stop()
+       if err != nil {
+               t.Logf("error when stop server:%v", err)
+               t.FailNow()
+       }
+       if time.Since(st) < 50*time.Millisecond {
+               t.Logf("stop cost less time than socket timeout, server close 
timeout:%v,cost time:%v", ServerCloseTimeout, time.Since(st))
+               t.FailNow()
+       }
+}
+
+func TestStopTimeoutWithSocketTimeout(t *testing.T) {
+       ln, _ := net.Listen("tcp", "localhost:0")
+
+       proc := &mockProcessor{
+               ProcessFunc: func(in, out TProtocol) (bool, TException) {
+                       in.ReadMessageBegin(context.Background())
+                       return false, nil
+               },
+       }
+
+       conf := &TConfiguration{SocketTimeout: 5 * time.Millisecond}
+       wg := &sync.WaitGroup{}
+       trans := &mockServerTransport{
+               ListenFunc: func() error {
+                       return nil
+               },
+               AcceptFunc: func() (TTransport, error) {
+                       conn, err := ln.Accept()
+                       if err != nil {
+                               return nil, err
+                       }
+                       defer wg.Done()
+                       return NewTSocketFromConnConf(conn, conf), nil
+               },
+               CloseFunc: func() error {
+                       return nil
+               },
+               InterruptFunc: func() error {
+                       return ln.Close()
+               },
+       }
+
+       serv := NewTSimpleServer2(proc, trans)
+       go serv.Serve()
+       time.Sleep(10 * time.Millisecond)
+
+       wg.Add(1)
+       netConn, err := net.Dial("tcp", ln.Addr().String())
+       if err != nil || netConn == nil {
+               t.Fatal("error when dial server")
+       }
+       wg.Wait()
+
+       st := time.Now()
+       ServerCloseTimeout = 50 * time.Millisecond
+       err = serv.Stop()
+       if time.Since(st) >= 10*time.Millisecond {

Review comment:
       same comments as the previous function, plus:
   
   this check of `>=10ms` can make this test really flaky, because when the 
test is running on CI, the system can be resource constrained and it can 
actually take more than 10ms for `serv.Stop()` to return.
   
   I would suggest to set `ServerCloseTimeout` to a much longer duration (say, 
1 second), and fail the test if it takes more than half of that (so that would 
be 500ms, but just use `ServerCloseTimeout/2` here).

##########
File path: lib/go/README.md
##########
@@ -132,3 +132,17 @@ if this interval is set to a value too low (for example, 
1ms), it might cause
 excessive cpu overhead.
 
 This feature is also only enabled on non-oneway endpoints.
+
+A note about server stop implementations
+===========================================
+
+The server stop will wait for all client connections to be closed after the 
last received or to be received request to be hanled, 
+so the time spent by stop may be too long in some situation:
+* when socket timeout is not set,server will be hanged before all active 
clients to finish handling the last received or to be received request.
+* when the socket timeout is too long saying one hour, server will be hanged 
for one hour before all active clients to finish handling the last received or 
to be received request.
+
+To avoid the server stop to be hanged for too long, you can take advantage of 
thrift.ServerCloseTimeout in your main or init function:
+
+    thrift.ServerCloseTimeout = <max_duration_to_stop>
+
+If it's changed to <=0, the feature will be disabled, and server will wait for 
all the client connections to be closed gracefully with zero err 
time.Otherwise, the stop will wait for all the client connections to be closed 
gracefully util thrift.ServerCloseTimeout is reached, and the client 
connections that are not closed after thrift.ServerCloseTimeout will be closed 
abruptly which may cause some client err rate.

Review comment:
       >If it's changed to <=0
   
   It's 0 by default so "if it's changed to <=0" doesn't make too much sense :)
   
   I'd prefer to word it like "When it's set to <=0" and also mention it's 
disabled by default.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to