damccorm commented on code in PR #24808:
URL: https://github.com/apache/beam/pull/24808#discussion_r1058865868


##########
sdks/go/pkg/beam/runners/universal/extworker/extworker.go:
##########
@@ -106,12 +106,16 @@ func (s *Loopback) StopWorker(ctx context.Context, req *fnpb.StopWorkerRequest)
 // Stop terminates the service and stops all workers.
 func (s *Loopback) Stop(ctx context.Context) error {
        s.mu.Lock()
-       defer s.mu.Unlock()
 
        log.Infof(ctx, "stopping Loopback, and %d workers", len(s.workers))
        s.workers = map[string]context.CancelFunc{}
        s.lis.Close()
        s.rootCancel()
+
+       // There can be a deadlock between the StopWorker RPC and GracefulStop
+       // which waits for all RPCs to finish, so it must be outside the critical section.
+       s.mu.Unlock()

Review Comment:
   Does this introduce a new potential race where:
   
   1) We take the lock
   2) We close the Listener
   3) We release the lock
   4) A worker tries to send information to the port the Listener was set up on
   
   That seems less severe than the current race, but I'm not really sure how we'd behave there - would that cause errors? Is there a reason we need to close the listener before cancelling/stopping?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
