https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

            Bug ID: 61183
           Summary: Deadlock occurs while sending to a closing session
           Product: Tomcat 8
           Version: 8.0.33
          Hardware: PC
            Status: NEW
          Severity: regression
          Priority: P2
         Component: WebSocket
          Assignee: dev@tomcat.apache.org
          Reporter: lorc...@live.com
  Target Milestone: ----

Hi,
We have recently ran into a deadlock situation in a customer site while using
websockets.
One of the locks is in our code while the other is the Tomcat "stateLock"
defined in "class WsSession".

I have marked this issue as a regression because one of the 2 locations that
Tomcat takes the stateLock in 8.0.33 did not exist in 8.0.29. Here's a quick
look at the differences before examining the deadlocked call stacks:

In 8.0.33 there's an expansion of the stateLock usage:
    protected void registerFuture(FutureToSendHandler f2sh) {
        boolean fail = false;
        synchronized (stateLock) {
                        // leaving out the details...
                }
        }

While in 8.0.29 the entire method looks like:
    protected void registerFuture(FutureToSendHandler f2sh) {
        futures.put(f2sh, f2sh);
    }

I can understand why the stateLock usage has been expanded. But I think there
is a problem with this. Previously we'd only enter stateLock in Tomcat ->
Application calls. Now with the new registerFuture we also can enter stateLock
in Application -> Tomcat calls.
So a recent change seems to have introduced a key ingredient for deadlocks. If
my application has its own lock there is the potential for 2 locks to be
acquired in reverse order by 2 simultaneous threads. 

We have experienced a real-world example of this:

In the 1st thread Tomcat is closing the session after an abrupt disconnect from
the client. The close action bubbles up to the Application onClose handler:

"http-nio-8445-exec-76" daemon prio=5 tid=249 BLOCKED
                at
com.myc.common.domain.websocketutil.WsOutbound.setSocketStatus(WsOutbound.java:152)
--> Waits on the Application local lock
                   Local Variable:
com.myc.common.domain.websocketutil.WsOutbound$SocketStatus#2
                at
com.myc.customer.web.socket.WebSocketEndpoint.onClose(WebSocketEndpoint.java:224)
 --> Application onClose handler called which has to do some tidy up
                   Local Variable: javax.websocket.CloseReason#2
                   Local Variable:
com.myc.customer.web.socket.WebSocketEndpoint#4
                   Local Variable: java.lang.Object#9184
                at sun.reflect.GeneratedMethodAccessor250.invoke(<unknown
string>)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(<unknown
string>)
                at java.lang.reflect.Method.invoke(<unknown string>)
                at
org.apache.tomcat.websocket.pojo.PojoEndpointBase.onClose(PojoEndpointBase.java:107)
                   Local Variable:
org.apache.tomcat.websocket.pojo.PojoEndpointServer#6
                at
org.apache.tomcat.websocket.WsSession.fireEndpointOnClose(WsSession.java:541)
                   Local Variable:
org.apache.catalina.loader.WebappClassLoader#2
                   Local Variable:
org.apache.catalina.core.DefaultInstanceManager#3
                at
org.apache.tomcat.websocket.WsSession.doClose(WsSession.java:490)              
     --> Acquires a “state lock” on Tomcats call to close the session
                at
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.onError(WsHttpUpgradeHandler.java:150)
                   Local Variable:
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler#6
                at
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.access$300(WsHttpUpgradeHandler.java:48)


In the 2nd thread the Application is simply sending a normal message to Tomcat
(unaware the session is closing simultaneously):

"defaultEventExecutorGroup-3-1" prio=5 tid=94 BLOCKED
                at
org.apache.tomcat.websocket.WsSession.registerFuture(WsSession.java:650)       
   --> Waits on the Tomcat “stateLock”
                   Local Variable: org.apache.tomcat.websocket.WsSession#6
                at
org.apache.tomcat.websocket.FutureToSendHandler.get(FutureToSendHandler.java:104)
                   Local Variable:
org.apache.tomcat.websocket.FutureToSendHandler#247
                   Local Variable: java.util.concurrent.TimeUnit$4#1
                at
org.apache.tomcat.websocket.FutureToSendHandler.get(FutureToSendHandler.java:31)
                at
com.myc.common.domain.websocketutil.WsOutbound.await(WsOutbound.java:255)
                at
com.myc.common.domain.websocketutil.WsOutbound.sendObj(WsOutbound.java:234)
                at
com.myc.common.domain.websocketutil.WsOutbound.send(WsOutbound.java:215)       
 --> Acquires the Application local lock while sending (to guarantee message
order when sending multiple messages)
                   Local Variable:
com.myc.customer.model.json.notification.NewMessageNotification#3

Our interface to Tomcat (wsOutbound.java) takes the Application lock using
class "synchronized" as follows:

    public synchronized void setSocketStatus(SocketStatus status) {
                // Does some state change stuff. Can be invoked from onClose()
        }

        public synchronized boolean send(Object obj) {
                // Sends one or more messages to Tomcat
        }

So we had a simple locking strategy. Now I have to come up with a message
queuing strategy so that I can send multiple messages, in order, without
holding an Application lock.

To my mind the root problem is in the existing Tomcat code for doClose(). In
other deadlock situations I've been involved with the onus has been on the
thread that invokes a callback/listener to exit any locks it holds first:

    private void doClose(CloseReason closeReasonMessage,
            CloseReason closeReasonLocal) {
        // Double-checked locking. OK because state is volatile
        if (state != State.OPEN) {
            return;
        }

        synchronized (stateLock) {
            if (state != State.OPEN) {
                return;
            }

            if (log.isDebugEnabled()) {
                log.debug(sm.getString("wsSession.doClose", id));
            }
            try {
                wsRemoteEndpoint.setBatchingAllowed(false);
            } catch (IOException e) {
                log.warn(sm.getString("wsSession.flushFailOnClose"), e);
                                                                               
        <-- Exit stateLock first?
                fireEndpointOnError(e);
            }

            state = State.CLOSING;

            sendCloseMessage(closeReasonMessage);
                                                                               
        <-- Exit stateLock first?
            fireEndpointOnClose(closeReasonLocal);
                                                                               
        <-- Reenter stateLock?
            state = State.CLOSED;
        }

I took a quick look at the latest 8.0.x (8.0.44) code and it appears the same
issue can happen.
I'd appreciate any comment on my analysis or improvements in coding our
application.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to