[
https://issues.apache.org/jira/browse/TINKERPOP-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181248#comment-17181248
]
Guilherme Quentel Melo commented on TINKERPOP-2388:
---------------------------------------------------
Thanks Mark for pointing this out. I looked at that issue. I see how they are
related, but this one seems to be an easier fix.
I've been using the solution mentioned in the description (with
{{CustomTornadoTransport}}) and it seems to be working fine so far. Connections
are being started and closed in multiple threads. What do you think of it?
Do you think that is a fix worth making in the original {{TornadoTransport}}?
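
For context, here is a minimal standard-library sketch of why closing through the transport's own loop works from any thread. It uses plain asyncio rather than Tornado or gremlinpython, and {{LoopOwner}}, {{run_in_thread}} and {{lookup_current_loop}} are hypothetical names made up for illustration. It assumes the transport keeps a private loop, as the {{self._loop}} attribute in the quoted description suggests.

```python
import asyncio
import threading


class LoopOwner:
    """Hypothetical stand-in for a transport that creates and keeps its own loop."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()

    async def _close_async(self):
        # Placeholder for the real asynchronous close (e.g. a websocket close).
        await asyncio.sleep(0)
        return "closed"

    def close(self):
        # Drive the coroutine on the loop this object owns, so no
        # thread-local "current" event loop has to exist.
        try:
            return self._loop.run_until_complete(self._close_async())
        finally:
            self._loop.close()


def lookup_current_loop():
    # A freshly started worker thread has no current event loop set,
    # so this lookup raises RuntimeError there.
    return asyncio.get_event_loop()


def run_in_thread(fn):
    """Run fn in a new thread and return (result, exception)."""
    outcome = {}

    def target():
        try:
            outcome["result"] = fn()
        except Exception as exc:
            outcome["error"] = exc

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return outcome.get("result"), outcome.get("error")
```

Calling {{run_in_thread(lookup_current_loop)}} should hand back a {{RuntimeError}}, while {{run_in_thread(lambda: LoopOwner().close())}} should complete normally, which mirrors the difference between the stock close and the {{run_sync}} variant.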
> gremlinpython: Can't close DriverRemoteConnection
> -------------------------------------------------
>
> Key: TINKERPOP-2388
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2388
> Project: TinkerPop
> Issue Type: Bug
> Components: python
> Affects Versions: 3.4.6
> Environment: Ubuntu 18.04, Flask 1.1.1, python 3.8.1, Amazon Neptune
> Reporter: Guilherme Quentel Melo
> Priority: Major
>
> In the context of a Flask application using multiple threads, it is currently
> not possible to close the DriverRemoteConnection due to two issues. As our
> Flask application initiates a new connection on every new request (because we
> don't want the trouble of reusing the connections), the process eventually
> runs out of file descriptors.
> h1. How to reproduce
> Given a Gremlin Server running on {{127.0.0.1:8182}}, the following script
> reproduces the first error:
> {code:python}
> import threading
> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
> from gremlin_python.process.anonymous_traversal import traversal
>
> if __name__ == "__main__":
>     def handle_request():
>         remote_connection = DriverRemoteConnection("ws://127.0.0.1:8182/gremlin", "g")
>         g = traversal().withRemote(remote_connection)
>         print(g.V().limit(1).toList())
>         remote_connection.close()
>
>     for i in range(10):
>         t = threading.Thread(target=handle_request)
>         t.start()
>         t.join()
>
>     print("Press ENTER to terminate")
>     s = input()
> {code}
> h2. Error due to not finding current event loop
> When a thread tries to execute {{remote_connection.close()}}, the following
> error happens:
> {code:python}
> asyncio/events.py", line 639, in get_event_loop
> raise RuntimeError('There is no current event loop in thread %r.'
> RuntimeError: There is no current event loop in thread 'Thread-10'.
> {code}
> This happens because {{TornadoTransport.close()}} [does not close the websocket in a loop|https://github.com/apache/tinkerpop/blob/1bb1a49dffbcb64ee6cbe86d048eee386303da7d/gremlin-python/src/main/python/gremlin_python/driver/tornado/transport.py#L46].
> I can fix that by providing my own transport that closes the websocket with
> {{self._loop.run_sync(lambda: self._ws.close())}}:
> {code:python}
> import threading
> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
> from gremlin_python.driver.tornado.transport import TornadoTransport
> from gremlin_python.process.anonymous_traversal import traversal
>
> class CustomTornadoTransport(TornadoTransport):
>     def close(self):
>         self._loop.run_sync(lambda: self._ws.close())
>         self._loop.close()
>
> if __name__ == "__main__":
>     def handle_request():
>         remote_connection = DriverRemoteConnection(
>             "ws://127.0.0.1:8182/gremlin", "g",
>             transport_factory=CustomTornadoTransport
>         )
>         g = traversal().withRemote(remote_connection)
>         print(g.V().limit(1).toList())
>         remote_connection.close()
>
>     for i in range(10):
>         t = threading.Thread(target=handle_request)
>         t.start()
>         t.join()
>
>     print("Press ENTER to terminate")
>     s = input()
> {code}
> h2. Connections are kept in CLOSE_WAIT state
> Now, apparently the connection is closed successfully, but if we look at the
> open connections, we will find a bunch of tcp connections in {{CLOSE_WAIT}}
> state.
> For example, using netstat on Linux while the script is still running:
> {code:java}
> netstat -nt4p | grep 8182
> (Not all processes could be identified, non-owned process info
> will not be shown, you would have to be root to see it all.)
> tcp 3 0 127.0.0.1:52092 127.0.0.1:8182 CLOSE_WAIT 26886/ld-linux-x86-
> tcp 3 0 127.0.0.1:52110 127.0.0.1:8182 CLOSE_WAIT 26886/ld-linux-x86-
> tcp 3 0 127.0.0.1:52098 127.0.0.1:8182 CLOSE_WAIT 26886/ld-linux-x86-
> tcp 3 0 127.0.0.1:52104 127.0.0.1:8182 CLOSE_WAIT 26886/ld-linux-x86-
> {code}
> Digging into the code, I found out that tornado does not terminate the
> connection right away. This is what happens when the websocket is closed:
> # [It sends a message to the
> server|https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/websocket.py#L1041]
> # [It schedules a 5s timer to abort the connection in case the peer (here, the Gremlin server) does not close it|https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/websocket.py#L1053]
> # *On the next IO loop iteration*, [it receives the peer's message for closing the connection|https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/websocket.py#L1007]
> # [As the peer closed the connection cleanly, it cancels the timeout|https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/websocket.py#L1047]
> So, for the websocket to close properly, the loop needs to run again so that
> Tornado either receives the peer's close message or lets the timeout call
> abort. But that never happens, because {{TornadoTransport.close}} also closes
> the loop, leaking those connections.
> I don't know if this is the best solution, but reading a message from the
> socket after closing it makes Tornado receive the message that closes the
> connection:
> {code:python}
> class CustomTornadoTransport(TornadoTransport):
>     def close(self):
>         self._loop.run_sync(lambda: self._ws.close())
>         message = self._loop.run_sync(lambda: self._ws.read_message())
>         # This situation shouldn't really happen. Since the connection was closed,
>         # the next message should be None
>         if message is not None:
>             raise RuntimeError("Connection was not properly closed")
>         self._loop.close()
> {code}
> Now, after running the script with the change above,
> {{netstat -nt4p | grep 8182}} does not show any connections any more.
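
The point in the quoted description about the loop needing to run again can be illustrated with a rough standard-library analogy. This is not Tornado's code: {{close_loop}}, {{begin_close}} and the timings are assumptions made up for the sketch. It only shows that work scheduled during a close request is silently dropped if the loop is closed immediately, and runs if the loop is driven a little longer first.

```python
import asyncio


def close_loop(drain_seconds=None):
    """Return the events observed when a loop is closed with a pending timer."""
    events = []
    loop = asyncio.new_event_loop()

    async def begin_close():
        # Stand-in for starting a close handshake: the follow-up step is only
        # scheduled here and needs a later loop iteration to actually run.
        asyncio.get_running_loop().call_later(0.01, events.append, "follow-up ran")
        events.append("close requested")

    loop.run_until_complete(begin_close())
    if drain_seconds is not None:
        # Keep driving the loop so the scheduled follow-up gets a chance to run.
        loop.run_until_complete(asyncio.sleep(drain_seconds))
    # Closing the loop discards anything still scheduled on it.
    loop.close()
    return events
```

With no drain, {{close_loop()}} records only the close request. With {{close_loop(drain_seconds=0.2)}} the follow-up step also runs, which plays the same role as the extra {{read_message()}} call in the workaround above.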