[
https://issues.apache.org/jira/browse/TINKERPOP-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185192#comment-17185192
]
Stephen Mallette commented on TINKERPOP-2405:
---------------------------------------------
Thanks for reporting this and for doing some analysis on the problem. Please
have a look at the approach I've included to set the tornado read/write timeout
in the associated pull request referenced above to see if that will work well
enough to solve your problem.
I should have asked if you wanted to offer a pull request for this change -
sorry about that.
> gremlinpython: traversal hangs when the connection is established but the
> server stops responding later
> --------------------------------------------------------------------------------------------------------
>
> Key: TINKERPOP-2405
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2405
> Project: TinkerPop
> Issue Type: Bug
> Components: python
> Affects Versions: 3.4.6
> Environment: Ubuntu 18.04, Flask 1.1.1, python 3.8.1, Amazon
> Neptune, Gremlin Server
> Reporter: Guilherme Quentel Melo
> Priority: Major
>
> On an HTTP server that connects to Amazon Neptune, I've seen some situations
> where a request just hangs and never returns any response. While
> investigating this, I found out that it hangs right when it is going to query
> Neptune.
> The problem is that if the connection to Gremlin/Neptune is established and
> after that the server does not respond any more, the gremlin connection never
> times out, making the process/thread wait forever for a response that will
> never come.
> h1. How to reproduce
> # Start a local gremlin server on the default port 8182
> # On a terminal, run {{nc}} to listen on port 8183 with {{nc -lk 8183}}
> # Run the following python code to connect to the *8183* port:
> {code:python}
> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
> from gremlin_python.process.anonymous_traversal import traversal
> remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin",
> "g")
> g = traversal().withRemote(remote_connection)
>
> g.V().limit(1).toList()
> {code}
> # You will see the connection request on {{nc}} output. The first time, don't
> do anything and it will time out, saying the connection couldn't be
> established.
> # Now repeat the steps, but make nc respond to establish the connection. The
> quickest way I found is to manually relay the message to the real gremlin server:
> ## Copy the whole request from {{nc -l}} output
> ## On another terminal, open a connection to the gremlin server with {{nc
> 127.0.0.1 8182}}
> ## Paste the request you copied before to {{nc 127.0.0.1 8182}} terminal
> ## Copy the gremlin server response and paste into {{nc -l}} output
> ## The connection will be established and the {{nc -l}} will receive some
> unprintable chars corresponding to {{g.V().limit(1).toList()}}
> ## Now, if there is no response from {{nc -l}} process, the python code will
> hang forever.
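The manual {{nc}} relay above could probably be replaced by a small script. The following is a minimal sketch, not part of the original report: it uses only the Python standard library, completes the WebSocket opening handshake as described in RFC 6455, and then stays silent. The names accept_key, serve_silently and start_silent_server are hypothetical, and the sketch assumes Python 3.8 or later and a client that sends a Sec-WebSocket-Key header.

```python
import base64
import hashlib
import socket
import threading

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def serve_silently(listener: socket.socket, stop: threading.Event) -> None:
    """Accept one client, finish the WebSocket handshake, then never reply."""
    conn, _ = listener.accept()
    with conn:
        # Read until the end of the HTTP upgrade request headers.
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                return
            request += chunk
        # Collect the request headers, skipping the request line.
        headers = {}
        for line in request.decode("latin-1").split("\r\n")[1:]:
            if ":" in line:
                name, value = line.split(":", 1)
                headers[name.strip().lower()] = value.strip()
        response = (
            "HTTP/1.1 101 Switching Protocols\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Accept: {accept_key(headers['sec-websocket-key'])}\r\n"
            "\r\n"
        )
        conn.sendall(response.encode("ascii"))
        # Keep the connection open but send nothing until told to stop.
        stop.wait()


def start_silent_server(host: str = "127.0.0.1", port: int = 0):
    """Start the silent server in a background thread; return (port, stop_event)."""
    listener = socket.create_server((host, port))
    stop = threading.Event()
    thread = threading.Thread(
        target=serve_silently, args=(listener, stop), daemon=True
    )
    thread.start()
    return listener.getsockname()[1], stop
```

Calling start_silent_server(port=8183) and then running the Python snippet from the report against ws://127.0.0.1:8183/gremlin should lead to the same state as the manual relay: the connection is established and the query is sent, but no response ever arrives.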
> h1. Possible solution
> As I looked into it, the problem seems to be that the {{TornadoTransport}}
> implementation does not pass any timeout when reading (and writing) messages.
> So, passing a timeout to {{self._loop.run_sync}} can solve the issue, at
> least raising an exception when the server does not respond.
> If I change the example above:
> {code:python}
> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
> from gremlin_python.driver.tornado.transport import TornadoTransport
> from gremlin_python.process.anonymous_traversal import traversal
>
>
> class CustomTornadoTransport(TornadoTransport):
>     def read(self):
>         # Give up if no message arrives within 5 seconds instead of blocking forever.
>         return self._loop.run_sync(lambda: self._ws.read_message(), timeout=5)
>
>
> remote_connection = DriverRemoteConnection(
>     "ws://127.0.0.1:8183/gremlin", "g", transport_factory=CustomTornadoTransport)
> g = traversal().withRemote(remote_connection)
>
> g.V().limit(1).toList()
> {code}
> and repeat the same steps, {{g.V().limit(1).toList()}} times out after not
> getting any response from the server for 5 seconds.
> I'm not sure if there should be any timeout for writing, but it seems it
> should definitely be set for read operations.
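For illustration, the general pattern of bounding a blocking operation with a timeout can be sketched with the standard library alone. This is an analogy using asyncio, not the gremlinpython or tornado implementation; run_sync_with_timeout, ReadTimeout and read_from_silent_server are hypothetical names introduced here. The same wrapper could in principle bound a write as well as a read.

```python
import asyncio


class ReadTimeout(Exception):
    """Raised when the peer sends nothing within the allowed time."""


def run_sync_with_timeout(loop, make_coro, timeout):
    """Run a coroutine to completion on `loop`, giving up after `timeout` seconds."""
    try:
        return loop.run_until_complete(asyncio.wait_for(make_coro(), timeout))
    except asyncio.TimeoutError as exc:
        raise ReadTimeout(f"no response within {timeout} seconds") from exc


async def read_from_silent_server():
    # Stand-in for a read on a connection whose peer never answers.
    await asyncio.Event().wait()
```

With a timeout in place, the caller gets an exception it can handle (retry, reconnect, or fail the request) rather than a thread that waits indefinitely.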