[
https://issues.apache.org/jira/browse/TINKERPOP-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321093#comment-17321093
]
Florian Hockmann commented on TINKERPOP-2390:
---------------------------------------------
I just tried to reproduce the scenario but I don't see anything wrong. Here is
what I did:
# Start the server with a {{gremlinPool}} of 1 as described above
(_TinkerpopServer configured not to provide any concurrent service (i.e., all
the queries were processed sequentially)_).
# Connect from Gremlin.Net (I used the version from current {{master}} and
also tried it with the version from {{3.4-dev}}) with default settings
({{PoolSize}} of 4 and {{MaxInProcessPerConnection}} of 32)
# Send 10 requests that simply sleep for 3 seconds, each with a custom
evaluation timeout of 1 ms.
# Result:
## All requests get a {{ResponseException}} with a timeout on the server side.
## 4 connections in state {{ESTABLISHED}} on the server side.
# Send 1 request to verify that both the driver and the server are still in a
valid state. -> Receive the expected result.
# Dispose the {{GremlinClient}} instance.
# Result:
## All 4 connections in state {{TIME_WAIT}} on the server
## After 1 min: connections completely closed
The server is still responsive after this. As far as my limited knowledge of
TCP goes, the {{TIME_WAIT}} state is expected: connections are not closed
completely right away in case a delayed or out-of-order packet still arrives.
They are closed after a timeout, which seems to be one minute on my machine.
What I really don't understand here is why the server should close the
connection just because one request ran into a timeout. That doesn't make much
sense as multiple requests can be processed on the same connection. So, the
connection shouldn't be affected by a failing request (failing here in the
sense of timing out).
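To make that reasoning concrete, here is a minimal toy model of one multiplexed connection. It is only a sketch under stated assumptions, not how Gremlin Server or Gremlin.Net are actually implemented: {{ToyConnection}}, {{submit}} and {{run_scenario}} are hypothetical names, and the times are scaled down (a 50 ms "sleep" instead of 3 seconds). It shows 10 requests timing out individually while the shared connection stays open and still serves a follow-up request.

```python
import asyncio


class ToyConnection:
    """Hypothetical stand-in for one connection shared by many requests."""

    def __init__(self):
        self.open = True
        self._next_id = 0
        self.pending = {}  # request id -> task evaluating that request

    async def submit(self, work_seconds, timeout_seconds):
        """Run one request; a timeout fails only this request."""
        if not self.open:
            raise ConnectionError("connection is closed")
        request_id = self._next_id
        self._next_id += 1
        task = asyncio.ensure_future(self._evaluate(request_id, work_seconds))
        self.pending[request_id] = task
        try:
            # wait_for cancels the task and raises TimeoutError on expiry.
            return await asyncio.wait_for(task, timeout_seconds)
        except asyncio.TimeoutError:
            # The request fails, but self.open is deliberately left untouched.
            return f"request {request_id}: timeout"
        finally:
            del self.pending[request_id]

    async def _evaluate(self, request_id, work_seconds):
        # Simulates a script that just sleeps on the server side.
        await asyncio.sleep(work_seconds)
        return f"request {request_id}: ok"


async def run_scenario():
    conn = ToyConnection()
    # 10 slow requests with a 1 ms timeout, all on the same connection.
    slow = await asyncio.gather(*(conn.submit(0.05, 0.001) for _ in range(10)))
    # One follow-up request to check the connection is still usable.
    follow_up = await conn.submit(0.0, 1.0)
    return conn.open, slow, follow_up
```

Calling {{asyncio.run(run_scenario())}} returns the connection state, the ten per-request results, and the follow-up result. The design point is that the timeout is handled per request id, so nothing in the timeout path needs to touch the connection itself.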
[~Bobed] Could you please provide more information on this, ideally a setup to
reproduce the problem deterministically? Otherwise, I'm inclined to close this
issue as we cannot reproduce it.
> Connections not released when closed abruptly in the server side
> ----------------------------------------------------------------
>
> Key: TINKERPOP-2390
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2390
> Project: TinkerPop
> Issue Type: Bug
> Components: dotnet
> Affects Versions: 3.4.7
> Environment: Tinkerpop 3.4.7 + Janusgraph 0.5.1 (optional opencypher
> 1.0.0)
> Reporter: Carlos
> Priority: Major
>
> We have developed a WService to query a gremlin-server (JanusGraph 0.5.1)
> using the .net driver. Using the opencypher plugin has allowed us to see a
> behaviour where the server gets completely blocked after a timeout on the
> server side. We thought this might be related to issue
> https://issues.apache.org/jira/browse/TINKERPOP-2288, so we have moved our
> driver version to the master one (3.4-dev, which includes the PR solving this
> issue). However, when facing a timeout (server side always, it is the one
> launching the exception), quite a lot of connections get stalled at
> CLOSE_WAIT status, and the server becomes unusable.
> I've been digging around other bugs and issues, and from what I've read, some
> similar behaviour happened with Cosmos DB (although there it might have been
> caused by connection leaks, whereas in this case it is the timeout). We
> have traced the problem down to the driver itself after isolating all the
> components involved (optimizing the cypher query results in a non-timeout
> situation where everything is ok; forcing the timeout from pure gremlin
> replicates the behaviour).
> We have set up the connection pool params to 16 / 4096 (we are expecting
> quite a high concurrency load).