[ https://issues.apache.org/jira/browse/TINKERPOP-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321093#comment-17321093 ]

Florian Hockmann commented on TINKERPOP-2390:
---------------------------------------------

I just tried to reproduce the scenario but I don't see anything wrong. Here is what I did:

# Start the server with a {{gremlinPool}} of 1 as described above (_TinkerpopServer configured not to provide any concurrent service (i.e., all the queries were processed sequentially)_).
# Connect from Gremlin.Net (I used the version from current {{master}} and also tried it with the version from {{3.4-dev}}) with default settings ({{PoolSize}} of 4 and {{MaxInProcessPerConnection}}: 32)
# Send 10 requests with a custom evaluation timeout of 1 ms that simply sleep for 3 seconds.
# Result:
## All requests get a {{ResponseException}} with a timeout on the server side.
## 4 connections in state {{ESTABLISHED}} on the server side.
# Send 1 request to verify that both the driver and the server are still in a valid state. -> Receive the expected result.
# Dispose the {{GremlinClient}} instance.
# Result:
## All 4 connections in state {{TIME_WAIT}} on the server
## After 1 min: connections completely closed

The server is still responsive after this. The {{TIME_WAIT}} is expected from my limited knowledge about TCP as connections are not completely closed immediately in case a packet is received out of order. But they are closed after a timeout which seems to be one minute on my machine.

What I really don't understand here is why the server should close the connection just because one request ran into a timeout. That doesn't make much sense as multiple requests can be processed on the same connection. So, the connection shouldn't be affected by a failing request (failing here in the sense of timing out).

[~Bobed] Could you please provide more information on this, ideally a setup to reproduce the problem deterministically? Otherwise, I'm inclined to close this issue as we cannot reproduce it.

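
For anyone who wants to repeat the connection-state checks from the steps above, here is a minimal Python sketch that counts TCP states per port from {{netstat -tan}} style output. It is not part of TinkerPop or Gremlin.Net. The helper name {{count_tcp_states}} and the sample text are made up for illustration (the sample is not a real capture), and the port assumes Gremlin Server is running on its default 8182.

```python
from collections import Counter

# Assumption: Gremlin Server listens on its default port 8182.
GREMLIN_PORT = 8182


def count_tcp_states(netstat_output: str, port: int = GREMLIN_PORT) -> Counter:
    """Count TCP states of connections whose local port is `port`.

    Expects Linux `netstat -tan` style lines with the columns:
    proto, recv-q, send-q, local address, foreign address, state.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        # The listening socket is not a client connection.
        if state == "LISTEN":
            continue
        # rsplit also copes with IPv6 forms such as ":::8182".
        if local_address.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states


# Hypothetical sample, hand-written to mirror the "4 x ESTABLISHED" result.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8182            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:8182          127.0.0.1:50112         ESTABLISHED
tcp        0      0 127.0.0.1:8182          127.0.0.1:50114         ESTABLISHED
tcp        0      0 127.0.0.1:8182          127.0.0.1:50116         ESTABLISHED
tcp        0      0 127.0.0.1:8182          127.0.0.1:50118         ESTABLISHED
tcp        0      0 127.0.0.1:22            127.0.0.1:40022         ESTABLISHED
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

On the server machine the same function could be fed live output, for example from {{subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout}}, assuming {{netstat}} is installed. A growing {{CLOSE_WAIT}} count after the timeouts would match the behaviour described in the issue, whereas {{TIME_WAIT}} entries that disappear after a while match what I observed.
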
> Connections not released when closed abruptly in the server side
> ----------------------------------------------------------------
>
>                 Key: TINKERPOP-2390
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2390
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.4.7
>         Environment: Tinkerpop 3.4.7 + Janusgraph 0.5.1 (optional opencypher 1.0.0)
>            Reporter: Carlos
>            Priority: Major
>
> We have developed a WService to query a gremlin-server (JanusGraph 0.5.1)
> using the .net driver. Using the opencypher plugin has allowed us to see a
> behaviour where the server gets completely blocked after a timeout on the
> server side. We thought this might be related to issue
> https://issues.apache.org/jira/browse/TINKERPOP-2288, so we have moved our
> driver version to the master one (3.4-dev, which includes the PR solving this
> issue). However, when facing a timeout (server side always, it is the one
> launching the exception), quite a lot of connections get stalled at
> CLOSE_WAIT status, and the server becomes unusable.
> I've been digging around other bugs and issues, and from what I've read, some
> similar behaviour happened to CosmoDB (although it might be caused in that
> situation due to the some connection leaks, in this case is the timeout). We
> have traced down the problem to the driver itself after isolating all the
> components involved (optimizing the cypher query results in a non-timeout
> situation where everything is ok; forcing the timeout from pure gremlin
> replicates the behaviour).
> We have set up the connection pool params to 16 / 4096 (we are expecting
> quite a high concurrency load).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)