[ 
https://issues.apache.org/jira/browse/TINKERPOP-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321093#comment-17321093
 ] 

Florian Hockmann commented on TINKERPOP-2390:
---------------------------------------------

I just tried to reproduce the scenario but I don't see anything wrong. Here is 
what I did:
 # Start the server with a {{gremlinPool}} of 1 as described above 
(_TinkerpopServer configured not to provide any concurrent service (i.e., all 
the queries were processed sequentially)_).
 # Connect from Gremlin.Net (I used the version from current {{master}} and 
also tried it with the version from {{3.4-dev}}) with default settings 
({{PoolSize}} of 4 and {{MaxInProcessPerConnection}} of 32)
 # Send 10 requests with a custom evaluation timeout of 1 ms that simply sleep 
for 3 seconds.
 # Result:
 ## All requests get a {{ResponseException}} with a timeout on the server side.
 ## 4 connections in state {{ESTABLISHED}} on the server side.
 # Send 1 request to verify that both the driver and the server are still in a 
valid state. -> Receive the expected result.
 # Dispose the {{GremlinClient}} instance.
 # Result:
 ## All 4 connections in state {{TIME_WAIT}} on the server
 ## After 1 min: connections completely closed

The server is still responsive after this. As far as my limited knowledge of 
TCP goes, the {{TIME_WAIT}} state is expected: a connection is not closed 
completely right away so that packets arriving late or out of order can still 
be handled. The connections are closed after a timeout, which seems to be one 
minute on my machine.
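
To make the connection-state check above repeatable, the states can be counted 
from socket listing output. The following is only a minimal sketch: it assumes 
{{ss -tan}}-style columns (state names such as {{TIME-WAIT}} and {{ESTAB}}) and 
that the server listens on port 8182, neither of which is stated in this 
issue. The helper name {{count_tcp_states}} and the sample text are made up 
for illustration and were not captured from a real run.

```python
from collections import Counter


def count_tcp_states(ss_output, port):
    """Count TCP states for sockets whose local address ends in :port.

    Assumes `ss -tan`-style columns: State Recv-Q Send-Q Local Peer.
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 5 or fields[0] == "State":
            continue
        state, local = fields[0], fields[3]
        if local.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts


# Made-up sample in the shape of `ss -tan` output (assumed port 8182).
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:8182        0.0.0.0:*
TIME-WAIT  0      0      127.0.0.1:8182      127.0.0.1:50412
TIME-WAIT  0      0      127.0.0.1:8182      127.0.0.1:50414
ESTAB      0      0      127.0.0.1:22        127.0.0.1:50500
"""

if __name__ == "__main__":
    print(dict(count_tcp_states(SAMPLE, 8182)))
```

Running this periodically against real output on the server would show 
whether connections pile up in {{CLOSE-WAIT}} (as reported) or merely pass 
through {{TIME-WAIT}} before disappearing.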

What I really don't understand here is why the server should close the 
connection just because one request ran into a timeout. That doesn't make much 
sense as multiple requests can be processed on the same connection. So, the 
connection shouldn't be affected by a failing request (failing here in the 
sense of timing out).
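
The expectation described above can be sketched as a toy model. This is not 
the Gremlin.Net or Gremlin Server implementation: {{FakeConnection}} and 
{{run_scenario}} are hypothetical names, the 3 second sleep is scaled down to 
50 ms, and the only point is that a per-request timeout fails that request 
while the shared connection stays open for the next one.

```python
import asyncio


class FakeConnection:
    """Toy stand-in for one multiplexed connection (hypothetical, not driver code)."""

    def __init__(self):
        self.open = True

    async def submit(self, work_seconds, timeout_seconds):
        if not self.open:
            raise ConnectionError("connection closed")
        try:
            # Each request has its own timeout; hitting it fails only this request.
            await asyncio.wait_for(asyncio.sleep(work_seconds), timeout_seconds)
        except asyncio.TimeoutError:
            return "timeout"
        return "ok"


async def run_scenario():
    conn = FakeConnection()
    # Ten slow requests with a 1 ms timeout, all on the same connection.
    slow = await asyncio.gather(*(conn.submit(0.05, 0.001) for _ in range(10)))
    # One follow-up request to check the connection is still usable.
    follow_up = await conn.submit(0.0, 1.0)
    return slow, follow_up, conn.open


if __name__ == "__main__":
    slow, follow_up, still_open = asyncio.run(run_scenario())
    print(slow.count("timeout"), follow_up, still_open)
```

In this model nothing ever sets {{open}} to false on a timeout, which mirrors 
the behaviour I would expect: the failure is scoped to the request, not to 
the connection.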

[~Bobed] Could you please provide more information on this, ideally a setup to 
reproduce the problem deterministically? Otherwise, I'm inclined to close this 
issue as we cannot reproduce it.

> Connections not released when closed abruptly in the server side
> ----------------------------------------------------------------
>
>                 Key: TINKERPOP-2390
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2390
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.4.7
>         Environment: Tinkerpop 3.4.7 + Janusgraph 0.5.1 (optional opencypher 
> 1.0.0) 
>            Reporter: Carlos
>            Priority: Major
>
> We have developed a WService to query a gremlin-server (JanusGraph 0.5.1) 
> using the .net driver. Using the opencypher plugin has allowed us to see a 
> behaviour where the server gets completely blocked after a timeout on the 
> server side. We thought this might be related to issue 
> https://issues.apache.org/jira/browse/TINKERPOP-2288, so we have moved our 
> driver version to the master one (3.4-dev, which includes the PR solving this 
> issue). However, when facing a timeout (server side always, it is the one 
> launching the exception), quite a lot of connections get stalled at 
> CLOSE_WAIT status, and the server becomes unusable. 
> I've been digging around other bugs and issues, and from what I've read, some 
> similar behaviour happened with Cosmos DB (although in that situation it 
> might have been caused by connection leaks, whereas in this case it is the 
> timeout). We have traced down the problem to the driver itself after 
> isolating all the components involved (optimizing the cypher query results in 
> a non-timeout situation where everything is ok; forcing the timeout from pure 
> gremlin replicates the behaviour). 
> We have set up the connection pool params to 16 / 4096 (we are expecting 
> quite a high concurrency load).  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
