[ 
https://issues.apache.org/jira/browse/TINKERPOP-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236873#comment-17236873
 ] 

Carlos commented on TINKERPOP-2390:
-----------------------------------

On the server side, the only exception we saw was a script timeout exception 
(we had set the script timeout to 30s, and the query ran longer). We built a 
REST layer on top of the Gremlin.Net driver, and the behaviour we witnessed was 
that the connections got stuck in TIME_WAIT instead of CLOSE_WAIT (the local 
side wasn't aware that the server had closed the connection). This left the 
driver completely blocked once several long queries had hit the timeout 
exception. We had no choice but to restart the VM (we tried deploying on both 
plain VMs and pods). 

Regarding the new version, I'm afraid I cannot answer for sure, as we ended up 
moving to the Java driver. I recall doing some tests with the latest version 
just before moving to the other driver (we used the code from the branch 
including that fix) and it still happened, but I cannot state for sure that we 
tested version 3.4.8. The behaviour is quite easy to reproduce, though: just 
send several long queries through the driver (with a short script timeout set 
on the server side) and check netstat on the client machine (the setup should 
be a REST facade, in order to keep the client alive). 
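
The netstat check could be scripted roughly as follows. This is only a 
sketch: the function name count_tcp_states is made up for illustration, it 
assumes Linux-style `netstat -tan` output (state in the sixth column), and it 
assumes the server listens on port 8182 (the Gremlin Server default; adjust 
if your setup differs). 

```python
from collections import Counter


def count_tcp_states(netstat_output, remote_port=8182):
    """Tally TCP states of connections whose foreign address uses remote_port.

    Assumes Linux `netstat -tan` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address = fields[4]
        # The port is whatever follows the last ':' (also works for IPv6 rows).
        if foreign_address.rsplit(":", 1)[-1] == str(remote_port):
            counts[fields[5]] += 1
    return counts
```

Feeding it the captured output of `netstat -tan` on the client machine, 
before and after the timeouts, should show whether the number of connections 
in CLOSE_WAIT or TIME_WAIT keeps growing. 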

We ruled out a problem on the JanusGraph side because we didn't have this 
problem with the Java driver (I assume that the Gremlin driver abstracts away 
the actual underlying provider). 

Best, 

> Connections not released when closed abruptly in the server side
> ----------------------------------------------------------------
>
>                 Key: TINKERPOP-2390
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2390
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.4.7
>         Environment: Tinkerpop 3.4.7 + Janusgraph 0.5.1 (optional opencypher 
> 1.0.0) 
>            Reporter: Carlos
>            Priority: Major
>
> We have developed a WService to query a gremlin-server (JanusGraph 0.5.1) 
> using the .NET driver. Using the opencypher plugin has allowed us to see a 
> behaviour where the server gets completely blocked after a timeout on the 
> server side. We thought this might be related to issue 
> https://issues.apache.org/jira/browse/TINKERPOP-2288, so we have moved our 
> driver version to the master one (3.4-dev, which includes the PR solving that 
> issue). However, when facing a timeout (always on the server side, which is 
> the one throwing the exception), quite a lot of connections get stalled in 
> CLOSE_WAIT status, and the server becomes unusable. 
> I've been digging around other bugs and issues, and from what I've read, some 
> similar behaviour happened with CosmosDB (although in that situation it might 
> have been caused by some connection leaks, whereas in this case it is the 
> timeout). We have traced the problem down to the driver itself after 
> isolating all the components involved (optimizing the cypher query results 
> in a non-timeout situation where everything is ok; forcing the timeout from 
> pure gremlin replicates the behaviour). 
> We have set up the connection pool params to 16 / 4096 (we are expecting 
> quite a high concurrency load).  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)