[ 
https://issues.apache.org/jira/browse/TINKERPOP-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stephen mallette closed TINKERPOP-2169.
---------------------------------------
       Resolution: Fixed
         Assignee: stephen mallette
    Fix Version/s: 3.4.1
                   3.3.6

> Responses exceeding maxContentLength cause subsequent queries to hang
> ---------------------------------------------------------------------
>
>                 Key: TINKERPOP-2169
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2169
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: driver
>    Affects Versions: 3.4.0
>         Environment: Client: EC2 Amazon Linux t3.medium
> Server: Amazon Neptune r4.2xlarge
>            Reporter: Yigit Kiran
>            Assignee: stephen mallette
>            Priority: Major
>             Fix For: 3.3.6, 3.4.1
>
>
> Gremlin Driver replaces connections on the channel when it receives 
> Exceptions that are instances of IOException or CodecException (including 
> CorruptedFrameException). When CorruptedFrameException is thrown because 
> response length is greater than the maxContentLength value (32kb by default), 
> driver thinks the host might be unavailable and tries to replace Connection.
> If Connection is shared among multiple requests (its pending queue is > 1), 
> other WSConnection goes stale after connection replacement, while keeping the 
> server executor threads busy.
> Keeping the exec threads busy for stale connections prevents server from 
> picking up new tasks for subsequent requests from the request queue. 
> Additionally since there is a new connection added in Client, it can accept 
> more requests and similar errors can lead to a build up in request queue. 
> When many concurrent requests gets into this situation server become 
> unresponsive to the new requests.
> h3. Steps to repro
>  
> 1. Have a gremlin server
> 2. Connect it using java driver with setting the maxContentLength pretty low, 
> i.e. using the config below:
>  
> {code:java}
> Cluster.Builder builder = Cluster.build();
>         builder.addContactPoint(endpoint[0]);
>         builder.port(8182);
>         builder.maxConnectionPoolSize(100);
>         builder.maxSimultaneousUsagePerConnection(100);
>         builder.maxInProcessPerConnection(50);
>         builder.maxContentLength(32); // <-- this is reduced from 32k 
>         builder.keepAliveInterval(0);
> {code}
>  
> 3. Issue concurrent requests using the cluster, where the response would be 
> greater than 32 bytes.
> h3. Ideas on a possible solution
> One possible solution to this is to not consider channel as dead when request 
> length exceeds maxContentFrame length. 
> {{}}
> {code:java}
> final class Connection {
>     ...
>     public ChannelPromise write(final RequestMessage requestMessage, final 
> CompletableFuture<ResultSet> future) {
>     ...
>         // FIX HERE: Do not consider CorruptedFrameException as 
> non-recoverable exception.
>         if ((t instanceof IOException || t instanceof CodecException) && (! 
> (t instanceof CorruptedFrameException))) {
>         ...
>         }
>     }
> }
> {code}
> Another fix could be the request can be deleted from the Connections' pending 
> request map, and if there are other pending requests on the connection, close 
> them before replacing the connection, or not replace the connection at all: 
>  
> {code:java}
> final class Connection {void replaceConnection(final Connection connection) {
>     ...
>     // FIX HERE: Do not replace connection if there are pending requests on 
> it. 
>     if (!connection.getPending().isEmpty()) {
>         return; // prevent replacing the connection while there are pending 
> requests.
>     }
>     considerNewConnection();
>     definitelyDestroyConnection(connection);
>     }
> }{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to