[
https://issues.apache.org/jira/browse/TINKERPOP-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830910#comment-16830910
]
ASF GitHub Bot commented on TINKERPOP-2205:
-------------------------------------------
divijvaidya commented on pull request #1105: TINKERPOP-2205 Change connection
management to single request per channel
URL: https://github.com/apache/tinkerpop/pull/1105
https://issues.apache.org/jira/browse/TINKERPOP-2205
The code in this pull request changes the server interaction mechanism of
the Gremlin open source Java client. The new code addresses problems and
shortcomings discussed in the linked conversation
[[1]](https://lists.apache.org/thread.html/77728cb77d4eab90f15680595e653ffc6055b74db29cbd4dcd5f0339@%3Cdev.tinkerpop.apache.org%3E).
More specifically, the problems addressed are as follows:
1. Difficulty in configuring the client for optimum performance.
2. Undocumented dependency of configuration parameters on each other.
3. A bad request can impact other requests on the same channel.
4. A host is marked as dead even if it is busy serving requests.
5. No way to free up server resources if the client has stopped consuming
results.
6. No differentiation between retriable and non-retriable exceptions thrown
to the application code.
7. Keep-alive is only sent while a query is executing, which means that a
connection left open for a long time with no query being sent will be closed
by the server.
8. Race condition if the server response arrives before the result queue has
been registered.
9. Unpredictable behaviour if the server sends an exception followed by a
genuine response for the same request.
10. A concurrent hash map (tracking pending requests) is a point of
contention amongst threads.
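Problem 8 can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the driver's code: `RaceDemo`, `MultiplexedChannel`, `sendThenRegister`, and `registerThenSend` are made-up names, and a server that answers immediately is simulated by calling `onResponse` inline. It shows why registering the pending future after the request is written can lose a response, while registering it first cannot.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class RaceDemo {
    // Hypothetical multiplexed channel: all in-flight requests share one map keyed by
    // request id, and the I/O thread looks up the owner of each incoming response.
    static final class MultiplexedChannel {
        final Map<UUID, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
        int droppedResponses = 0; // single-threaded demo, so a plain int is enough

        void onResponse(UUID requestId, String payload) {
            CompletableFuture<String> future = pending.remove(requestId);
            if (future == null) {
                droppedResponses++; // nobody registered yet: the response is lost
            } else {
                future.complete(payload);
            }
        }
    }

    // Racy ordering: the request is "sent" (and answered) before its future is registered.
    static CompletableFuture<String> sendThenRegister(MultiplexedChannel ch, UUID id) {
        ch.onResponse(id, "result"); // simulated immediate server response
        CompletableFuture<String> future = new CompletableFuture<>();
        ch.pending.put(id, future);
        return future; // never completed
    }

    // Safe ordering: the future exists before the request can possibly be answered.
    static CompletableFuture<String> registerThenSend(MultiplexedChannel ch, UUID id) {
        CompletableFuture<String> future = new CompletableFuture<>();
        ch.pending.put(id, future);
        ch.onResponse(id, "result"); // simulated immediate server response
        return future;
    }

    public static void main(String[] args) {
        MultiplexedChannel ch = new MultiplexedChannel();
        CompletableFuture<String> lost = sendThenRegister(ch, UUID.randomUUID());
        CompletableFuture<String> ok = registerThenSend(ch, UUID.randomUUID());
        System.out.println("lost completed? " + lost.isDone()); // false
        System.out.println("ok result: " + ok.join());           // result
        System.out.println("dropped: " + ch.droppedResponses);   // 1
    }
}
```

In a real client the write and the registration happen on different threads, so the loss shows up only intermittently; binding the result holder before the request is written closes that window.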
### Changes
1. ResultSet can be closed.
* This allows the client to tell the server to relinquish resources
associated with this request.
2. Single request per connection. No channel multiplexing.
* A rogue response (such as one that causes an IOException for exceeding
the content length) does not affect the rest of the in-flight requests.
* Each request gets the full bandwidth of its connection.
3. Replaced the custom keep-alive logic with Netty's IdleStateHandler.
* Makes the client more robust.
4. Deprecated the InProcess and SimultaneousUsage configuration parameters.
* Customers now have to configure only a single parameter to set the
concurrency of requests.
5. Throw different exceptions to the application code, which makes it easy to
determine which failures can be retried and which cannot.
6. Handle errors gracefully during the WebSocket handshake.
* Makes the client more robust.
7. Close the WebSocket channel gracefully (with a close frame).
* The server closes the channel gracefully on receiving the close frame.
8. Use Epoll instead of NIO whenever possible.
* Epoll provides better performance on Linux platforms.
9. Run chooseConnection asynchronously on executor threads.
* Increases thread utilization. In general, a lot of effort has been made
to improve thread utilization.
10. Make the client resilient to multiple responses from the server for the
same request.
11. Client operations do not rely on the request UUID provided by the
server.
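Several of these changes (1, 2, 4 and 10) can be sketched together using only the JDK. This is a hypothetical model, not the PR's implementation: `SingleRequestPool`, `Request`, `submit`, and `onTerminalResponse` are illustrative names, and all network I/O is omitted. Each connection carries at most one in-flight request, the pool size is the single concurrency setting, closing a request early hands its connection back (where a real client would also tell the server to release resources), and only the first terminal response for a request is accepted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class SingleRequestPool {
    // Hypothetical connection; a real one would wrap a Netty channel.
    static final class Connection {
        final int id;
        Connection(int id) { this.id = id; }
    }

    // Handle for one in-flight request, bound to exactly one connection.
    final class Request implements AutoCloseable {
        final CompletableFuture<String> result = new CompletableFuture<>();
        private final Connection conn;
        private final AtomicBoolean released = new AtomicBoolean(false);

        private Request(Connection conn) { this.conn = conn; }

        // Called by the I/O layer for a terminal frame. Only the first one wins, so a
        // genuine result arriving after an error (or any duplicate) is ignored.
        boolean onTerminalResponse(String payload) {
            boolean first = result.complete(payload);
            if (first) release();
            return first;
        }

        // Early close: a real client would also tell the server to stop the request here.
        @Override
        public void close() {
            result.cancel(false);
            release();
        }

        private void release() {
            if (released.compareAndSet(false, true)) idle.offer(conn);
        }
    }

    private final BlockingQueue<Connection> idle;

    // The pool size is the only concurrency knob: at most `size` requests in flight.
    SingleRequestPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(new Connection(i));
    }

    Request submit(long timeoutMillis) throws TimeoutException, InterruptedException {
        Connection conn = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (conn == null) throw new TimeoutException("no free connection in " + timeoutMillis + " ms");
        // A real client would write the request frame to `conn` at this point.
        return new Request(conn);
    }

    int idleCount() { return idle.size(); }

    public static void main(String[] args) throws Exception {
        SingleRequestPool pool = new SingleRequestPool(2);
        Request a = pool.submit(100);
        Request b = pool.submit(100);
        try {
            pool.submit(50);
        } catch (TimeoutException e) {
            System.out.println("third request timed out: both connections busy");
        }
        a.onTerminalResponse("error"); // an error frame, modeled here as a string
        boolean accepted = a.onTerminalResponse("late genuine result");
        System.out.println(a.result.join() + ", duplicate accepted: " + accepted); // error, duplicate accepted: false
        b.close(); // caller stopped consuming results
        System.out.println("idle connections: " + pool.idleCount()); // 2
    }
}
```

In this model a malformed response can only fail the request that owns the connection, and the pool size alone bounds how many requests are in flight.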
### Backward compatibility with 3.4.x/3.3.x
**Application layer code** - This new client is fully backward compatible
and requires no change in the application layer code, unless that code relies
on specific types of exceptions thrown by the client.
**Channelizer** - Although the channelizer interface hasn’t changed, custom
implementations of the channelizer will have to change their code to work with
the new client.
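Application code that inspects exception types is the one place that may need to change. The sketch below shows how such code might retry only retriable failures (change 5). The exception classes are hypothetical placeholders; this PR description does not name the types the new client throws, so check the driver's API before relying on any specific class.

```java
import java.util.concurrent.Callable;

class RetryOnlyRetriable {
    // Hypothetical exception types standing in for whatever the client actually throws.
    static class RetriableException extends RuntimeException {
        RetriableException(String message) { super(message); }
    }

    static class NonRetriableException extends RuntimeException {
        NonRetriableException(String message) { super(message); }
    }

    // Runs `call`, retrying only failures marked as retriable, up to maxAttempts
    // attempts in total. Any other exception propagates immediately.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (RetriableException e) {
                if (attempt >= maxAttempts) throw e; // give up after the last attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String value = callWithRetry(() -> {
            if (++calls[0] < 3) throw new RetriableException("timed out waiting for a connection");
            return "ok";
        }, 5);
        System.out.println(value + " after " + calls[0] + " attempts"); // ok after 3 attempts

        try {
            callWithRetry(() -> { throw new NonRetriableException("malformed script"); }, 5);
        } catch (NonRetriableException e) {
            System.out.println("not retried: " + e.getMessage());
        }
    }
}
```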
### Limitations
1. A client generating high TPS from a single machine will have to raise
the OS limit on the number of open files, since each connection consumes a
file descriptor on Linux.
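One rough way to size that limit, an assumption layered on top of this PR rather than something it states, is Little's law. The average number of requests in flight equals the request rate multiplied by the average latency. With one request per connection, the client needs at least that many open connections, and therefore file descriptors. The inputs in `main` are illustrative, not measurements from this PR. Real deployments need headroom above the average for bursts. On Linux, the current per-process limit can be checked with `ulimit -n`.

```java
class ConnectionEstimate {
    // Little's law: L = lambda * W. With one request per connection, L is a lower
    // bound on the open connections (and file descriptors) the client holds on average.
    static long busyConnections(double requestsPerSecond, double avgLatencyMillis) {
        return (long) Math.ceil(requestsPerSecond * avgLatencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Illustrative inputs: 5,000 requests/s at an average latency of 200 ms.
        System.out.println(busyConnections(5000, 200)); // 1000
    }
}
```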
### Benchmarks
Benchmark code will be shared in this PR soon, and the results will be
posted here. Preliminary testing showed no difference in performance. This is
because channels are reused, so the only additional overhead is at bootstrap,
when the client performs more WebSocket handshakes than the older code
(because it opens more connections).
### Testing
1. Added a new test suite.
2. All existing tests pass.
* gremlin-driver: `mvn clean install -DskipIntegrationTests=false`
* gremlin-server: `mvn clean install -DskipIntegrationTests=false`
### Post merge work
1. Write a document describing how the client works.
2. Add examples of efficient usage of the client.
3. Update the changelog.
4. Update the documentation.
### Future work
1. Add a default retry strategy for timeouts while trying to obtain a
connection.
2. Add a strategy to remove a fishy host from the load balancer (without
impacting existing requests).
----------------------------------------------------------------
> Use one connection per request for Java client
> ----------------------------------------------
>
> Key: TINKERPOP-2205
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2205
> Project: TinkerPop
> Issue Type: Improvement
> Components: driver
> Reporter: Divij Vaidya
> Priority: Major
>
> This issue is a tracking item for the conversation in the mailing list
> [[1]|https://lists.apache.org/thread.html/77728cb77d4eab90f15680595e653ffc6055b74db29cbd4dcd5f0339@%3Cdev.tinkerpop.apache.org%3E]
> which highlights multiple problems and shortcomings in the existing Java
> client and proposes a design change in the client connection pooling to
> address the same. More specifically, the problems addressed are as follows:
> # Difficulty in configuring the client for optimum performance.
> # Undocumented dependency of configuration parameters on each other.
> # A bad request can impact other requests on the same channel.
> # A host is marked as dead even if it is busy serving requests.
> # No way to free up server resources if the client has stopped consuming
> results.
> # No differentiation between retriable and non-retriable exceptions thrown to
> the application code.
> # Keep-alive is only sent while a query is executing, which means that a
> connection left open for a long time with no query being sent will be closed
> by the server.
> # Race condition if the server response arrives before the result queue has
> been registered.
> # Unpredictable behaviour if the server sends an exception followed by a
> genuine response for the same request.
> # A concurrent hash map (tracking pending requests) is a point of contention
> amongst threads.
> [1]https://lists.apache.org/thread.html/77728cb77d4eab90f15680595e653ffc6055b74db29cbd4dcd5f0339@%3Cdev.tinkerpop.apache.org%3E