[
https://issues.apache.org/jira/browse/TINKERPOP-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478454#comment-17478454
]
ASF GitHub Bot commented on TINKERPOP-2679:
-------------------------------------------
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787504326
##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a
per-request timeout for scripts is
with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script.
Scripts with multiple traversals and multiple
timeouts will be interpreted as a sum of all timeouts identified in the script
for that request.
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin
server and receives messages according to the `batchSize` parameter on the per
request settings or the `resultIterationBatchSize` value configured for the
Gremlin server. When submitting scripts the default behavior is to wait for the
entire result set to be returned from a query before allowing any processing on
the result set.
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory
management to process each chunk of data as it is returned from the gremlin
server. The Gremlin JavaScript driver can accept an optional callback to run on
each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {
Review comment:
I think mixing promises and callbacks in the same API can be confusing
and prone to misuse.
I think we should expose a different method for "streaming" or grouping into
smaller sets of results, for example with async iterables:
```javascript
const stream = client.stream(traversal);
for await (const item of stream) {
  // process each item as it arrives
}
```
or with regular callbacks:
```javascript
client.forEach(traversal, item => {
  // called for each item
}, err => {
  // called at the end or when there's an error
});
```
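A minimal sketch of how the async-iterable option could sit on top of a per-batch callback. Nothing here is driver API: `fakeSubmit` is a hypothetical stand-in that simulates a server pushing batches, and `streamItems` is a hypothetical name for the bridge from push-style callbacks to `for await`.

```javascript
// Hypothetical stand-in for a driver call that pushes batches to a callback.
// NOT part of gremlin-javascript; it simulates `total` results arriving in
// batches of `batchSize`, one batch per event-loop turn.
function fakeSubmit(total, batchSize, onBatch) {
  return new Promise((resolve) => {
    let sent = 0;
    const tick = () => {
      if (sent >= total) return resolve();
      const size = Math.min(batchSize, total - sent);
      const start = sent;
      onBatch(Array.from({ length: size }, (_, i) => start + i));
      sent += size;
      setImmediate(tick);
    };
    tick();
  });
}

// Hypothetical bridge: exposes the pushed batches as an async iterable of items.
async function* streamItems(total, batchSize) {
  const queue = [];   // batches received but not yet consumed
  let wake = null;    // resolver for a consumer waiting on the next batch
  let done = false;
  let failure = null;
  const notify = () => {
    if (wake) {
      wake();
      wake = null;
    }
  };

  fakeSubmit(total, batchSize, (batch) => {
    queue.push(batch);
    notify();
  }).then(
    () => { done = true; notify(); },
    (err) => { failure = err; done = true; notify(); }
  );

  while (true) {
    // Drain everything that has arrived so far, one item at a time.
    while (queue.length > 0) yield* queue.shift();
    if (failure) throw failure;
    if (done) return;
    // Nothing buffered yet: sleep until the next batch or completion.
    await new Promise((resolve) => { wake = resolve; });
  }
}

async function main() {
  let count = 0;
  for await (const item of streamItems(100, 25)) {
    count += 1; // process each item here
  }
  console.log(count);
}
main();
```

Note that this sketch applies no backpressure: if the consumer is slower than the producer, batches accumulate in `queue`, so a real implementation would need to bound or pause the source to keep the memory benefit.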
##########
File path:
gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js
##########
@@ -290,13 +296,31 @@ class Connection extends EventEmitter {
}
switch (response.status.code) {
case responseStatusCode.noContent:
+ if (this._onDataMessageHandlers[response.requestId]) {
+ this._onDataMessageHandlers[response.requestId](
+ new ResultSet(utils.emptyArray, response.status.attributes)
+ );
+ }
this._clearHandler(response.requestId);
return handler.callback(null, new ResultSet(utils.emptyArray,
response.status.attributes));
case responseStatusCode.partialContent:
- handler.result = handler.result || [];
- handler.result.push.apply(handler.result, response.result.data);
+ if (this._onDataMessageHandlers[response.requestId]) {
+ this._onDataMessageHandlers[response.requestId](
+ new ResultSet(response.result.data, response.status.attributes)
Review comment:
Maybe instead of receiving 4 result sets, the user wants to access each
individual item.
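One way this could look, as a sketch only: a small adapter that turns a per-item handler into a per-batch callback. `perItem` is a hypothetical name, and the batches below are plain stand-in objects that only mimic the `toArray()` accessor used in the examples in this PR; they are not real `ResultSet` instances.

```javascript
// Hypothetical adapter, not part of gremlin-javascript: wraps a per-item
// handler so it can be passed where a per-batch callback is expected.
function perItem(onItem) {
  return (resultSet) => {
    for (const item of resultSet.toArray()) {
      onItem(item);
    }
  };
}

// Stand-ins for the batches a driver would deliver (plain objects).
const fakeBatches = [
  { toArray: () => ["v1", "v2"] },
  { toArray: () => ["v3"] },
];

const received = [];
const onBatch = perItem((item) => received.push(item));
fakeBatches.forEach((batch) => onBatch(batch));
console.log(received); // the three items, in arrival order
```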
##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a
per-request timeout for scripts is
with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script.
Scripts with multiple traversals and multiple
timeouts will be interpreted as a sum of all timeouts identified in the script
for that request.
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin
server and receives messages according to the `batchSize` parameter on the per
request settings or the `resultIterationBatchSize` value configured for the
Gremlin server. When submitting scripts the default behavior is to wait for the
entire result set to be returned from a query before allowing any processing on
the result set.
Review comment:
Nice way to introduce the change 👍
> Update JavaScript driver to support processing messages as a stream
> -------------------------------------------------------------------
>
> Key: TINKERPOP-2679
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2679
> Project: TinkerPop
> Issue Type: Improvement
> Components: javascript
> Affects Versions: 3.5.1
> Reporter: Tom Kolanko
> Priority: Minor
>
> The JavaScript driver's
> [_handleMessage|https://github.com/apache/tinkerpop/blob/d4bd5cc5a228fc22442101ccb6a9751653900d32/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js#L249]
> receives messages from the gremlin server and stores each message in an
> object associated with the handler for the specific request. Currently, the
> driver waits until all the data is available from the gremlin server before
> allowing further processing of it.
> However, this can lead to cases where a lot of memory is required to hold
> onto the results before any processing can take place. If we had the ability
> to process results as they come in from the gremlin server we could reduce
> memory usage in some cases.
> If you are open to it I would like to submit a PR where {{submit}} can take
> an optional callback which is run on each set of data returned from the
> gremlin server, rather than waiting for the entire result set.
> The following examples assume that you have 100 vertices in your graph.
> current behaviour:
> {code:javascript}
> const result = await client.submit("g.V()")
> console.log(result.toArray()) // 100 - all the vertices in your graph
> {code}
> proposed addition:
> {code:javascript}
> await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {
>   // 25 - this callback will be called 4 times (100 / 25 = 4)
>   console.log(data.toArray().length)
> })
> {code}
> If the optional callback is not provided then the default behaviour is
> unchanged.
> I have the changes running locally and the overall performance is unchanged:
> queries run about the same as they used to. However, for some specific
> queries memory usage has dropped considerably.
> With the process-on-message strategy the memory usage will be related to how
> large the {{batchSize}} is rather than the size of the final result set.
> Using the default of 64 and testing some specific cases we have, I can get
> the memory usage to go from 1.2 GB to 10 MB.