[jira] [Commented] (TINKERPOP-2679) Update JavaScript driver to support processing messages as a stream

ASF GitHub Bot (Jira) Thu, 03 Mar 2022 01:10:34 -0800


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500601#comment-17500601
 ]


ASF GitHub Bot commented on TINKERPOP-2679:
-------------------------------------------

jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818437551



##########
File path: 
gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -46,13 +45,13 @@ class Client {
    */
   constructor(url, options = {}) {
     this._options = options;
-    if (this._options.processor === 'session') {
+    if (this._options.processor === "session") {

Review comment:
       We don't have a linter for the project, but we should follow the 
conventions used in the rest of the files.
   
   Several of the changed lines only switch string quotes or override existing 
formatting. It's likely your IDE, but in any case we should avoid such changes 
so they don't add noise to the history.
   
   In general, you can check out airbnb's style guide: 
https://github.com/airbnb/javascript#strings
   
   Would it be possible to revert the unnecessary style changes?

##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1733,6 +1733,59 @@ IMPORTANT: The preferred method for setting a 
per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. 
Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script 
for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin 
server and receives messages according to the `batchSize` parameter on the per 
request settings or the `resultIterationBatchSize` value configured for the 
Gremlin server. When submitting scripts the default behavior is to wait for the 
entire result set to be returned from a query before allowing any processing on 
the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory 
management to process each chunk of data as it is returned from the gremlin 
server. The Gremlin JavaScript driver can return a readable stream instead of 
waiting for the entire result set to be loaded.
+
+[source,javascript]
+----
+
+const readable =  client.stream("g.V()", {}, { batchSize: 25 });
+
+readable.on('data', (data) => {
+  console.log(data.toArray()); // 25 vertices
+})
+
+readable.on('error', (error) => {
+  console.log(error); // errors returned from gremlin server
+})
+
+readable.on('end', () => {
+  console.log('query complete'); // when the end event is received then all 
the results have been processed
+})
+----
+
+If you are using NodeJS >= 10.0, you can asynchronously iterate readable 
streams:
+
+
+[source,javascript]
+----
+
+const readable = client.stream("g.V()", {}, { batchSize: 25 });
+
+try {
+  for await (const result of readable) {
+      console.log('data', result.toArray()); // 25 vertices

Review comment:
       NIT: spacing :)

##########
File path: 
gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -73,41 +72,118 @@ class Client {
     return this._connection.isOpen;
   }
 
+  /**
+   * Configuration specific to the current request.
+   * @typedef {Object} RequestOptions
+   * @property {String} requestId - User specified request identifier which 
must be a UUID.
+   * @property {Number} batchSize - The size in which the result of a request 
is to be "batched" back to the client
+   * @property {String} userAgent - A custom string that specifies to the 
server where the request came from.
+   * @property {Number} evaluationTimeout - The timeout for the evaluation of 
the request.
+   */
+
   /**
    * Send a request to the Gremlin Server, can send a script or bytecode steps.
    * @param {Bytecode|string} message The bytecode or script to send
    * @param {Object} [bindings] The script bindings, if any.
-   * @param {Object} [requestOptions] Configuration specific to the current 
request.
-   * @param {String} [requestOptions.requestId] User specified request 
identifier which must be a UUID.
-   * @param {Number} [requestOptions.batchSize] The size in which the result 
of a request is to be "batched" back to the client
-   * @param {String} [requestOptions.userAgent] A custom string that specifies 
to the server where the request came from.
-   * @param {Number} [requestOptions.evaluationTimeout] The timeout for the 
evaluation of the request.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the 
current request.
    * @returns {Promise}
    */
   submit(message, bindings, requestOptions) {
-    const requestIdOverride = requestOptions && requestOptions.requestId
-    if (requestIdOverride) delete requestOptions['requestId'];
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
+
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
+
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
+    }
+
+    if (message instanceof Bytecode) {
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.submit(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      } else {
+        return this._connection.submit(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      }
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.submit(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );
+    } else {
+      throw new TypeError("message must be of type Bytecode or string");
+    }
+  }
+
+  /**
+   * Send a request to the Gremlin Server and receive a stream for the 
results, can send a script or bytecode steps.
+   * @param {Bytecode|string} message The bytecode or script to send
+   * @param {Object} [bindings] The script bindings, if any.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the 
current request.
+   * @returns {ReadableStream}
+   */
+  stream(message, bindings, requestOptions) {
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
 
-    const args = Object.assign({
-      gremlin: message,
-      aliases: { 'g': this._options.traversalSource || 'g' }
-    }, requestOptions)
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
 
-    if (this._options.session && this._options.processor === 'session') {
-      args['session'] = this._options.session;
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
     }
 
     if (message instanceof Bytecode) {
-      if (this._options.session && this._options.processor === 'session') {
-        return this._connection.submit('session', 'bytecode', args, 
requestIdOverride);
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.stream(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       } else {
-        return this._connection.submit('traversal', 'bytecode', args, 
requestIdOverride);
+        return this._connection.stream(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       }
-    } else if (typeof message === 'string') {
-      args['bindings'] = bindings;
-      args['language'] = 'gremlin-groovy';
-      args['accept'] = this._connection.mimeType;
-      return this._connection.submit(this._options.processor || '','eval', 
args, requestIdOverride);
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.stream(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );

Review comment:
       This code block is identical to the previous one, except for the 
underlying connection method that is called.
   
   Would it be possible to extract it into something like 
`executeOnConnection()`, where the connection method is a parameter as well?
   
   For example, it would have:
   ```javascript
   return this._connection[methodName](/* ... */)
   ```
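
A fuller sketch of that extraction might look like the following. It is only an illustration under stated assumptions: `FakeConnection` and the stub `Bytecode` class are hypothetical stand-ins so the snippet runs on its own, the mime type value is a placeholder, and `_executeOnConnection` is an illustrative name, not an existing driver method. The request-building logic is copied from the diff above, with the connection method looked up by name.

```javascript
// Hypothetical stand-ins so the sketch is self-contained; the real driver
// provides its own Bytecode and Connection classes.
class Bytecode {}

class FakeConnection {
  constructor() {
    this.mimeType = 'application/vnd.gremlin-v3.0+json'; // placeholder value
  }
  submit(processor, op, args, requestId) {
    return { via: 'submit', processor, op, args, requestId };
  }
  stream(processor, op, args, requestId) {
    return { via: 'stream', processor, op, args, requestId };
  }
}

class Client {
  constructor(connection, options = {}) {
    this._connection = connection;
    this._options = options;
  }

  submit(message, bindings, requestOptions) {
    return this._executeOnConnection('submit', message, bindings, requestOptions);
  }

  stream(message, bindings, requestOptions) {
    return this._executeOnConnection('stream', message, bindings, requestOptions);
  }

  // Shared request-building logic; methodName selects the connection method.
  _executeOnConnection(methodName, message, bindings, requestOptions) {
    const requestIdOverride = requestOptions && requestOptions.requestId;
    if (requestIdOverride) delete requestOptions['requestId'];

    const args = Object.assign({
      gremlin: message,
      aliases: { 'g': this._options.traversalSource || 'g' }
    }, requestOptions);

    const inSession = this._options.session && this._options.processor === 'session';
    if (inSession) {
      args['session'] = this._options.session;
    }

    if (message instanceof Bytecode) {
      const processor = inSession ? 'session' : 'traversal';
      return this._connection[methodName](processor, 'bytecode', args, requestIdOverride);
    } else if (typeof message === 'string') {
      args['bindings'] = bindings;
      args['language'] = 'gremlin-groovy';
      args['accept'] = this._connection.mimeType;
      return this._connection[methodName](this._options.processor || '', 'eval', args, requestIdOverride);
    }
    throw new TypeError('message must be of type Bytecode or string');
  }
}
```

With this shape, `submit()` and `stream()` differ only in the method name they pass along, so a later change to how `args` is built has to be made in one place.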





> Update JavaScript driver to support processing messages as a stream
> -------------------------------------------------------------------
>
>                 Key: TINKERPOP-2679
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2679
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: javascript
>    Affects Versions: 3.5.1
>            Reporter: Tom Kolanko
>            Priority: Minor
>
> The JavaScript driver's 
> [_handleMessage|https://github.com/apache/tinkerpop/blob/d4bd5cc5a228fc22442101ccb6a9751653900d32/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js#L249]
>  receives messages from the gremlin server and stores each message in an 
> object associated with the handler for the specific request. Currently, the 
> driver waits until all the data is available from the gremlin server before 
> allowing further processing of it.
> However, this can lead to cases where a lot of memory is required to hold 
> onto the results before any processing can take place. If we had the ability 
> to process results as they come in from the gremlin server we could reduce 
> memory usage in some cases.
> If you are open to it I would like to submit a PR where {{submit}} can take 
> an optional callback which is run on each set of data returned from the 
> gremlin server, rather than waiting for the entire result set.
> The following examples assume that you have 100 vertices in your graph.
> current behaviour:
> {code:javascript}
> const result = await client.submit("g.V()")
> console.log(result.toArray()) // 100 - all the vertices in your graph
> {code}
> proposed addition
> {code:javascript}
> await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {
>   console.log(data.toArray().length) // 25 - this callback will be called 4 
> times (100 / 25 = 4)
> })
> {code}
> If the optional callback is not provided then the default behaviour is 
> unchanged.
> I have the changes running locally and the overall performance is unchanged, 
> queries run about the same as they used to, however, for some specific 
> queries memory usage has dropped considerably. 
> With the process-on-message strategy the memory usage will be related to how 
> large the {{batchSize}} is rather than the final result set. Using the 
> default of 64 and testing some specific cases we have, I can get the memory 
> usage to go from 1.2 GB to 10 MB.
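
The batch-at-a-time idea described above can be tried without a Gremlin server. The sketch below is only an illustration: `fakeBatches` and `processAsStream` are hypothetical helpers, and Node's built-in `stream.Readable.from` stands in for whatever stream the driver would return. It shows a consumer that handles each batch as it arrives and never holds the full result set in one array.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for server responses: yields `total` items
// in arrays of at most `batchSize` elements.
async function* fakeBatches(total, batchSize) {
  for (let start = 0; start < total; start += batchSize) {
    const size = Math.min(batchSize, total - start);
    yield Array.from({ length: size }, (_, i) => ({ id: start + i }));
  }
}

// Consumes the stream batch by batch and keeps only running counters,
// so the consumer never accumulates the whole result set.
async function processAsStream(total, batchSize) {
  const readable = Readable.from(fakeBatches(total, batchSize));
  let batches = 0;
  let processed = 0;
  let largestBatch = 0;
  for await (const batch of readable) {
    batches += 1;
    processed += batch.length;
    largestBatch = Math.max(largestBatch, batch.length);
  }
  return { batches, processed, largestBatch };
}

processAsStream(100, 25).then((summary) => {
  console.log(summary);
});
```

With 100 items and a batch size of 25, the loop body runs four times, which matches the arithmetic in the proposal (100 / 25 = 4).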



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
