Re: [PR] optimize: Optimize RocksDB batch query performance [hugegraph]

via GitHub Mon, 06 Apr 2026 08:35:46 -0700


Copilot commented on code in PR #2982:
URL: https://github.com/apache/hugegraph/pull/2982#discussion_r3040241443



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[] 
vertexIds, boolean adjace
                 // Found from local tx
                 vertices.put(vertex.id(), vertex);
             } else {
-                // Prepare to query from backend store
-                query.query(id);
+                // store the IDs queried from backend
+                backendIds.add(id);
             }

Review Comment:
   This change allocates `backendIds` as a second full list of ids (in addition 
to `ids`), which doubles memory usage for large `g.V(id1,id2,...)` calls (up to 
`Query.DEFAULT_CAPACITY`). Consider batching/issuing backend IdQuery requests 
incrementally during the main loop (flush when reaching batch size) to avoid 
retaining all backend ids at once.



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[] 
vertexIds, boolean adjace
                 // Found from local tx
                 vertices.put(vertex.id(), vertex);
             } else {
-                // Prepare to query from backend store
-                query.query(id);
+                // store the IDs queried from backend
+                backendIds.add(id);
             }
             ids.add(id);
         }
 
-        if (!query.empty()) {
+        if (!backendIds.isEmpty()) {
             // Query from backend store
-            query.mustSortByInput(false);
-            Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
-            QueryResults.fillMap(it, vertices);
+            final int batch = this.batchSize > 0 ? this.batchSize : 
backendIds.size();
+            for (int i = 0; i < backendIds.size(); i += batch) {
+                int end = Math.min(i + batch, backendIds.size());
+                IdQuery query = new IdQuery(type);
+                for (int j = i; j < end; j++) {
+                    Id id = backendIds.get(j);
+                    query.query(id);
+                }

Review Comment:
   With batching, duplicated ids that fall into different batches will trigger 
repeated backend reads/RPCs for the same id. You can keep the output behavior 
(duplicates preserved via `ids`) while deduplicating backend fetches (e.g., 
track a seen-set for backendIds or build per-batch unique ids) to avoid 
redundant backend queries.



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[] 
vertexIds, boolean adjace
                 // Found from local tx
                 vertices.put(vertex.id(), vertex);
             } else {
-                // Prepare to query from backend store
-                query.query(id);
+                // store the IDs queried from backend
+                backendIds.add(id);
             }
             ids.add(id);
         }
 
-        if (!query.empty()) {
+        if (!backendIds.isEmpty()) {
             // Query from backend store
-            query.mustSortByInput(false);
-            Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
-            QueryResults.fillMap(it, vertices);
+            final int batch = this.batchSize > 0 ? this.batchSize : 
backendIds.size();
+            for (int i = 0; i < backendIds.size(); i += batch) {
+                int end = Math.min(i + batch, backendIds.size());
+                IdQuery query = new IdQuery(type);

Review Comment:
   The PR title says this is a RocksDB optimization, but the functional change 
here is backend-agnostic batching in GraphTransaction (affecting all stores, 
especially RPC-based ones). Please align the PR title (or add RocksDB-specific 
changes) so the intent matches what’s actually being modified.



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[] 
vertexIds, boolean adjace
                 // Found from local tx
                 vertices.put(vertex.id(), vertex);
             } else {
-                // Prepare to query from backend store
-                query.query(id);
+                // store the IDs queried from backend
+                backendIds.add(id);
             }
             ids.add(id);
         }
 
-        if (!query.empty()) {
+        if (!backendIds.isEmpty()) {
             // Query from backend store
-            query.mustSortByInput(false);
-            Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
-            QueryResults.fillMap(it, vertices);
+            final int batch = this.batchSize > 0 ? this.batchSize : 
backendIds.size();

Review Comment:
   `final int batch = this.batchSize > 0 ? this.batchSize : backendIds.size();` 
is effectively always `this.batchSize` because `query.batch_size` is configured 
with a minimum of 1 (see CoreOptions.QUERY_BATCH_SIZE). Removing the dead 
fallback branch will simplify the code.
   ```suggestion
               final int batch = this.batchSize;
   ```



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[] 
vertexIds, boolean adjace
                 // Found from local tx
                 vertices.put(vertex.id(), vertex);
             } else {
-                // Prepare to query from backend store
-                query.query(id);
+                // store the IDs queried from backend
+                backendIds.add(id);
             }
             ids.add(id);
         }
 
-        if (!query.empty()) {
+        if (!backendIds.isEmpty()) {
             // Query from backend store
-            query.mustSortByInput(false);
-            Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
-            QueryResults.fillMap(it, vertices);
+            final int batch = this.batchSize > 0 ? this.batchSize : 
backendIds.size();
+            for (int i = 0; i < backendIds.size(); i += batch) {
+                int end = Math.min(i + batch, backendIds.size());
+                IdQuery query = new IdQuery(type);
+                for (int j = i; j < end; j++) {
+                    Id id = backendIds.get(j);
+                    query.query(id);
+                }
+                // Single batch capacity check
+                Query.checkForceCapacity(query.idsSize());
+
+                // Query from backend store
+                query.mustSortByInput(false);
+                Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
+                QueryResults.fillMap(it, vertices);
+            }

Review Comment:
   The new multi-batch path isn’t covered by tests. Please add a 
unit/integration test that exercises `queryVerticesByIds()` with 
`vertexIds.length > query.batch_size`, including (1) duplicates across a batch 
boundary and (2) mixed local-tx + backend ids, to ensure results and 
NotFoundException behavior remain unchanged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] optimize: Optimize RocksDB batch query performance [hugegraph]

Reply via email to