Copilot commented on code in PR #2982:
URL: https://github.com/apache/hugegraph/pull/2982#discussion_r3040241443
##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[]
vertexIds, boolean adjace
// Found from local tx
vertices.put(vertex.id(), vertex);
} else {
- // Prepare to query from backend store
- query.query(id);
+ // store the IDs queried from backend
+ backendIds.add(id);
}
Review Comment:
This change allocates `backendIds` as a second full list of ids (in addition
to `ids`), which doubles memory usage for large `g.V(id1,id2,...)` calls (up to
`Query.DEFAULT_CAPACITY`). Consider batching/issuing backend IdQuery requests
incrementally during the main loop (flush when reaching batch size) to avoid
retaining all backend ids at once.
##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[]
vertexIds, boolean adjace
// Found from local tx
vertices.put(vertex.id(), vertex);
} else {
- // Prepare to query from backend store
- query.query(id);
+ // store the IDs queried from backend
+ backendIds.add(id);
}
ids.add(id);
}
- if (!query.empty()) {
+ if (!backendIds.isEmpty()) {
// Query from backend store
- query.mustSortByInput(false);
- Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
- QueryResults.fillMap(it, vertices);
+ final int batch = this.batchSize > 0 ? this.batchSize :
backendIds.size();
+ for (int i = 0; i < backendIds.size(); i += batch) {
+ int end = Math.min(i + batch, backendIds.size());
+ IdQuery query = new IdQuery(type);
+ for (int j = i; j < end; j++) {
+ Id id = backendIds.get(j);
+ query.query(id);
+ }
Review Comment:
With batching, duplicated ids that fall into different batches will trigger
repeated backend reads/RPCs for the same id. You can keep the output behavior
(duplicates preserved via `ids`) while deduplicating backend fetches (e.g.,
track a seen-set for backendIds or build per-batch unique ids) to avoid
redundant backend queries.
##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[]
vertexIds, boolean adjace
// Found from local tx
vertices.put(vertex.id(), vertex);
} else {
- // Prepare to query from backend store
- query.query(id);
+ // store the IDs queried from backend
+ backendIds.add(id);
}
ids.add(id);
}
- if (!query.empty()) {
+ if (!backendIds.isEmpty()) {
// Query from backend store
- query.mustSortByInput(false);
- Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
- QueryResults.fillMap(it, vertices);
+ final int batch = this.batchSize > 0 ? this.batchSize :
backendIds.size();
+ for (int i = 0; i < backendIds.size(); i += batch) {
+ int end = Math.min(i + batch, backendIds.size());
+ IdQuery query = new IdQuery(type);
Review Comment:
The PR title says this is a RocksDB optimization, but the functional change
here is backend-agnostic batching in GraphTransaction (affecting all stores,
especially RPC-based ones). Please align the PR title (or add RocksDB-specific
changes) so the intent matches what’s actually being modified.
##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[]
vertexIds, boolean adjace
// Found from local tx
vertices.put(vertex.id(), vertex);
} else {
- // Prepare to query from backend store
- query.query(id);
+ // store the IDs queried from backend
+ backendIds.add(id);
}
ids.add(id);
}
- if (!query.empty()) {
+ if (!backendIds.isEmpty()) {
// Query from backend store
- query.mustSortByInput(false);
- Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
- QueryResults.fillMap(it, vertices);
+ final int batch = this.batchSize > 0 ? this.batchSize :
backendIds.size();
Review Comment:
`final int batch = this.batchSize > 0 ? this.batchSize : backendIds.size();`
is effectively always `this.batchSize` because `query.batch_size` is configured
with a minimum of 1 (see CoreOptions.QUERY_BATCH_SIZE). Removing the dead
fallback branch will simplify the code.
```suggestion
final int batch = this.batchSize;
```
##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java:
##########
@@ -799,17 +800,30 @@ protected Iterator<Vertex> queryVerticesByIds(Object[]
vertexIds, boolean adjace
// Found from local tx
vertices.put(vertex.id(), vertex);
} else {
- // Prepare to query from backend store
- query.query(id);
+ // store the IDs queried from backend
+ backendIds.add(id);
}
ids.add(id);
}
- if (!query.empty()) {
+ if (!backendIds.isEmpty()) {
// Query from backend store
- query.mustSortByInput(false);
- Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
- QueryResults.fillMap(it, vertices);
+ final int batch = this.batchSize > 0 ? this.batchSize :
backendIds.size();
+ for (int i = 0; i < backendIds.size(); i += batch) {
+ int end = Math.min(i + batch, backendIds.size());
+ IdQuery query = new IdQuery(type);
+ for (int j = i; j < end; j++) {
+ Id id = backendIds.get(j);
+ query.query(id);
+ }
+ // Single batch capacity check
+ Query.checkForceCapacity(query.idsSize());
+
+ // Query from backend store
+ query.mustSortByInput(false);
+ Iterator<HugeVertex> it = this.queryVerticesFromBackend(query);
+ QueryResults.fillMap(it, vertices);
+ }
Review Comment:
The new multi-batch path isn’t covered by tests. Please add a
unit/integration test that exercises `queryVerticesByIds()` with
`vertexIds.length > query.batch_size`, including (1) duplicates across a batch
boundary and (2) mixed local-tx + backend ids, to ensure results and
NotFoundException behavior remain unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]