This is an automated email from the ASF dual-hosted git repository.
lidavidm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 7bfe02db04 GH-41573: [Java] VectorSchemaRoot uses inefficient stream
to copy fieldVectors (#41574)
7bfe02db04 is described below
commit 7bfe02db04e34fc1ab6df6f647a76899e0c654db
Author: David Schlosnagle <[email protected]>
AuthorDate: Wed May 8 19:46:15 2024 -0400
GH-41573: [Java] VectorSchemaRoot uses inefficient stream to copy
fieldVectors (#41574)
### Rationale for this change
While reviewing allocation profiles of an Arrow-intensive application, I
noticed significant allocations due to `ArrayList#grow()` originating from
`org.apache.arrow.vector.VectorSchemaRoot#getFieldVectors()`. That method uses an
inefficient `fieldVectors.stream().collect(Collectors.toList())` to create a
list copy, leading to reallocations as the target list is collected. This could
be replaced with a more efficient `new Arr [...]
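To make the allocation pattern concrete, here is a minimal sketch (not part of the commit) that contrasts the stream-based copy with a copy whose backing array is sized once. The class `ListCopySketch` and both helper methods are hypothetical names used only for illustration. The copy-constructor variant is a general Java alternative and is not a reconstruction of the truncated sentence above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; it does not exist in Arrow.
public class ListCopySketch {

  // The pattern the rationale describes: Collectors.toList() typically
  // accumulates into a default-capacity list that grows as elements are added.
  public static <T> List<T> copyViaStream(List<T> source) {
    return source.stream().collect(Collectors.toList());
  }

  // A general alternative: the ArrayList copy constructor allocates a
  // backing array that already fits the source, so no growth is needed.
  public static <T> List<T> copyViaConstructor(List<T> source) {
    return new ArrayList<>(source);
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("id", "name", "score");
    List<String> viaStream = copyViaStream(columns);
    List<String> viaConstructor = copyViaConstructor(columns);
    // Both copies hold the same elements; only the allocation behavior differs.
    System.out.println(viaStream.equals(viaConstructor));
  }
}
```

Both helpers return equal lists, so the difference only shows up in an allocation profile, which matches how the issue was found.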
### What changes are included in this PR?
* Use `Collections.unmodifiableList(List)` to return an unmodifiable list view
of `fieldVectors` from `getFieldVectors()`
* Pre-size the `fieldVectors` `ArrayList` in the static factory
`VectorSchemaRoot#create(Schema, BufferAllocator)`
* `VectorSchemaRoot#setRowCount(int)` iterates over the instance's `fieldVectors`
instead of a copied list (similar to the existing `allocateNew()`, `clear()`,
`contentToTSVString()`).
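The three changes above can be sketched together on a small stand-in class. This is an assumption-laden illustration rather than Arrow code: `UnmodifiableViewSketch`, `getNames()`, and `totalLength()` are hypothetical names, and plain strings stand in for `FieldVector` instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a holder of vectors; not part of Arrow.
public class UnmodifiableViewSketch {
  private final List<String> names;

  private UnmodifiableViewSketch(List<String> names) {
    this.names = names;
  }

  // Pre-size the backing list from the known element count,
  // mirroring the change to the static factory.
  public static UnmodifiableViewSketch create(List<String> source) {
    List<String> names = new ArrayList<>(source.size());
    for (String name : source) {
      names.add(name);
    }
    return new UnmodifiableViewSketch(names);
  }

  // Return a read-only view instead of a fresh copy; no elements are copied.
  public List<String> getNames() {
    return Collections.unmodifiableList(names);
  }

  // Internal loops read the field directly rather than going through the getter.
  public int totalLength() {
    int total = 0;
    for (String name : names) {
      total += name.length();
    }
    return total;
  }

  public static void main(String[] args) {
    UnmodifiableViewSketch holder = create(Arrays.asList("id", "name"));
    List<String> view = holder.getNames();
    try {
      view.add("extra"); // mutation through the view is rejected
    } catch (UnsupportedOperationException e) {
      System.out.println("view is read-only");
    }
    System.out.println(holder.totalLength()); // prints 6
  }
}
```

One consequence of this design is that callers receive a view rather than a private copy, so code that previously mutated the returned list would now get an `UnsupportedOperationException`.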
### Are these changes tested?
These changes are covered by existing unit and integration tests.
### Are there any user-facing changes?
No
* GitHub Issue: #41573
Authored-by: David Schlosnagle <[email protected]>
Signed-off-by: David Li <[email protected]>
---
.../src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java b/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java
index 8768a90c80..9a92ce5060 100644
--- a/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java
+++ b/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java
@@ -19,6 +19,7 @@ package org.apache.arrow.vector;
import java.util.ArrayList;
import java.util.Arrays;
+import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
@@ -121,7 +122,7 @@ public class VectorSchemaRoot implements AutoCloseable {
* Creates a new set of empty vectors corresponding to the given schema.
*/
public static VectorSchemaRoot create(Schema schema, BufferAllocator allocator) {
- List<FieldVector> fieldVectors = new ArrayList<>();
+ List<FieldVector> fieldVectors = new ArrayList<>(schema.getFields().size());
for (Field field : schema.getFields()) {
FieldVector vector = field.createVector(allocator);
fieldVectors.add(vector);
@@ -160,7 +161,7 @@ public class VectorSchemaRoot implements AutoCloseable {
}
public List<FieldVector> getFieldVectors() {
- return fieldVectors.stream().collect(Collectors.toList());
+ return Collections.unmodifiableList(fieldVectors);
}
/**
@@ -236,7 +237,7 @@ public class VectorSchemaRoot implements AutoCloseable {
*/
public void setRowCount(int rowCount) {
this.rowCount = rowCount;
- for (FieldVector v : getFieldVectors()) {
+ for (FieldVector v : fieldVectors) {
v.setValueCount(rowCount);
}
}