This is an automated email from the ASF dual-hosted git repository.

lidavidm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 7bfe02db04 GH-41573: [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
7bfe02db04 is described below

commit 7bfe02db04e34fc1ab6df6f647a76899e0c654db
Author: David Schlosnagle <[email protected]>
AuthorDate: Wed May 8 19:46:15 2024 -0400

    GH-41573: [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
    
    ### Rationale for this change
    
    While reviewing allocation profiling of an Arrow-intensive application, I noticed significant allocations due to `ArrayList#grow()` originating from `org.apache.arrow.vector.VectorSchemaRoot#getFieldVectors()`. That method uses an inefficient `fieldVectors.stream().collect(Collectors.toList())` to create a list copy, leading to reallocations as the target list is collected. This could be replaced with a more efficient `new Arr [...]
    
    ### What changes are included in this PR?
    
    * Use `Collections.unmodifiableList(List)` to return an unmodifiable list view of `fieldVectors` from `getFieldVectors()`
    * Pre-size the `fieldVectors` `ArrayList` in the static factory `VectorSchemaRoot#create(Schema, BufferAllocator)`
    * `VectorSchemaRoot#setRowCount(int)` iterates over the instance's `fieldVectors` instead of a copied list (similar to the existing `allocateNew()`, `clear()`, `contentToTSVString()`).
    
    ### Are these changes tested?
    
    These changes are covered by existing unit and integration tests.
    
    ### Are there any user-facing changes?
    
    No
    
    * GitHub Issue: #41573
    
    Authored-by: David Schlosnagle <[email protected]>
    Signed-off-by: David Li <[email protected]>
---
 .../src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java    | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java b/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java
index 8768a90c80..9a92ce5060 100644
--- a/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java
+++ b/java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java
@@ -19,6 +19,7 @@ package org.apache.arrow.vector;
 
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.LinkedHashMap;
 import java.util.List;
 import java.util.Map;
@@ -121,7 +122,7 @@ public class VectorSchemaRoot implements AutoCloseable {
    * Creates a new set of empty vectors corresponding to the given schema.
    */
   public static VectorSchemaRoot create(Schema schema, BufferAllocator allocator) {
-    List<FieldVector> fieldVectors = new ArrayList<>();
+    List<FieldVector> fieldVectors = new ArrayList<>(schema.getFields().size());
     for (Field field : schema.getFields()) {
       FieldVector vector = field.createVector(allocator);
       fieldVectors.add(vector);
@@ -160,7 +161,7 @@ public class VectorSchemaRoot implements AutoCloseable {
   }
 
   public List<FieldVector> getFieldVectors() {
-    return fieldVectors.stream().collect(Collectors.toList());
+    return Collections.unmodifiableList(fieldVectors);
   }
 
   /**
@@ -236,7 +237,7 @@ public class VectorSchemaRoot implements AutoCloseable {
    */
   public void setRowCount(int rowCount) {
     this.rowCount = rowCount;
-    for (FieldVector v : getFieldVectors()) {
+    for (FieldVector v : fieldVectors) {
       v.setValueCount(rowCount);
     }
   }
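
The three changes in the diff above can be illustrated without an Arrow dependency. The following is only a minimal sketch: `FieldListSketch` and its methods are hypothetical names invented for illustration, and plain strings stand in for `FieldVector` instances. It contrasts the copy returned by `stream().collect(Collectors.toList())` with the view returned by `Collections.unmodifiableList(List)`, pre-sizes the backing `ArrayList`, and iterates over the field directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for VectorSchemaRoot; strings replace FieldVector.
final class FieldListSketch {
  private final List<String> fields;

  private FieldListSketch(List<String> fields) {
    this.fields = fields;
  }

  // The element count is known up front, so the backing array is sized once.
  static FieldListSketch create(List<String> names) {
    List<String> fields = new ArrayList<>(names.size());
    for (String name : names) {
      fields.add(name);
    }
    return new FieldListSketch(fields);
  }

  // Previous approach: every call builds a new list through a stream.
  List<String> copyOfFields() {
    return fields.stream().collect(Collectors.toList());
  }

  // Approach in this commit: a read-only wrapper over the same backing list.
  List<String> viewOfFields() {
    return Collections.unmodifiableList(fields);
  }

  // Internal loops read the field directly rather than going through a getter.
  int totalNameLength() {
    int total = 0;
    for (String field : fields) {
      total += field.length();
    }
    return total;
  }

  public static void main(String[] args) {
    FieldListSketch root = create(Arrays.asList("id", "name"));
    System.out.println("same contents: " + root.viewOfFields().equals(root.copyOfFields()));
    System.out.println("new copy per call: " + (root.copyOfFields() != root.copyOfFields()));
    try {
      root.viewOfFields().add("extra");
    } catch (UnsupportedOperationException e) {
      System.out.println("view rejects add()");
    }
  }
}
```

In this sketch the view only wraps the existing list, so no elements are copied. One consequence of `Collections.unmodifiableList` is that mutating calls such as `add` on the returned list throw `UnsupportedOperationException`.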
