[PR] Perf/builder prealloc [arrow-go]

via GitHub Tue, 10 Mar 2026 08:48:59 -0700


zeroshade opened a new pull request, #699:
URL: https://github.com/apache/arrow-go/pull/699


   ### Rationale for this change
   
   Array builders (`BinaryBuilder`, `StringBuilder`) don't pre-calculate 
required buffer capacity for variable-length bulk append operations, resulting 
in multiple reallocations during `AppendValues()` calls.
   
   Currently, the binary-type builders reserve capacity for the offsets 
buffer but not for the total data size (the values buffer). As a result, 
appending individual values can trigger frequent reallocations. For example, 
appending 1000 strings of ~100 bytes each causes ~17 reallocations.
                                  
   **Performance impact:**
   - Each reallocation requires allocating a new buffer (2x size), copying 
existing data, and releasing the old buffer to the GC
   - Significant overhead in data ingestion pipelines processing large batches
   - Unnecessary GC pressure from intermediate buffers
   
   ### What changes are included in this PR?
   
   **Enhanced `BinaryBuilder.AppendValues()` and `AppendStringValues()`**
   - Added pre-calculation loop to compute total data size before appending
   - Calls `ReserveData(totalDataSize)` to allocate exact required capacity
   - Eliminates the multiple power-of-2 buffer growth cycles
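
The pre-calculation pattern described above can be sketched with a toy model. This is a minimal illustration, not arrow-go's implementation: `toyBuilder`, its methods, `nextPowerOf2`, and `makeValues` are hypothetical names, and the power-of-two growth policy with an empty initial buffer is an assumption. The reallocation count it reports depends on those assumptions, so it need not match the figure quoted in the rationale.

```go
package main

import (
	"fmt"
	"strings"
)

// toyBuilder is a hypothetical stand-in for a variable-length builder.
// It is NOT arrow-go's BinaryBuilder; it only models the values buffer.
type toyBuilder struct {
	offsets  []int32 // end offset of each appended value
	data     []byte  // concatenated value bytes
	reallocs int     // number of times the data buffer was reallocated
}

// grow reallocates the data buffer with exactly newCap bytes of capacity.
func (b *toyBuilder) grow(newCap int) {
	grown := make([]byte, len(b.data), newCap)
	copy(grown, b.data)
	b.data = grown
	b.reallocs++
}

// reserveData ensures n more bytes fit, reallocating at most once.
func (b *toyBuilder) reserveData(n int) {
	if need := len(b.data) + n; need > cap(b.data) {
		b.grow(need)
	}
}

// appendOne copies a single value, growing to the next power of two
// when the buffer is too small (an assumed growth policy).
func (b *toyBuilder) appendOne(v string) {
	if need := len(b.data) + len(v); need > cap(b.data) {
		b.grow(nextPowerOf2(need))
	}
	b.data = append(b.data, v...)
	b.offsets = append(b.offsets, int32(len(b.data)))
}

// appendValues appends all values. With prealloc set, it first sums the
// value lengths and reserves the total in a single step.
func (b *toyBuilder) appendValues(vals []string, prealloc bool) {
	if prealloc {
		total := 0
		for _, v := range vals {
			total += len(v)
		}
		b.reserveData(total)
	}
	for _, v := range vals {
		b.appendOne(v)
	}
}

// nextPowerOf2 returns the smallest power of two that is >= n.
func nextPowerOf2(n int) int {
	p := 1
	for p < n {
		p <<= 1
	}
	return p
}

// makeValues returns n strings of size bytes each.
func makeValues(n, size int) []string {
	vals := make([]string, n)
	for i := range vals {
		vals[i] = strings.Repeat("x", size)
	}
	return vals
}

func main() {
	vals := makeValues(1000, 100)

	dynamic := &toyBuilder{}
	dynamic.appendValues(vals, false)

	reserved := &toyBuilder{}
	reserved.appendValues(vals, true)

	fmt.Printf("without reservation: %d reallocations\n", dynamic.reallocs)
	fmt.Printf("with reservation:    %d reallocations\n", reserved.reallocs)
}
```

The extra pass over the input only reads string lengths, so it is cheap next to the buffer copies it avoids.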
   
   ### Are these changes tested?
   
   Yes, new tests and benchmarks are added in 
`arrow/array/builder_prealloc_test.go` and 
`arrow/array/builder_prealloc_bench_test.go`. The tests cover binary, string, 
and numeric builders. The benchmarks compare single vs. bulk appends, 
pre-reserved vs. dynamic growth, and variable-length data across various 
batch sizes.
   
   ### Are there any user-facing changes?
   
   Only the performance benefits. No code changes are necessary to pick up 
the benefits when using `AppendValues` or `AppendStringValues`.
   
   ### 1. String Builder - 100 Elements
   
   **Test:** Bulk append of 100 strings (~50 bytes each)
   
   #### BEFORE
   ```
   BenchmarkStringBuilder_BulkAppend_100-16
       1000000      3036 ns/op     20552 B/op      21 allocs/op
       1000000      3007 ns/op     20552 B/op      21 allocs/op
       1000000      3011 ns/op     20552 B/op      21 allocs/op
       1000000      3026 ns/op     20552 B/op      21 allocs/op
       1000000      3003 ns/op     20552 B/op      21 allocs/op
   
   Average: 3,011 ns/op | 20,552 B/op | 21 allocs/op
   ```
   
   #### AFTER
   ```
   BenchmarkStringBuilder_BulkAppend_100-16
       2173887      1647 ns/op      6408 B/op      14 allocs/op
       2192780      1655 ns/op      6408 B/op      14 allocs/op
       2172652      1664 ns/op      6408 B/op      14 allocs/op
       2197866      1669 ns/op      6408 B/op      14 allocs/op
       2159024      1649 ns/op      6408 B/op      14 allocs/op
   
   Average: 1,657 ns/op | 6,408 B/op | 14 allocs/op
   ```
   
   ### 2. String Builder - 1000 Elements
   
   **Test:** Bulk append of 1,000 strings (~50 bytes each)
   
   #### BEFORE
   ```
   BenchmarkStringBuilder_BulkAppend_1000-16
        193304     19246 ns/op    157961 B/op      24 allocs/op
        193057     19146 ns/op    157961 B/op      24 allocs/op
        183902     19309 ns/op    157961 B/op      24 allocs/op
        184813     19211 ns/op    157961 B/op      24 allocs/op
        189385     19731 ns/op    157961 B/op      24 allocs/op
   
   Average: 19,327 ns/op | 157,961 B/op | 24 allocs/op
   ```
   
   #### AFTER
   ```
   BenchmarkStringBuilder_BulkAppend_1000-16
        281011     11790 ns/op     54984 B/op      14 allocs/op
        316790     11923 ns/op     54984 B/op      14 allocs/op
        303372     11863 ns/op     54984 B/op      14 allocs/op
        289375     11762 ns/op     54984 B/op      14 allocs/op
        308175     11853 ns/op     54984 B/op      14 allocs/op
   
   Average: 11,838 ns/op | 54,984 B/op | 14 allocs/op
   ```
   
   **Benchmark results demonstrate significant improvements:**
   - **33% fewer allocations** for 100-element batches (21 → 14 allocs/op) and **42% fewer** for 1,000-element batches (24 → 14 allocs/op)
   - **45% faster** for 100-element batches (3,011 ns → 1,657 ns)
   - **39% faster** for 1,000-element batches (19,327 ns → 11,838 ns)
   - **69% memory reduction** for 100 elements (20.5 KB → 6.4 KB) and **65%** for 1,000 elements (158 KB → 55 KB)
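
The percentages can be rechecked from the averages reported in the benchmark output. The sketch below is only a convenience for that arithmetic; `pctReduction` and `roundedPct` are hypothetical helper names, and the inputs are copied from the "Average" lines in the BEFORE and AFTER blocks.

```go
package main

import (
	"fmt"
	"math"
)

// pctReduction returns how much smaller after is than before, in percent.
func pctReduction(before, after float64) float64 {
	return (before - after) / before * 100
}

// roundedPct rounds pctReduction to the nearest whole percent.
func roundedPct(before, after float64) int {
	return int(math.Round(pctReduction(before, after)))
}

func main() {
	// Averages copied from the benchmark output in this description.
	rows := []struct {
		name          string
		before, after float64
	}{
		{"100 elements, ns/op", 3011, 1657},
		{"100 elements, B/op", 20552, 6408},
		{"100 elements, allocs/op", 21, 14},
		{"1000 elements, ns/op", 19327, 11838},
		{"1000 elements, B/op", 157961, 54984},
		{"1000 elements, allocs/op", 24, 14},
	}
	for _, r := range rows {
		fmt.Printf("%-26s %d%% lower\n", r.name, roundedPct(r.before, r.after))
	}
}
```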
   
   


