hamilton-earthscope opened a new pull request, #622:
URL: https://github.com/apache/iceberg-go/pull/622

   # Partitioned Write Optimizations
   
   ## Summary
   
   This PR improves partitioned write throughput in the Iceberg table writer. 
Across five incremental changes, time per operation on the 2.5M-row benchmark 
drops from 2.35 s to 654.1 ms, and allocations per operation fall by 79.1%.
   
   ## Performance Results
   
   Note: these results were measured against the latest commit on `main` of 
arrow-go 
(https://github.com/apache/arrow-go/commit/3160eef9c227d94db67bfaf5225a2d6c1f48bc76).
   
   The following benchmarks were conducted on Apple M3 Max (darwin/arm64) for 
the new `BenchmarkPartitionedWriteThroughput` test with 2.5M rows per write 
operation:
   
   ### Incremental Improvements
   
   | Change | Time/op | Δ Time | records/sec | Δ records/sec | allocs/op | Δ allocs/op |
   |--------|---------|---------|-------------|---------------|-----------|-------------|
   | Base | 2.35 s | - | 1,065,115 | - | 60,076,290 | - |
   | Change 1 | 1.49 s | -36.5% | 1,677,802 | +57.5% | 35,076,376 | -41.6% |
   | Change 2 | 1.21 s | -18.7% | 2,064,629 | +23.1% | 25,076,562 | -28.5% |
   | Change 3 | 1.16 s | -4.5% | 2,161,545 | +4.7% | 22,576,588 | -10.0% |
   | Change 4 | 1.07 s | -7.4% | 2,334,700 | +8.0% | 20,076,480 | -11.1% |
   | Change 5 | 654.1 ms | -38.9% | 3,821,892 | +63.7% | 12,577,119 | -37.4% |
   
   ### Overall Improvement (Base → Change 5)
   
   **2.5M Records per write:**
   
   | Metric | Base | Change 5 | Improvement |
   |--------|------|----------|-------------|
   | **Time/op** | 2.35 s | 654.1 ms | **-72.1%** ⚡ |
   | **records/sec** | 1,065,115 | 3,821,892 | **+258.9%** 🚀 |
   | **allocs/op** | 60,076,290 | 12,577,119 | **-79.1%** 💪 |
   
   **100K Records per write:**
   
   | Metric | Base | Change 5 | Improvement |
   |--------|------|----------|-------------|
   | **Time/op** | 221.3 ms | 137.7 ms | **-37.8%** |
   | **records/sec** | 451,853 | 726,085 | **+60.7%** |
   | **allocs/op** | 2,472,792 | 573,012 | **-76.8%** |
   
   ## Impact
   
   These optimizations speed up partitioned writes to Iceberg tables, making the 
library more efficient for high-throughput data ingestion. The gains grow with 
workload size: time/op drops by 37.8% at 100K records per write and by 72.1% at 
2.5M records per write.
   
   ## Testing
   
   All optimizations were validated by benchmarking on darwin/arm64 (Apple M3 
Max). The improvements hold across record volumes of 100K, 500K, and 2.5M 
records per write.
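
   The benchmark code itself is not included in this message. As a rough sketch 
of how a Go benchmark can report `records/sec` alongside `allocs/op`, the 
example below uses the standard `testing` package. `writeRecords` and 
`benchWrite` are hypothetical stand-ins, not the PR's 
`BenchmarkPartitionedWriteThroughput` or the iceberg-go writer:

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// writeRecords is a hypothetical stand-in for a partitioned write. It only
// buckets row indices by a fake partition key so the sketch is runnable.
func writeRecords(n, numPartitions int) map[int]int {
	counts := make(map[int]int, numPartitions)
	for i := 0; i < n; i++ {
		counts[i%numPartitions]++
	}
	return counts
}

// benchWrite measures rowsPerOp rows per iteration and reports allocs/op
// plus a custom records/sec metric.
func benchWrite(b *testing.B, rowsPerOp int) {
	b.ReportAllocs()
	b.ResetTimer()
	start := time.Now()
	for i := 0; i < b.N; i++ {
		writeRecords(rowsPerOp, 16)
	}
	elapsed := time.Since(start).Seconds()
	b.ReportMetric(float64(rowsPerOp)*float64(b.N)/elapsed, "records/sec")
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(func(b *testing.B) { benchWrite(b, 100_000) })
	fmt.Printf("%d iterations, %.0f records/sec, %d allocs/op\n",
		res.N, res.Extra["records/sec"], res.AllocsPerOp())
}
```

   In a real test file the same body would live in a `BenchmarkXxx(b *testing.B)` 
function and be run with `go test -bench`, with one sub-benchmark per record 
volume.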

