HippoBaro opened a new pull request, #9724:
URL: https://github.com/apache/arrow-rs/pull/9724

   # Which issue does this PR close?
   
   - Depends on #9723
   - Contributes to #9722
   
   # Rationale for this change
   
   `WriterProperties::offset_index_disabled()` checked whether any column in 
the `column_properties` HashMap has page-level statistics enabled, scanning the 
entire map on every call. This method is called from `GenericColumnWriter::new`, 
once per column per row group. With N columns each having per-column 
properties, that is N scans of an N-entry map, so row group construction 
performed O(N²) HashMap entry visits.
   
   # What changes are included in this PR?
   
   Move the scan into `WriterPropertiesBuilder::build()` so it runs once at 
construction time.
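
To illustrate the idea, here is a minimal sketch of precomputing the flag in a builder. It is not the actual arrow-rs code: `PropsBuilder`, `Props`, and `StatsLevel` are hypothetical stand-ins, and the rule that page-level statistics on any column override the "disabled" setting is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-column statistics setting.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum StatsLevel {
    None,
    Chunk,
    Page,
}

/// Hypothetical, simplified builder (not the real `WriterPropertiesBuilder`).
#[derive(Default)]
pub struct PropsBuilder {
    offset_index_disabled: bool,
    column_stats: HashMap<String, StatsLevel>,
}

/// Hypothetical, simplified properties object.
pub struct Props {
    // Computed once in `build()`; never recomputed afterwards.
    offset_index_disabled: bool,
    #[allow(dead_code)]
    column_stats: HashMap<String, StatsLevel>,
}

impl PropsBuilder {
    pub fn set_offset_index_disabled(mut self, disabled: bool) -> Self {
        self.offset_index_disabled = disabled;
        self
    }

    pub fn set_column_stats(mut self, column: &str, level: StatsLevel) -> Self {
        self.column_stats.insert(column.to_string(), level);
        self
    }

    pub fn build(self) -> Props {
        // Single O(N) scan over the per-column map, done at construction time.
        let any_page_stats = self
            .column_stats
            .values()
            .any(|level| *level == StatsLevel::Page);
        Props {
            // Assumption for this sketch: page-level statistics on any
            // column force the offset index to stay enabled.
            offset_index_disabled: self.offset_index_disabled && !any_page_stats,
            column_stats: self.column_stats,
        }
    }
}

impl Props {
    /// O(1) lookup; safe to call once per column per row group.
    pub fn offset_index_disabled(&self) -> bool {
        self.offset_index_disabled
    }
}
```

With this shape, N column writers each calling `offset_index_disabled()` cost O(N) in total instead of O(N²), because the map is walked only once in `build()`.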
   
   Benchmark results (vs baseline):
   
   ```
     writer_overhead/1000_cols/per_column_props     2.44 ms  (was   3.25 ms, −25%)
     writer_overhead/5000_cols/per_column_props    13.28 ms  (was  47.45 ms, −72%)
     writer_overhead/10000_cols/per_column_props   27.97 ms  (was 197.97 ms, −86%)
   ```
   
   Scaling is now roughly linear in the number of columns.
   
   # Are these changes tested?
   
   All tests pass.
   
   # Are there any user-facing changes?
   
   None.
   

