[I] Column performance: run-proportional read/write cost [arrow-rs]

via GitHub Wed, 15 Apr 2026 09:14:02 -0700


HippoBaro opened a new issue, #9731:
URL: https://github.com/apache/arrow-rs/issues/9731


   **Describe the bug**
   
   Several Parquet read and write paths allocate memory and perform computation 
proportional to the total row count, even when most values are repeated or 
null.  For a column that is 99% null, this means the current implementation 
sometimes does ~100x more work than necessary. This issue tracks the general 
problem; individual PRs will reference it for context.
   
   **To Reproduce**
   N/A
   
   **Expected behavior**
   For Parquet files that store definition and repetition levels in RLE 
encoding, the read and write cost should be proportional to the number of 
runs rather than the number of rows. 
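
As a rough illustration of the difference, here is a minimal Rust sketch that counts the non-null values of a flat (non-nested) column from its definition levels, once by expanding every run to one level per row and once by working on the runs directly. `LevelRun` and both functions are hypothetical names made up for this example; they are not part of the arrow-rs or parquet APIs and do not reflect how the crate decodes levels internally.

```rust
/// Hypothetical run of identical definition levels (not an arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct LevelRun {
    level: i16,
    len: usize,
}

/// Row-proportional: materializes one level per row, then scans all of them.
/// Both the allocation and the scan grow with the total row count.
fn count_non_null_per_row(runs: &[LevelRun], max_def_level: i16) -> usize {
    let mut levels = Vec::new();
    for run in runs {
        levels.extend(std::iter::repeat(run.level).take(run.len));
    }
    levels.iter().filter(|&&l| l == max_def_level).count()
}

/// Run-proportional: inspects each run once and adds its length when the
/// run's level marks a present value. No per-row buffer is allocated.
fn count_non_null_per_run(runs: &[LevelRun], max_def_level: i16) -> usize {
    runs.iter()
        .filter(|run| run.level == max_def_level)
        .map(|run| run.len)
        .sum()
}

fn main() {
    // A 99% null column of one million rows, described by only two runs.
    let runs = [
        LevelRun { level: 0, len: 990_000 },
        LevelRun { level: 1, len: 10_000 },
    ];
    let per_row = count_non_null_per_row(&runs, 1);
    let per_run = count_non_null_per_run(&runs, 1);
    assert_eq!(per_row, per_run);
    println!("non-null values: {per_run} (from {} runs)", runs.len());
}
```

Both functions return the same count, but the first touches one million levels while the second touches two runs. The sketch assumes a flat column, where a value is present exactly when its definition level equals the maximum definition level.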
   
   **Additional context**


