HippoBaro opened a new issue, #9731: URL: https://github.com/apache/arrow-rs/issues/9731
**Describe the bug**

Several Parquet read and write paths allocate memory and perform computation proportional to the total row count, even when most values are repeated or null. For a column that is 99% null, this means the current implementation sometimes does ~100x more work than necessary.

This issue tracks the general problem; individual PRs will reference it for context.

**To Reproduce**

N/A

**Expected behavior**

For Parquet files that store definition and repetition levels in RLE encoding, the cost should be proportional to the number of runs rather than the number of rows.
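To illustrate the difference between row-proportional and run-proportional cost, here is a minimal sketch that counts nulls from RLE-style definition-level runs in two ways. The `LevelRun` type and both function names are hypothetical and are not part of the `parquet` crate; the sketch only assumes that a leaf value is null when its definition level is below the column's maximum definition level.

```rust
/// One run of identical definition levels, as an RLE decoder might yield it.
/// Hypothetical type for illustration only; not an arrow-rs / parquet API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LevelRun {
    pub level: i16,
    pub len: usize,
}

/// Row-proportional approach: expand every run into one level per row,
/// then scan the expanded buffer. Memory and time grow with the row count.
pub fn count_nulls_per_row(runs: &[LevelRun], max_def_level: i16) -> usize {
    let expanded: Vec<i16> = runs
        .iter()
        .flat_map(|r| std::iter::repeat(r.level).take(r.len))
        .collect();
    expanded.iter().filter(|&&l| l < max_def_level).count()
}

/// Run-proportional approach: one step per run and no per-row buffer.
/// Time grows with the number of runs, regardless of how long each run is.
pub fn count_nulls_per_run(runs: &[LevelRun], max_def_level: i16) -> usize {
    runs.iter()
        .filter(|r| r.level < max_def_level)
        .map(|r| r.len)
        .sum()
}
```

For a flat optional column (maximum definition level 1) with 990 nulls followed by 10 values, both functions return the same count, but the first allocates and scans 1,000 levels while the second inspects only two runs.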
