alamb opened a new issue, #9059: URL: https://github.com/apache/arrow-rs/issues/9059
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I am profiling ClickBench query 10 with predicate pushdown enabled as part of
- https://github.com/apache/datafusion/issues/3463

```shell
samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli -f q.sql > /dev/null 2>&1
```

```sql
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u
FROM hits
WHERE "MobilePhoneModel" <> ''
GROUP BY "MobilePhoneModel"
ORDER BY u DESC
LIMIT 10;
```

While looking at the profile, I noticed that 7% of the time is spent allocating and regrowing vectors (that is, reallocating and copying):

<img width="1723" height="662" alt="Image" src="https://github.com/user-attachments/assets/efe96f7a-d3aa-426f-a700-be26585f5c6b" />

**Describe the solution you'd like**

Avoid the time spent regrowing these vectors.

It appears that the vectors in question are part of the `ViewBuffer` struct: https://github.com/apache/arrow-rs/blob/02fa779a9cb122c5218293be3afb980832701683/parquet/src/arrow/buffer/view_buffer.rs#L30-L33

**Describe alternatives you've considered**

Since we know how many views will be in each output buffer, we could create the `ViewBuffer` with the correct capacity initially.

Something like

```rust
ViewBuffer::with_capacity
```

**Additional context**
<!-- Add any other context or screenshots about the feature request here. -->

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
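To illustrate the pre-sizing idea proposed in the issue, here is a minimal, self-contained sketch. `ViewBufferSketch`, its fields, `with_capacity`, `push_view`, and `count_reallocations` are all hypothetical stand-ins written for this example and are not the actual arrow-rs definitions. The sketch only shows that a vector reserved up front with the standard `Vec::with_capacity` does not regrow while the expected number of views is appended.

```rust
/// Simplified stand-in for the parquet reader's view buffer.
/// The field names and types are assumptions made for illustration only.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct ViewBufferSketch {
    views: Vec<u128>,
    buffers: Vec<Vec<u8>>,
}

impl ViewBufferSketch {
    /// Hypothetical constructor: reserve room for `num_views` views up front.
    fn with_capacity(num_views: usize) -> Self {
        Self {
            views: Vec::with_capacity(num_views),
            buffers: Vec::new(),
        }
    }

    /// Append a single view (hypothetical helper).
    fn push_view(&mut self, view: u128) {
        self.views.push(view);
    }
}

/// Push `n` views and count how often the capacity of `views` changes,
/// which serves as a proxy for reallocate-and-copy events.
fn count_reallocations(mut buf: ViewBufferSketch, n: usize) -> usize {
    let mut reallocations = 0;
    let mut last_capacity = buf.views.capacity();
    for i in 0..n {
        buf.push_view(i as u128);
        if buf.views.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = buf.views.capacity();
        }
    }
    reallocations
}

fn main() {
    let n = 8192;
    let grown = count_reallocations(ViewBufferSketch::default(), n);
    let presized = count_reallocations(ViewBufferSketch::with_capacity(n), n);
    println!("default: {grown} capacity changes, with_capacity: {presized}");
}
```

Whether this removes the 7% seen in the profile would depend on how the real reader constructs and fills its buffers, so the sketch should be read as a demonstration of the technique rather than a measurement.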
