alamb opened a new issue, #9059:
URL: https://github.com/apache/arrow-rs/issues/9059

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   I am profiling ClickBench query 10 with predicate pushdown enabled as part of
   - https://github.com/apache/datafusion/issues/3463
   
   ```shell
   samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli -f q.sql > /dev/null 2>&1
   ```
   
   ```sql
   SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
   ```
   
   While looking at the profile, I noticed that 7% of the time is spent allocating and regrowing vectors (that is, reallocating and copying).
   
   <img width="1723" height="662" alt="Image" src="https://github.com/user-attachments/assets/efe96f7a-d3aa-426f-a700-be26585f5c6b" />
   
   **Describe the solution you'd like**
   Avoid the time spent regrowing these vectors.
   
   It appears that the vectors in question are part of the `ViewBuffer` struct:
   
   
https://github.com/apache/arrow-rs/blob/02fa779a9cb122c5218293be3afb980832701683/parquet/src/arrow/buffer/view_buffer.rs#L30-L33
   
   **Describe alternatives you've considered**
   Since we know how many views will be in each output buffer, we could create the `ViewBuffer` with the correct capacity up front.
   
   Something like
   
   ```rust
   ViewBuffer::with_capacity
   ```
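
As a rough illustration of the idea, here is a minimal, self-contained sketch. The struct below is a simplified stand-in, not the actual parquet reader type: the field names, the use of `Vec<Vec<u8>>` for the data blocks, and the `with_capacity` constructor itself are assumptions made for this example.

```rust
/// Simplified stand-in for the parquet reader's `ViewBuffer`.
/// Field names and types are assumptions for illustration only.
#[derive(Debug, Default)]
struct ViewBuffer {
    /// One 128-bit view per string / binary value.
    views: Vec<u128>,
    /// Backing data blocks (a stand-in for arrow `Buffer`s).
    buffers: Vec<Vec<u8>>,
}

impl ViewBuffer {
    /// Hypothetical constructor: reserve room for `num_views` views up front
    /// so that pushing that many views never triggers a reallocation.
    fn with_capacity(num_views: usize) -> Self {
        Self {
            views: Vec::with_capacity(num_views),
            buffers: Vec::new(),
        }
    }
}

fn main() {
    // Assumed batch size, chosen only for the example.
    let batch_size = 8192;

    let mut buffer = ViewBuffer::with_capacity(batch_size);
    let ptr_before = buffer.views.as_ptr();

    for i in 0..batch_size {
        buffer.views.push(i as u128);
    }

    // The allocation did not move, so no regrow-and-copy happened.
    assert_eq!(ptr_before, buffer.views.as_ptr());
    println!(
        "pushed {} views into {} data blocks without reallocating",
        buffer.views.len(),
        buffer.buffers.len()
    );
}
```

The sketch only pre-sizes `views`, since that is the vector whose final length is known in advance (one view per value); how or whether to pre-size the data blocks is left open.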
   
   

