alamb opened a new issue, #9060:
URL: https://github.com/apache/arrow-rs/issues/9060

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   
   I am profiling ClickBench query 26 with predicate pushdown enabled as part of
   - https://github.com/apache/datafusion/issues/3463
   
   ```shell
   samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli -f q.sql > /dev/null 2>&1
   ```
   
   ```sql
   SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY "EventTime", "SearchPhrase" LIMIT 10;
   ```
   
   While looking at the profile, I noticed that 3% of the time is spent concatenating in the cached array reader:
   
   <img width="1379" height="659" alt="Image" src="https://github.com/user-attachments/assets/d80fa075-55c6-42f3-8c60-41bcbc685d43" />
   
   I believe the call is here:
   
https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/parquet/src/arrow/array_reader/cached_array_reader.rs#L333
   
   **Describe the solution you'd like**
   I would like to make this concatenation faster.
   
   **Describe alternatives you've considered**
   I think we can use the [`BatchCoalescer`](https://docs.rs/arrow/latest/arrow/compute/struct.BatchCoalescer.html) for this task and potentially save at least one copy.
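
To illustrate where a copy could be saved, here is a minimal std-only sketch. It is a toy model, not the arrow-rs code: `ChunkCoalescer`, `concat_then_split`, and `coalesce` are hypothetical names, plain `Vec<i32>` stands in for Arrow arrays, and the baseline (concatenate everything, then copy out target-sized pieces) is an assumption about the general pattern, not a description of what `cached_array_reader.rs` does at the linked line.

```rust
/// Hypothetical stand-in for a coalescer (NOT the arrow-rs `BatchCoalescer` API).
/// Incoming chunks are appended directly into an in-progress output buffer,
/// and a batch is emitted each time the buffer reaches `target_len`.
struct ChunkCoalescer {
    target_len: usize,
    in_progress: Vec<i32>,
    completed: Vec<Vec<i32>>,
    copied: usize, // total number of elements copied so far
}

impl ChunkCoalescer {
    fn new(target_len: usize) -> Self {
        assert!(target_len > 0, "target_len must be non-zero");
        Self {
            target_len,
            in_progress: Vec::with_capacity(target_len),
            completed: Vec::new(),
            copied: 0,
        }
    }

    /// Copy `chunk` into the output, emitting full batches as they fill up.
    fn push(&mut self, mut chunk: &[i32]) {
        while !chunk.is_empty() {
            let room = self.target_len - self.in_progress.len();
            let take = room.min(chunk.len());
            self.in_progress.extend_from_slice(&chunk[..take]);
            self.copied += take;
            chunk = &chunk[take..];
            if self.in_progress.len() == self.target_len {
                let full = std::mem::replace(
                    &mut self.in_progress,
                    Vec::with_capacity(self.target_len),
                );
                self.completed.push(full);
            }
        }
    }

    /// Emit whatever is left as a final, possibly short, batch.
    fn finish(&mut self) {
        if !self.in_progress.is_empty() {
            let rest = std::mem::take(&mut self.in_progress);
            self.completed.push(rest);
        }
    }
}

/// Assumed baseline: concatenate all chunks into one buffer (first copy),
/// then copy out `target_len`-sized pieces (second copy).
fn concat_then_split(chunks: &[Vec<i32>], target_len: usize) -> (Vec<Vec<i32>>, usize) {
    let mut copied = 0;
    let mut all = Vec::new();
    for chunk in chunks {
        all.extend_from_slice(chunk);
        copied += chunk.len();
    }
    let mut out = Vec::new();
    for piece in all.chunks(target_len) {
        out.push(piece.to_vec());
        copied += piece.len();
    }
    (out, copied)
}

/// Coalescing path: each element is copied once, straight into its output batch.
fn coalesce(chunks: &[Vec<i32>], target_len: usize) -> (Vec<Vec<i32>>, usize) {
    let mut coalescer = ChunkCoalescer::new(target_len);
    for chunk in chunks {
        coalescer.push(chunk);
    }
    coalescer.finish();
    (coalescer.completed, coalescer.copied)
}
```

For chunks of lengths 3, 2, and 4 with `target_len = 4`, both functions return the same batches, but the baseline copies 18 elements and the coalescing path copies 9. Whether the real reader would see a similar saving depends on how the cached arrays are actually consumed, which this sketch does not model.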
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
