alamb opened a new issue, #13188:
URL: https://github.com/apache/datafusion/issues/13188

   ### Describe the bug
   
   While enabling `StringView` reading from Parquet in
   https://github.com/apache/datafusion/pull/13101, @Dandandan noticed a slight
   regression for TPC-H query 18 (q18):
   https://github.com/apache/datafusion/pull/13101#issuecomment-2437865910
   
   
   Here is the query:
   ```sql
   select
       c_name,
       c_custkey,
       o_orderkey,
       o_orderdate,
       o_totalprice,
       sum(l_quantity)
   from
       customer,
       orders,
       lineitem
   where
           o_orderkey in (
           select
               l_orderkey
           from
               lineitem
           group by
               l_orderkey having
                   sum(l_quantity) > 300
       )
     and c_custkey = o_custkey
     and o_orderkey = l_orderkey
   group by
       c_name,
       c_custkey,
       o_orderkey,
       o_orderdate,
       o_totalprice
   order by
       o_totalprice desc,
       o_orderdate;
   ```
   
   ### To Reproduce
   
   
   
   Make the data:
   ```shell
   # make the data and get to the correct location
   cd datafusion/benchmarks
   ./bench.sh data tpch
   cd data/tpch_sf1
   ```
   
   Run query:
   ```
   datafusion-cli -f ../../queries/q18.sql  | grep Elapsed
   Elapsed 0.088 seconds.
   ```
   
   With StringView enabled, the same query appears to be slightly slower.
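   
   A single run at roughly 0.09 seconds is noisy, so one way to compare the two settings is to repeat the query and look at the median. The following is only a sketch: `elapsed_median` is a hypothetical helper, and it assumes `datafusion-cli` picks up the `datafusion.execution.parquet.schema_force_view_types` option from the `DATAFUSION_EXECUTION_PARQUET_SCHEMA_FORCE_VIEW_TYPES` environment variable (adjust the name if your build exposes the option differently).
   
   ```shell
   # Hypothetical helper: read datafusion-cli output on stdin and print the
   # median of the "Elapsed N seconds." values (lower median for even counts).
   elapsed_median() {
       grep Elapsed | awk '{print $2}' | sort -n |
           awk '{v[NR] = $1} END {if (NR) print v[int((NR + 1) / 2)]}'
   }
   
   # Run q18 five times per setting.
   # Assumption: the option is read from the environment variable below.
   if command -v datafusion-cli >/dev/null 2>&1; then
       for view_types in false true; do
           printf 'schema_force_view_types=%s median: ' "$view_types"
           for i in 1 2 3 4 5; do
               DATAFUSION_EXECUTION_PARQUET_SCHEMA_FORCE_VIEW_TYPES=$view_types \
                   datafusion-cli -f ../../queries/q18.sql
           done | elapsed_median
       done
   fi
   ```
   
   Run it from `data/tpch_sf1`, like the command above. If the two medians are within run-to-run noise, more iterations or a larger scale factor may be needed to see the difference.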
   
   ### Expected behavior
   
   StringView should always be faster
   
   ### Additional context
   
   I took a brief look at the flamegraphs -- it seems like one difference could 
be `BatchCoalescer::push_batch`
   
   ![Screenshot 2024-10-30 at 2 13 38 
PM](https://github.com/user-attachments/assets/1ee2cbbc-de7c-429e-b613-c82767be0870)
   
   There is a special case for StringView here: 
   
https://github.com/apache/datafusion/blob/6034be42808b43e3f48f6e58ec38cc35fa253abb/datafusion/physical-plan/src/coalesce/mod.rs#L117-L116
   
   
   Here are the explain plans for the query before and after the change
   - [q18-before.txt](https://github.com/user-attachments/files/17577287/q18-before.txt)
   - [q18-after.txt](https://github.com/user-attachments/files/17577286/q18-after.txt)
   
   Here are the flamegraphs for the query before/after the change
   - ![q18-flamegraph-after](https://github.com/user-attachments/assets/2bdef92a-0c9e-40d9-acf0-6e9f7d6c9737)
   - ![q18-flamegraph-before](https://github.com/user-attachments/assets/04045895-5e56-4d27-b35a-9e36818165f6)
   
   
   
   

