alamb opened a new issue, #8698:
URL: https://github.com/apache/arrow-datafusion/issues/8698

   ### Is your feature request related to a problem or challenge?
   
   A report on Twitter (https://twitter.com/mim_djo/status/1740542585410814393) says:
   
   > a new release of #datafusion 34, still reading #Deltatable via arrow is 
suboptimal compared to reading Parquet Directly :( something to do with passing 
stats to get correct join orders.
   
   
![image](https://github.com/apache/arrow-datafusion/assets/490673/c75637da-6408-461a-be27-513a13443c3f)
   
   
   I think the issue is that 
https://github.com/apache/arrow-datafusion/issues/7949 and 
https://github.com/apache/arrow-datafusion/issues/7950 rely on statistics to 
avoid bad join orders for TPC-H queries. 
   
   These statistics do not appear to be available from the Delta provider. 
   
   @andygrove says:
   
   > RelCommon (common to all operators in Substrait) can contain a hint that 
has stats
   
   ```
   message Stats {
     double row_count = 1;
     double record_size = 2;
     substrait.extensions.AdvancedExtension advanced_extension = 10;
   }
   ```
   
   
   ### Describe the solution you'd like
   
   I would like the DataFusion Substrait consumer/producer to handle 
translating statistics to and from the Substrait `Stats` hint.
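
As a rough sketch of what translating between the Substrait `Stats` hint and table-level statistics might involve, the Rust below uses hypothetical stand-in types only: `SubstraitStatsHint` mirrors the two scalar fields of the message quoted above, and `TableStats` is a simplified placeholder, not DataFusion's real statistics type. It assumes `record_size` is an average record size in bytes, and that zero, negative, or non-finite values mean "unknown", since an unset proto3 `double` reads back as 0.

```rust
/// Hypothetical mirror of the two scalar fields of the Substrait `Stats` hint.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SubstraitStatsHint {
    pub row_count: f64,
    /// Assumed here to be the average size of one record, in bytes.
    pub record_size: f64,
}

/// Hypothetical, simplified table-level statistics.
/// This is a stand-in for illustration, not DataFusion's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct TableStats {
    pub num_rows: Option<u64>,
    pub total_byte_size: Option<u64>,
}

/// Treat zero, negative, NaN, and infinite values as "unknown".
/// (Assumption: an unset proto3 double is indistinguishable from 0.)
fn positive_finite(v: f64) -> Option<f64> {
    if v.is_finite() && v > 0.0 {
        Some(v)
    } else {
        None
    }
}

/// Consumer direction: Substrait hint -> table statistics.
pub fn stats_from_hint(hint: &SubstraitStatsHint) -> TableStats {
    let rows = positive_finite(hint.row_count);
    let size = positive_finite(hint.record_size);
    TableStats {
        num_rows: rows.map(|r| r.round() as u64),
        // The total size is only known when both inputs are known.
        total_byte_size: match (rows, size) {
            (Some(r), Some(s)) => Some((r * s).round() as u64),
            _ => None,
        },
    }
}

/// Producer direction: table statistics -> Substrait hint.
/// Returns `None` when no row count is known, so no hint is emitted.
pub fn hint_from_stats(stats: &TableStats) -> Option<SubstraitStatsHint> {
    let rows = stats.num_rows?;
    if rows == 0 {
        return None;
    }
    let record_size = stats
        .total_byte_size
        .map(|bytes| bytes as f64 / rows as f64)
        .unwrap_or(0.0); // 0 stands for "unknown" under the assumption above
    Some(SubstraitStatsHint {
        row_count: rows as f64,
        record_size,
    })
}
```

Under these assumptions a hint with `row_count = 1000` and `record_size = 64` would map to 1000 rows and 64000 bytes, and an all-zero hint would map to fully unknown statistics. Whether exact and estimated values should be distinguished is left open here.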
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   This was brought up by @Dandandan on the ASF Slack: 
https://the-asf.slack.com/archives/C04RJ0C85UZ/p1703885214702039


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
