gene-bordegaray commented on PR #19124:
URL: https://github.com/apache/datafusion/pull/19124#issuecomment-3627758173

   > If adding per-scan configuration is too much for this PR, it’s probably 
enough to call it out explicitly as follow-up work, but I think this skew 
aspect is important to keep in mind for the feature’s overall design.
   
   This is a great callout, and I think we could see some improvements for file 
I/O while using this behavior. I did some poking around, and it looks like one 
approach may be to expose a new listing option, 
`ListingOptions::with_preserve_file_partitions(threshold)`, that would force 
the grouping logic on that table alone. That way, you could run two tables in 
the same session with the optimization on for one and off for the other.
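   
   A rough sketch of how that per-table override might resolve against a session-wide setting. Everything here is an assumption for illustration: the struct is a stand-in and not the real `ListingOptions`, the builder method does not exist yet, and the threshold semantics (0 disables, otherwise group once the table has at least that many distinct partition values) are only my guess at how it could behave.

```rust
/// Hypothetical stand-in for DataFusion's `ListingOptions`; only the field
/// relevant to this sketch is modelled.
#[derive(Debug, Clone, Default)]
struct ListingOptions {
    /// `None` defers to the session-wide setting; `Some(n)` overrides it
    /// for this table only.
    preserve_file_partitions: Option<usize>,
}

impl ListingOptions {
    /// Proposed builder method (assumption: it does not exist today).
    fn with_preserve_file_partitions(mut self, threshold: usize) -> Self {
        self.preserve_file_partitions = Some(threshold);
        self
    }

    /// Table-level override wins; otherwise fall back to the session default.
    fn effective_threshold(&self, session_default: usize) -> usize {
        self.preserve_file_partitions.unwrap_or(session_default)
    }
}

/// Assumed semantics: 0 disables grouping; otherwise group files by partition
/// when the table has at least `threshold` distinct partition values.
fn groups_by_partition(threshold: usize, distinct_partitions: usize) -> bool {
    threshold > 0 && distinct_partitions >= threshold
}
```

   With a session default of 0 (off), a table built with `with_preserve_file_partitions(1)` would opt in on its own, while a table using plain defaults in the same session would stay off.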
   
   I think this would be great follow-up work, as I am trying to keep the 
scope of this PR tight while still delivering good benefits 😄.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

