adriangb commented on PR #20047:
URL: https://github.com/apache/datafusion/pull/20047#issuecomment-4070806701

Hiya, welcome back!

> Quick question on ordering and caching. Ordering is table-specific, right? Statistics are file-specific and can be cached session-wide. So I’d assume ordering caching should be table-scoped, not session-wide?

Sorry I missed this.

The ordering of a file is static: it is physically encoded in the file. The ordering of a table, however, is not. A table is a collection of files, and we can (and plan to) re-sort the file groups or change the scan order to produce different orderings. There's some nuance around exact vs. inexact orderings (only the former is a real ordering).

In summary, I think the physical ordering of the files is fine to cache session-wide, like you say. The ordering of a table maybe is not, since we can dynamically adjust the scan order to best suit each particular query (even if the physical output order of the data is not exactly sorted).
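
To make the distinction concrete, here is a rough sketch in Python rather than Rust, just to keep it short. Every name in it (`FileMeta`, `SessionFileOrderingCache`, `plan_scan_order`) is hypothetical and not part of DataFusion's API, and the min/max overlap check is only one assumed way to tell an exact ordering from an inexact one. The per-file ordering goes into a cache keyed by path, while the table-level scan order is recomputed for each query.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical types for illustration only; none of these names exist in DataFusion.
@dataclass(frozen=True)
class FileMeta:
    path: str
    sort_key: Optional[str]  # column the file is physically sorted on, if any
    min_value: int           # min of that column in the file
    max_value: int           # max of that column in the file


class SessionFileOrderingCache:
    """Session-wide cache: a file's physical ordering is static, so key it by path."""

    def __init__(self) -> None:
        self._orderings: Dict[str, Optional[str]] = {}

    def get_or_insert(self, file: FileMeta) -> Optional[str]:
        # Store the ordering on first sight, then reuse it for every later query.
        return self._orderings.setdefault(file.path, file.sort_key)

    def __len__(self) -> int:
        return len(self._orderings)


def plan_scan_order(
    files: List[FileMeta], wanted_key: str, cache: SessionFileOrderingCache
) -> Tuple[List[FileMeta], bool]:
    """Per-query decision: pick a scan order for the table's files.

    Returns (files in scan order, whether the resulting ordering is exact).
    """
    if not all(cache.get_or_insert(f) == wanted_key for f in files):
        # Some file is not sorted on the requested key: keep the given order.
        return list(files), False
    ordered = sorted(files, key=lambda f: f.min_value)
    # Assumption: the ordering is exact only if consecutive files do not overlap.
    exact = all(a.max_value <= b.min_value for a, b in zip(ordered, ordered[1:]))
    return ordered, exact


cache = SessionFileOrderingCache()
table_files = [
    FileMeta("b.parquet", "ts", 100, 199),
    FileMeta("a.parquet", "ts", 0, 99),
]
by_ts, ts_exact = plan_scan_order(table_files, "ts", cache)
by_id, id_exact = plan_scan_order(table_files, "id", cache)
```

The same two files yield a different scan order and a different exactness flag depending on the query, while the cache holds one static entry per file throughout. That is why the per-file cache can live at session scope and the table-level result cannot.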
