alamb commented on code in PR #12466:
URL: https://github.com/apache/datafusion/pull/12466#discussion_r1761626074
##########
datafusion/sql/src/statement.rs:
##########
@@ -1028,8 +1030,26 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             .into_iter()
             .collect();
 
-        let schema = self.build_schema(columns)?;
-        let df_schema = schema.to_dfschema_ref()?;
+        let df_schema = match file_type.as_str() {

Review Comment:
   I am sorry for the delayed feedback @devanbenz. I swear I typed this feedback, but I must not have clicked "submit".

   Basically, my concerns about this approach are twofold:
   1. This code assumes the parquet file is on the local filesystem (when for many systems it may be on remote object storage).
   2. It also adds a dependency on the parquet format to SQL parsing. Since `parquet` has quite a few dependencies, this new dependency is likely not ideal for systems that use DataFusion only for SQL parsing (like dask-sql, for example).

   Perhaps you could delay the creation of the ORDER BY until the table provider is resolved?

   The table provider: https://github.com/apache/datafusion/blob/2521043ddcb3895a2010b8e328f3fa10f77fc094/datafusion/expr/src/planner.rs#L35-L34

   Once the table provider is resolved, the table's schema is known.

   Another benefit of this approach is that it would work for all formats, not just parquet.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
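
Illustrative sketch of the suggestion in the review comment (delaying ORDER BY creation until the table provider is resolved). This is not DataFusion code: `RawSortExpr`, `ResolvedSortExpr`, `SchemaSource`, `StaticTable`, `parse_order_by`, and `resolve_order_by` are hypothetical names invented for this example, and the real planner types are considerably richer. The sketch only assumes that a resolved provider can report its column names. The parsing step keeps ORDER BY column names as plain strings and never touches a file format, and the resolution step validates them against whatever schema the provider reports.

```rust
use std::collections::HashMap;

/// ORDER BY item exactly as written in the SQL text.
/// Hypothetical type: nothing here has been checked against a schema yet.
#[derive(Debug, Clone, PartialEq)]
pub struct RawSortExpr {
    pub column: String,
    pub ascending: bool,
}

/// ORDER BY item after resolution: the column exists at position `index`.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedSortExpr {
    pub index: usize,
    pub ascending: bool,
}

/// Hypothetical, simplified stand-in for a resolved table provider:
/// anything that can report the column names of the table.
pub trait SchemaSource {
    fn column_names(&self) -> Vec<String>;
}

/// Toy provider with a fixed schema. A real one might read parquet
/// metadata from object storage, infer a CSV header, and so on.
pub struct StaticTable {
    pub columns: Vec<String>,
}

impl SchemaSource for StaticTable {
    fn column_names(&self) -> Vec<String> {
        self.columns.clone()
    }
}

/// Step 1 (SQL parsing): record the ORDER BY items without any
/// file-format dependency and without opening the file.
pub fn parse_order_by(items: &[(&str, bool)]) -> Vec<RawSortExpr> {
    items
        .iter()
        .map(|(column, ascending)| RawSortExpr {
            column: column.to_string(),
            ascending: *ascending,
        })
        .collect()
}

/// Step 2 (after the provider is resolved): check each column against
/// the provider's schema, whatever the underlying format is.
pub fn resolve_order_by(
    raw: &[RawSortExpr],
    source: &dyn SchemaSource,
) -> Result<Vec<ResolvedSortExpr>, String> {
    let names = source.column_names();
    // Map each column name to its position in the schema.
    let positions: HashMap<&str, usize> = names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.as_str(), i))
        .collect();

    raw.iter()
        .map(|expr| {
            positions
                .get(expr.column.as_str())
                .map(|&index| ResolvedSortExpr {
                    index,
                    ascending: expr.ascending,
                })
                .ok_or_else(|| {
                    format!(
                        "ORDER BY column '{}' not found in table schema",
                        expr.column
                    )
                })
        })
        .collect()
}
```

In this sketch, a caller would run `parse_order_by` while planning the statement and call `resolve_order_by` only once a provider is available. An unknown column then surfaces as an `Err` at resolution time rather than requiring the parser to read the file.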