alamb commented on code in PR #7972:
URL: https://github.com/apache/arrow-datafusion/pull/7972#discussion_r1377692256
##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -849,6 +849,32 @@ impl SessionContext {
let table_paths = table_paths.to_urls()?;
let session_config = self.copied_config();
let listing_options = options.to_listing_options(&session_config);
+
+ let option_extension = listing_options.file_extension.clone();
+
+ if table_paths.is_empty() {
+ return exec_err!("No table paths were provided");
+ }
+
+ let extension = table_paths[0]
Review Comment:
I think this code is somewhat unclear if you don't have the context of this
PR.
Also, why does it only look at a single file? What if the files were like
```
output1.parquet
output2.parquet.snappy
```
What would you think about extracting this logic into its own function like
```rust
/// Heuristically determines the format (e.g. parquet, csv) to use with
/// the `table_paths`
fn infer_types<'a, P: DataFilePaths>(
table_paths: P,
) -> Option<String> {
...
}
```
A unit test could then be added that demonstrates the behavior on examples
such as @comphead's in
https://github.com/apache/arrow-datafusion/pull/7972/files#r1376546426
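To make the suggestion concrete, here is a rough, standalone sketch of what such a helper could look like. It is only an illustration, not the PR's implementation: it takes plain `&str` paths instead of `DataFilePaths`, and the names `infer_format`, `format_of`, and `KNOWN_FORMATS` (including the list of formats) are hypothetical. It looks at every path rather than only the first, skips trailing suffixes it does not recognize (e.g. `.snappy`), and returns `None` when the paths disagree or no known format is found.

```rust
use std::path::Path;

/// Formats this sketch recognizes. The list is an assumption for illustration.
const KNOWN_FORMATS: [&str; 5] = ["parquet", "csv", "json", "avro", "arrow"];

/// Returns the first known format among the dot-separated suffixes of the
/// file name, so `output2.parquet.snappy` yields `parquet`.
fn format_of(path: &str) -> Option<&'static str> {
    let file_name = Path::new(path).file_name()?.to_str()?;
    file_name
        .split('.')
        .skip(1) // skip the file stem, inspect only the suffixes
        .find_map(|part| {
            KNOWN_FORMATS
                .iter()
                .copied()
                .find(|fmt| fmt.eq_ignore_ascii_case(part))
        })
}

/// Heuristically determines the format (e.g. parquet, csv) shared by all
/// `table_paths`. Returns `None` if there are no paths, if any path has no
/// recognizable format, or if the paths disagree.
fn infer_format(table_paths: &[&str]) -> Option<String> {
    let mut inferred: Option<&'static str> = None;
    for path in table_paths {
        let fmt = format_of(path)?;
        match inferred {
            None => inferred = Some(fmt),
            Some(prev) if prev != fmt => return None,
            Some(_) => {}
        }
    }
    inferred.map(str::to_string)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn mixed_compression_suffixes_agree_on_format() {
        let paths = ["output1.parquet", "output2.parquet.snappy"];
        assert_eq!(infer_format(&paths), Some("parquet".to_string()));
    }

    #[test]
    fn conflicting_formats_are_rejected() {
        assert_eq!(infer_format(&["a.csv", "b.parquet"]), None);
    }
}
```

Returning `None` on disagreement leaves it to the caller to decide whether to fall back to the configured `file_extension` or to return an error; that choice is an assumption of this sketch as well.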
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.