alamb commented on code in PR #14021:
URL: https://github.com/apache/datafusion/pull/14021#discussion_r1907365457


##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -114,19 +114,22 @@ impl ListingTableConfig {
         }
     }
 
-    fn infer_file_extension(path: &str) -> Result<String> {
+    fn infer_file_extension_and_compression_type(

Review Comment:
   Can you please document what the return values are? Something like
   
   ```suggestion
       /// Returns a tuple of (file_extension, optional compression_extension)
       ///
       /// For example `("csv", Some("gz"))`
       fn infer_file_extension_and_compression_type(
   ```
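A minimal sketch of the behavior the suggested doc comment describes. It is for illustration only: the PR's actual function body is not shown in this hunk. The standalone `Result<_, String>` error type and the `COMPRESSION_EXTENSIONS` list are assumptions, not DataFusion's API.

```rust
/// Hypothetical set of compression suffixes recognized by this sketch.
const COMPRESSION_EXTENSIONS: &[&str] = &["gz", "bz2", "xz", "zst"];

/// Returns a tuple of (file_extension, optional compression_extension)
///
/// For example `("csv", Some("gz"))`
fn infer_file_extension_and_compression_type(
    path: &str,
) -> Result<(String, Option<String>), String> {
    // Only inspect the last path segment so dots in directory names are ignored.
    let file_name = path.rsplit('/').next().unwrap_or(path);
    // "name.csv.gz" -> ["name", "csv", "gz"]
    let parts: Vec<&str> = file_name.split('.').collect();
    match parts.as_slice() {
        // At least "name.ext.comp", where the final suffix is a known compression type.
        [_, .., ext, comp] if COMPRESSION_EXTENSIONS.contains(comp) => {
            Ok((ext.to_string(), Some(comp.to_string())))
        }
        // Plain "name.ext": no compression suffix.
        [_, .., ext] => Ok((ext.to_string(), None)),
        // No dot in the file name at all.
        _ => Err(format!("no file extension in path {path}")),
    }
}
```

In this sketch a bare `archive.gz` is reported as extension `gz` with no compression. That is one possible choice, not necessarily what the PR does.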



##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -147,18 +150,31 @@ impl ListingTableConfig {
             .await
             .ok_or_else(|| DataFusionError::Internal("No files for table".into()))??;
 
-        let file_extension =
-            ListingTableConfig::infer_file_extension(file.location.as_ref())?;
+        let (file_extension, maybe_compression_type) =
+            ListingTableConfig::infer_file_extension_and_compression_type(
+                file.location.as_ref(),
+            )?;
+
+        let mut format_options = HashMap::new();
+
+        let listing_file_extension =
+            if let Some(compression_type) = maybe_compression_type {
+                format_options

Review Comment:
   Very minor: I think if you reorganized this code, you could avoid `clone`ing the compression_type and file_extension.
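One way the reorganization could look. This is a sketch that assumes two things not visible in this hunk: the compression type is stored under a hypothetical `format.compression` key, and the listing extension has no leading dot. The idea is to build the combined extension while both strings are still borrowed, then move `compression_type` into the map so no `clone` is needed.

```rust
use std::collections::HashMap;

/// Hypothetical option key; the key the PR actually uses is not shown in the hunk.
const COMPRESSION_OPTION_KEY: &str = "format.compression";

/// Returns (listing_file_extension, format_options) without cloning either input.
fn build_listing_options(
    file_extension: &str,
    maybe_compression_type: Option<String>,
) -> (String, HashMap<String, String>) {
    let mut format_options = HashMap::new();
    let listing_file_extension = match maybe_compression_type {
        Some(compression_type) => {
            // Borrow both strings first to build e.g. "csv.gz" ...
            let combined = format!("{file_extension}.{compression_type}");
            // ... then move the owned compression type into the map.
            format_options.insert(COMPRESSION_OPTION_KEY.to_string(), compression_type);
            combined
        }
        // Uncompressed: the listing extension is just the file extension.
        None => file_extension.to_string(),
    };
    (listing_file_extension, format_options)
}
```

Taking `file_extension` by reference leaves the caller's owned value in place, for example to look up the file format afterwards.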
   
   



##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -2194,4 +2210,23 @@ mod tests {
 
         Ok(())
     }
+
+    #[tokio::test]
+    async fn test_infer_options_compressed_csv() -> Result<()> {
+        let testdata = crate::test_util::arrow_test_data();
+        let filename = format!("{}/csv/aggregate_test_100.csv.gz", testdata);
+        let table_path = ListingTableUrl::parse(filename).unwrap();
+
+        let ctx = SessionContext::new();
+
+        let config = ListingTableConfig::new(table_path);
+        let config_with_opts = config.infer_options(&ctx.state()).await?;
+        let config_with_schema = config_with_opts.infer_schema(&ctx.state()).await?;
+
+        let schema = config_with_schema.file_schema.unwrap();
+
+        assert_eq!(schema.fields.len(), 13);

Review Comment:
   I verified that, without the code in this PR, this test fails like this:
   
   ```
   
   assertion `left == right` failed
     left: 0
    right: 13
   
   thread 'datasource::listing::table::tests::test_infer_options_compressed_csv' panicked at datafusion/core/src/datasource/listing/table.rs:2212:9:
   assertion `left == right` failed
     left: 0
    right: 13
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
      1: core::panicking::panic_fmt
                at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
      2: core::panicking::assert_failed_inner
      3: core::panicking::assert_failed
                at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:367:5
      4: datafusion::datasource::listing::table::tests::test_infer_options_compressed_csv::{{closure}}
                at ./src/datasource/listing/table.rs:2212:9
      5: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:123:9
      6: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:123:9
      7: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:729:57
      8: tokio::runtime::coop::with_budget
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/coop.rs:107:5
      9: tokio::runtime::coop::budget
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/coop.rs:73:5
     10: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:729:25
     11: tokio::runtime::scheduler::current_thread::Context::enter
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:428:19
     12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:728:36
     13: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:807:68
     14: tokio::runtime::context::scoped::Scoped<T>::set
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context/scoped.rs:40:9
     15: tokio::runtime::context::set_scheduler::{{closure}}
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context.rs:180:26
     16: std::thread::local::LocalKey<T>::try_with
                at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12
     17: std::thread::local::LocalKey<T>::with
                at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9
     18: tokio::runtime::context::set_scheduler
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context.rs:180:9
     19: tokio::runtime::scheduler::current_thread::CoreGuard::enter
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:807:27
     20: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:716:19
     21: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:196:28
     22: tokio::runtime::context::runtime::enter_runtime
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context/runtime.rs:65:16
     23: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:184:9
     24: tokio::runtime::runtime::Runtime::block_on_inner
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/runtime.rs:368:47
     25: tokio::runtime::runtime::Runtime::block_on
                at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/runtime.rs:342:13
     26: datafusion::datasource::listing::table::tests::test_infer_options_compressed_csv
                at ./src/datasource/listing/table.rs:2214:9
     27: datafusion::datasource::listing::table::tests::test_infer_options_compressed_csv::{{closure}}
                at ./src/datasource/listing/table.rs:2199:53
     28: core::ops::function::FnOnce::call_once
                at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
     29: core::ops::function::FnOnce::call_once
                at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/ops/function.rs:250:5
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to