alamb commented on code in PR #14021:
URL: https://github.com/apache/datafusion/pull/14021#discussion_r1907365457
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -114,19 +114,22 @@ impl ListingTableConfig {
}
}
- fn infer_file_extension(path: &str) -> Result<String> {
+ fn infer_file_extension_and_compression_type(
Review Comment:
Can you please document what the return values are? Something like
```suggestion
/// Returns a tuple of (file_extension, optional compression_extension)
///
/// For example `("csv", Some("gz"))`
fn infer_file_extension_and_compression_type(
```
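To illustrate the contract the suggested doc comment describes, here is a minimal standalone sketch. It is not the PR's implementation: the function name `split_extensions`, the `COMPRESSION_EXTENSIONS` list, and the choice to return `Option` instead of DataFusion's `Result` are all assumptions made for illustration.

```rust
/// Illustrative list of compression suffixes (an assumption, not DataFusion's list).
const COMPRESSION_EXTENSIONS: [&str; 4] = ["gz", "bz2", "xz", "zst"];

/// Returns a tuple of (file_extension, optional compression_extension),
/// or `None` when the file name has no extension at all.
///
/// For example, `"data.csv.gz"` yields `("csv", Some("gz"))`.
fn split_extensions(path: &str) -> Option<(String, Option<String>)> {
    // Only the final path segment is inspected.
    let file_name = path.rsplit('/').next()?;

    // Walk the dot-separated parts from right to left.
    let mut parts = file_name.split('.').rev();
    let last = parts.next()?;
    // No second part means the name contains no dot, so there is no extension.
    let prev = parts.next()?;

    let is_compression = COMPRESSION_EXTENSIONS.iter().any(|c| *c == last);
    if is_compression && parts.next().is_some() {
        // e.g. "data.csv.gz": `prev` is the file extension, `last` the compression.
        Some((prev.to_string(), Some(last.to_string())))
    } else {
        // e.g. "data.csv", or a bare "data.gz" with no inner extension.
        Some((last.to_string(), None))
    }
}
```

With this sketch, a path ending in `aggregate_test_100.csv.gz` splits into `"csv"` and `Some("gz")`, while a plain `data.csv` gives `"csv"` and `None`.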
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -147,18 +150,31 @@ impl ListingTableConfig {
.await
.ok_or_else(|| DataFusionError::Internal("No files for table".into()))??;
- let file_extension =
- ListingTableConfig::infer_file_extension(file.location.as_ref())?;
+ let (file_extension, maybe_compression_type) =
+ ListingTableConfig::infer_file_extension_and_compression_type(
+ file.location.as_ref(),
+ )?;
+
+ let mut format_options = HashMap::new();
+
+ let listing_file_extension =
+ if let Some(compression_type) = maybe_compression_type {
+ format_options
Review Comment:
Very minor: I think if you reorganized this code you could avoid `clone`ing
the `compression_type` and `file_extension`.
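One possible shape for that reorganization is sketched below. This is a guess, not the PR's code: the helper `build_listing_extension` and the option key `"format.compression"` are assumptions for illustration, and the real code may still need `file_extension` afterwards, in which case it would borrow rather than move it. The idea is to build the combined extension from borrows first, then move the owned strings to their final destinations.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the extension used for listing files plus the
/// format options, without cloning either input string.
fn build_listing_extension(
    file_extension: String,
    maybe_compression_type: Option<String>,
) -> (String, HashMap<String, String>) {
    let mut format_options = HashMap::new();

    let listing_file_extension = match maybe_compression_type {
        Some(compression_type) => {
            // `format!` only borrows its arguments, so build "csv.gz" first...
            let combined = format!("{file_extension}.{compression_type}");
            // ...then move the owned compression type into the map.
            // The key name is an assumed placeholder.
            format_options.insert("format.compression".to_string(), compression_type);
            combined
        }
        // No compression: the file extension is moved out as-is.
        None => file_extension,
    };

    (listing_file_extension, format_options)
}
```

Ordering the borrow before the move is what removes the need for `clone` here; whether that fits the surrounding code depends on what else uses those values later.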
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -2194,4 +2210,23 @@ mod tests {
Ok(())
}
+
+ #[tokio::test]
+ async fn test_infer_options_compressed_csv() -> Result<()> {
+ let testdata = crate::test_util::arrow_test_data();
+ let filename = format!("{}/csv/aggregate_test_100.csv.gz", testdata);
+ let table_path = ListingTableUrl::parse(filename).unwrap();
+
+ let ctx = SessionContext::new();
+
+ let config = ListingTableConfig::new(table_path);
+ let config_with_opts = config.infer_options(&ctx.state()).await?;
+ let config_with_schema = config_with_opts.infer_schema(&ctx.state()).await?;
+
+ let schema = config_with_schema.file_schema.unwrap();
+
+ assert_eq!(schema.fields.len(), 13);
Review Comment:
I verified that without the code in this PR this test fails like this:
```
assertion `left == right` failed
left: 0
right: 13
thread 'datasource::listing::table::tests::test_infer_options_compressed_csv' panicked at datafusion/core/src/datasource/listing/table.rs:2212:9:
assertion `left == right` failed
left: 0
right: 13
stack backtrace:
0: rust_begin_unwind
   at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
1: core::panicking::panic_fmt
   at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
   at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:367:5
4: datafusion::datasource::listing::table::tests::test_infer_options_compressed_csv::{{closure}}
   at ./src/datasource/listing/table.rs:2212:9
5: <core::pin::Pin<P> as core::future::future::Future>::poll
   at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:123:9
6: <core::pin::Pin<P> as core::future::future::Future>::poll
   at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:123:9
7: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:729:57
8: tokio::runtime::coop::with_budget
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/coop.rs:107:5
9: tokio::runtime::coop::budget
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/coop.rs:73:5
10: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:729:25
11: tokio::runtime::scheduler::current_thread::Context::enter
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:428:19
12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:728:36
13: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:807:68
14: tokio::runtime::context::scoped::Scoped<T>::set
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context/scoped.rs:40:9
15: tokio::runtime::context::set_scheduler::{{closure}}
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context.rs:180:26
16: std::thread::local::LocalKey<T>::try_with
   at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12
17: std::thread::local::LocalKey<T>::with
   at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9
18: tokio::runtime::context::set_scheduler
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context.rs:180:9
19: tokio::runtime::scheduler::current_thread::CoreGuard::enter
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:807:27
20: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:716:19
21: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:196:28
22: tokio::runtime::context::runtime::enter_runtime
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context/runtime.rs:65:16
23: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/current_thread/mod.rs:184:9
24: tokio::runtime::runtime::Runtime::block_on_inner
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/runtime.rs:368:47
25: tokio::runtime::runtime::Runtime::block_on
   at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/runtime.rs:342:13
26: datafusion::datasource::listing::table::tests::test_infer_options_compressed_csv
   at ./src/datasource/listing/table.rs:2214:9
27: datafusion::datasource::listing::table::tests::test_infer_options_compressed_csv::{{closure}}
   at ./src/datasource/listing/table.rs:2199:53
28: core::ops::function::FnOnce::call_once
   at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
29: core::ops::function::FnOnce::call_once
   at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```