BlakeOrth commented on code in PR #18370:
URL: https://github.com/apache/datafusion/pull/18370#discussion_r2478808009
##########
datafusion/core/tests/datasource/object_store_access.rs:
##########
@@ -123,6 +126,43 @@ async fn query_multi_csv_file() {
);
}
+#[tokio::test]
+async fn query_partitioned_csv_file() {
+ let test = Test::new().with_partitioned_csv().await;
+ assert_snapshot!(
+ test.query("select * from csv_table_partitioned").await,
+ @r"
+ ------- Query Output (6 rows) -------
+ +---------+-------+-------+---+----+-----+
+ | d1 | d2 | d3 | a | b | c |
+ +---------+-------+-------+---+----+-----+
+ | 0.00001 | 1e-12 | true | 1 | 10 | 100 |
+ | 0.00003 | 5e-12 | false | 1 | 10 | 100 |
+ | 0.00002 | 2e-12 | true | 2 | 20 | 200 |
+ | 0.00003 | 5e-12 | false | 2 | 20 | 200 |
+ | 0.00003 | 3e-12 | true | 3 | 30 | 300 |
+ | 0.00003 | 5e-12 | false | 3 | 30 | 300 |
+ +---------+-------+-------+---+----+-----+
+ ------- Object Store Request Summary -------
+ RequestCountingObjectStore()
+ Total Requests: 13
+ - LIST (with delimiter) prefix=data
Review Comment:
Yes, one for each directory! I had initially used 3 files in each directory,
but I thought this test produced an even more interesting result because there
are more list requests than there are data files.
One thing we can't easily see here, though, is the sequencing and
parallelism of the list requests. The current implementation does a pretty good
job of hiding their latency behind concurrency.
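
To make the counting concrete, here is a rough std-only sketch, not the actual DataFusion code. `CountingStore`, `list_with_delimiter`, `discover`, and the two-file layout are all made up for illustration. It emulates a delimiter-style LIST over an in-memory key set, issues one LIST per directory, and also counts how many levels deep the walk goes. If the listings within one level are issued concurrently (an assumption on my part, the snapshot doesn't show it), wall-clock latency should track the number of levels rather than the number of requests.

```rust
use std::collections::BTreeSet;

/// Result of one delimiter-style LIST: direct child "directories" and files.
struct Listing {
    common_prefixes: Vec<String>,
    objects: Vec<String>,
}

/// Hypothetical in-memory stand-in for an object store that counts LIST calls.
struct CountingStore {
    keys: Vec<String>,
    list_requests: usize,
}

impl CountingStore {
    fn new(keys: &[&str]) -> Self {
        Self {
            keys: keys.iter().map(|k| k.to_string()).collect(),
            list_requests: 0,
        }
    }

    /// Emulates LIST (with delimiter): returns only direct children of `prefix`.
    fn list_with_delimiter(&mut self, prefix: &str) -> Listing {
        self.list_requests += 1;
        let dir = format!("{prefix}/");
        let mut common_prefixes = BTreeSet::new();
        let mut objects = Vec::new();
        for key in &self.keys {
            if let Some(rest) = key.strip_prefix(dir.as_str()) {
                match rest.split_once('/') {
                    // A further '/' means the key lives in a child directory.
                    Some((child, _)) => {
                        common_prefixes.insert(format!("{dir}{child}"));
                    }
                    // No further '/' means the key is a file directly under `prefix`.
                    None => objects.push(key.clone()),
                }
            }
        }
        Listing {
            common_prefixes: common_prefixes.into_iter().collect(),
            objects,
        }
    }
}

/// Walks the tree level by level, one LIST per directory.
/// Returns the data files found and the number of levels visited.
fn discover(store: &mut CountingStore, root: &str) -> (Vec<String>, usize) {
    let mut files = Vec::new();
    let mut frontier = vec![root.to_string()];
    let mut levels = 0;
    while !frontier.is_empty() {
        levels += 1;
        let mut next = Vec::new();
        // These per-level listings are independent of each other, so a real
        // implementation could issue them concurrently.
        for prefix in &frontier {
            let listing = store.list_with_delimiter(prefix);
            files.extend(listing.objects);
            next.extend(listing.common_prefixes);
        }
        frontier = next;
    }
    (files, levels)
}

fn main() {
    // Made-up layout: two leaf partitions, one file each.
    let mut store = CountingStore::new(&[
        "data/a=1/b=10/c=100/file_1.csv",
        "data/a=2/b=20/c=200/file_1.csv",
    ]);
    let (files, levels) = discover(&mut store, "data");
    println!(
        "data files: {}, LIST requests: {}, levels: {}",
        files.len(),
        store.list_requests,
        levels
    );
}
```

With this toy layout the walk finds 2 data files using 7 LIST requests across 4 levels, so the list requests outnumber the data files, and only the 4 levels would have to run one after another.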
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]