alamb commented on PR #14918:
URL: https://github.com/apache/datafusion/pull/14918#issuecomment-2694766133
FWIW, I still could not reproduce any improvement with this PR. I still
think it is a good change, so I merged it in, but it would be good to find some
benchmark results that show the improvement.
<details><summary>Details</summary>
<p>
```rust
use std::sync::Arc;
use std::time::Instant;

use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::{
    ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
};
use datafusion::execution::object_store::ObjectStoreUrl;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Register an HTTP object store for the host that serves the files
    let object_store_url = ObjectStoreUrl::parse("https://datasets.clickhouse.com").unwrap();
    let object_store = object_store::http::HttpBuilder::new()
        .with_url(object_store_url.as_str())
        .build()
        .unwrap();
    ctx.register_object_store(object_store_url.as_ref(), Arc::new(object_store));

    // urls are like
    // https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet
    // let base_url = ObjectStoreUrl::parse("https://datasets.clickhouse.com").unwrap();
    // Note: (1..100) covers hits_1 through hits_99
    let paths: Vec<ListingTableUrl> = (1..100)
        .map(|i| {
            format!("https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_{i}.parquet")
        })
        .map(|url| ListingTableUrl::parse(&url).unwrap())
        .collect();

    // Collect statistics when the table is created
    let listing_options =
        ListingOptions::new(Arc::new(ParquetFormat::new())).with_collect_stat(true);

    // Time table creation (schema inference / statistics)
    let start = Instant::now();
    println!("Creating table / reading statistics....");
    let config = ListingTableConfig::new_with_multi_paths(paths)
        .with_listing_options(listing_options)
        .infer_schema(&ctx.state())
        .await?;
    let listing_table = ListingTable::try_new(config).unwrap();
    let df = ctx.read_table(Arc::new(listing_table))?;
    println!("Done in {:?}", start.elapsed());

    // Time a LIMIT 10 query against the table
    println!("running query");
    let start = Instant::now();
    let batches = df.limit(0, Some(10))?.collect().await.unwrap();
    println!("Got {} batches in {:?}", batches.len(), start.elapsed());
    Ok(())
}
```
</p>
</details>
Some testing numbers (the results vary wildly from run to run)
On this branch
```
Creating table / reading statistics....
Done in 250.333042ms
running query
Got 1 batches in 1.943637416s
hello world!
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
Finished `release` profile [optimized] target(s) in 0.21s
Running `target/release/rust_playground`
Creating table / reading statistics....
Done in 174.578ms
running query
Got 1 batches in 1.62131175s
hello world!
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
Finished `release` profile [optimized] target(s) in 0.12s
Running `target/release/rust_playground`
Creating table / reading statistics....
Done in 191.24325ms
running query
Got 1 batches in 1.257049458s
hello world!
```
On main
```
Creating table / reading statistics....
Done in 165.25ms
running query
Got 1 batches in 819.607625ms
hello world!
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
Finished `release` profile [optimized] target(s) in 0.20s
Running `target/release/rust_playground`
Creating table / reading statistics....
Done in 165.120666ms
running query
Got 1 batches in 1.036410625s
hello world!
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
Finished `release` profile [optimized] target(s) in 0.10s
Running `target/release/rust_playground`
Creating table / reading statistics....
Done in 198.459166ms
running query
Got 1 batches in 831.307041ms
hello world!
```
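Because the timings are this noisy, comparing the min / median / max of each set is a bit easier than eyeballing individual runs. Below is a minimal std-only sketch that does this for the six "Got 1 batches in ..." timings reported above. The `Summary` struct and `summarize` helper are hypothetical names made up for illustration and are not part of DataFusion. With only three runs per side (median query time roughly 1.62s on this branch vs. 0.83s on main), this is a rough comparison rather than a real benchmark.

```rust
use std::time::Duration;

/// Summary statistics over a set of timed runs (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Summary {
    min: Duration,
    median: Duration,
    max: Duration,
}

/// Summarize run times; returns `None` for an empty slice.
fn summarize(runs: &[Duration]) -> Option<Summary> {
    if runs.is_empty() {
        return None;
    }
    let mut sorted = runs.to_vec();
    sorted.sort();
    // For an even number of runs this picks the upper of the two middle values.
    let median = sorted[sorted.len() / 2];
    Some(Summary {
        min: sorted[0],
        median,
        max: sorted[sorted.len() - 1],
    })
}

fn main() {
    // Query timings copied from the output above, expressed in nanoseconds.
    let branch = [1_943_637_416u64, 1_621_311_750, 1_257_049_458].map(Duration::from_nanos);
    let main_runs = [819_607_625u64, 1_036_410_625, 831_307_041].map(Duration::from_nanos);

    println!("branch: {:?}", summarize(&branch));
    println!("main:   {:?}", summarize(&main_runs));
}
```

More runs per side, and ideally a local copy of the files to take network variance out of the picture, would make the comparison more meaningful.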