alamb commented on PR #14918:
URL: https://github.com/apache/datafusion/pull/14918#issuecomment-2694766133

   I still could not reproduce any improvement with this PR, FWIW. I still
   think it is a good change, so I merged it in, but it would be good to find
   some benchmark results that show the improvement.
   
   
   <details><summary>Details</summary>
   <p>
   
   ```rust
   
   use std::sync::Arc;
   use std::time::Instant;
   use datafusion::datasource::file_format::parquet::ParquetFormat;
   use datafusion::datasource::listing::{
       ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
   };
   use datafusion::execution::object_store::ObjectStoreUrl;
   use datafusion::prelude::SessionContext;
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
       let ctx = SessionContext::new();
       let object_store_url =
           ObjectStoreUrl::parse("https://datasets.clickhouse.com").unwrap();
       let object_store = object_store::http::HttpBuilder::new()
           .with_url(object_store_url.as_str())
           .build()
           .unwrap();
   
       ctx.register_object_store(object_store_url.as_ref(),
           Arc::new(object_store));
   
   
       // urls are like
       // https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet
       //let base_url = ObjectStoreUrl::parse("https://datasets.clickhouse.com").unwrap();
       let paths: Vec<ListingTableUrl> = (1..100)
           .map(|i| format!("https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_{i}.parquet"))
           .map(|url| ListingTableUrl::parse(&url).unwrap())
           .collect();
   
       let listing_options = ListingOptions::new(Arc::new(ParquetFormat::new()))
           .with_collect_stat(true);
   
       let start = Instant::now();
       println!("Creating table / reading statistics....");
       let config = ListingTableConfig::new_with_multi_paths(paths)
           .with_listing_options(listing_options)
           .infer_schema(&ctx.state()).await?;
       let listing_table = ListingTable::try_new(config).unwrap();
       let df = ctx.read_table(Arc::new(listing_table))?;
       println!("Done in {:?}", Instant::now() - start);
   
       println!("running query");
       let start = Instant::now();
       let batches = df.limit(0, Some(10))?.collect().await.unwrap();
       println!("Got {} batches in  {:?}", batches.len(), Instant::now() - start);
   
       Ok(())
   }
   
   
   ```
   
   </p>
   </details> 
   
   Some testing numbers (the results vary wildly)
   
   
   On this branch
   ```
   Creating table / reading statistics....
   Done in 250.333042ms
   running query
   Got 1 batches in  1.943637416s
   hello world!
   (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
       Finished `release` profile [optimized] target(s) in 0.21s
        Running `target/release/rust_playground`
   Creating table / reading statistics....
   Done in 174.578ms
   running query
   Got 1 batches in  1.62131175s
   hello world!
   (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
       Finished `release` profile [optimized] target(s) in 0.12s
        Running `target/release/rust_playground`
   Creating table / reading statistics....
   Done in 191.24325ms
   running query
   Got 1 batches in  1.257049458s
   hello world!
   ```
   
   On main
   
   ```
   Creating table / reading statistics....
   Done in 165.25ms
   running query
   Got 1 batches in  819.607625ms
   hello world!
   (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
       Finished `release` profile [optimized] target(s) in 0.20s
        Running `target/release/rust_playground`
   Creating table / reading statistics....
   Done in 165.120666ms
   running query
   Got 1 batches in  1.036410625s
   hello world!
   (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/rust_playground$ cargo run --release
       Finished `release` profile [optimized] target(s) in 0.10s
        Running `target/release/rust_playground`
   Creating table / reading statistics....
   Done in 198.459166ms
   running query
   Got 1 batches in  831.307041ms
   hello world!
   ```
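
Because the timings vary so much between runs, a single number is not very meaningful. One way to compare the two builds is to collect several samples and look at the min / median / max. The following is only a rough sketch using the standard library; `TimingSummary` and `summarize` are hypothetical names and not part of DataFusion, and the samples in `main` are the query times copied from the runs above.

```rust
use std::time::Duration;

/// Fastest, median, and slowest run from a set of wall-clock samples.
#[derive(Debug, PartialEq)]
struct TimingSummary {
    min: Duration,
    median: Duration,
    max: Duration,
}

/// Hypothetical helper (not a DataFusion API): summarize timing samples.
/// Returns `None` when there are no samples.
fn summarize(samples: &[Duration]) -> Option<TimingSummary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let n = sorted.len();
    // For an even number of samples, average the two middle values.
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2
    };
    Some(TimingSummary {
        min: sorted[0],
        median,
        max: sorted[n - 1],
    })
}

fn main() {
    // Query times from the runs above, in nanoseconds.
    let branch = [1_943_637_416u64, 1_621_311_750, 1_257_049_458].map(Duration::from_nanos);
    let main_runs = [819_607_625u64, 1_036_410_625, 831_307_041].map(Duration::from_nanos);

    println!("branch: {:?}", summarize(&branch));
    println!("main:   {:?}", summarize(&main_runs));
}
```

With only three samples per build this says little, but the same helper could be fed the `Instant::now() - start` values from a loop around the query in the reproducer to get a steadier comparison.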

