BlakeOrth commented on PR #16971:
URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3145516316
I figured I'd put my money where my mouth is regarding my comment here: https://github.com/apache/datafusion/pull/16971#discussion_r2248565529, specifically the part about the latency penalty. I made a quick modification to this branch that implements the standard `get` method, which omits any requests for metadata. Results against a remote dataset are below:

```sql
DataFusion CLI v49.0.0
> CREATE EXTERNAL TABLE athena_partitioned STORED AS PARQUET LOCATION 's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/';
0 row(s) fetched.
Elapsed 3.277 seconds.

> select count(*) from athena_partitioned;
+----------+
| count(*) |
+----------+
| 99997497 |
+----------+
1 row(s) fetched.
Elapsed 2.469 seconds.

> select count(*) from athena_partitioned;
+----------+
| count(*) |
+----------+
| 99997497 |
+----------+
1 row(s) fetched.
Elapsed 0.309 seconds.

> select count(*) from athena_partitioned;
+----------+
| count(*) |
+----------+
| 99997497 |
+----------+
1 row(s) fetched.
Elapsed 0.159 seconds.

>
```

@alamb, are these results more in line with what you expected to see, per your comment here? https://github.com/apache/datafusion/pull/16971#pullrequestreview-3073221882