bharath-techie commented on code in PR #18971:
URL: https://github.com/apache/datafusion/pull/18971#discussion_r2570606415
##########
datafusion/core/src/datasource/listing_table_factory.rs:
##########
@@ -190,6 +190,12 @@ impl TableProviderFactory for ListingTableFactory {
.with_definition(cmd.definition.clone())
.with_constraints(cmd.constraints.clone())
.with_column_defaults(cmd.column_defaults.clone());
+
+ // Pre-warm statistics cache if collect_statistics is enabled
+ if session_state.config().collect_statistics() {
+ let _ = table.list_files_for_scan(state, &[], None).await?;
Review Comment:
Thanks @martin-g for reviewing.
I agree on having a limit.
But wouldn't doing it in the background result in inconsistent behavior? The `collect_statistics` documentation states:
```
Should DataFusion collect statistics when first creating a table. Has no
effect after the table is created. Applies to the default ListingTableProvider
in DataFusion. Defaults to true.
```
Wouldn't a user expect the statistics to be collected when the table is
created, and expect any query after that to be optimized accordingly, per
the documentation above?
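
To illustrate the concern, here is a minimal std-only sketch. It does not use DataFusion's API: `StatsCache`, `create_table_eager`, and `create_table_background` are hypothetical names, and the row count is a made-up placeholder. It contrasts eager warming, which finishes before table creation returns, with background warming, where a query issued right after creation may or may not see statistics.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a per-table statistics cache (not a DataFusion type).
#[derive(Default)]
struct StatsCache {
    row_count: Mutex<Option<u64>>,
}

impl StatsCache {
    /// Pretend to scan the table's files and record statistics.
    fn warm(&self) {
        *self.row_count.lock().unwrap() = Some(1_000); // placeholder value
    }

    /// Return the cached row count, if statistics have been collected.
    fn get(&self) -> Option<u64> {
        *self.row_count.lock().unwrap()
    }
}

/// Eager warming: statistics are present as soon as creation returns.
fn create_table_eager() -> Arc<StatsCache> {
    let cache = Arc::new(StatsCache::default());
    cache.warm();
    cache
}

/// Background warming: creation returns immediately and the statistics
/// arrive at some later point. The handle lets a caller wait for them.
fn create_table_background() -> (Arc<StatsCache>, thread::JoinHandle<()>) {
    let cache = Arc::new(StatsCache::default());
    let worker = Arc::clone(&cache);
    let handle = thread::spawn(move || worker.warm());
    (cache, handle)
}

fn main() {
    let eager = create_table_eager();
    println!("eager, right after create: {:?}", eager.get());

    let (background, handle) = create_table_background();
    // Timing dependent: this may print None or Some(..).
    println!("background, right after create: {:?}", background.get());

    handle.join().unwrap();
    println!("background, after waiting: {:?}", background.get());
}
```

In the eager case the first query after `CREATE EXTERNAL TABLE` always sees statistics. In the background case it depends on timing, which is the inconsistency I am worried about.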