kosiew commented on code in PR #20047:
URL: https://github.com/apache/datafusion/pull/20047#discussion_r3092631605


##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -1752,7 +1766,9 @@ impl SessionContext {
         let config = ListingTableConfig::new(table_path)
             .with_listing_options(options)
             .with_schema(resolved_schema);
-        let table = 
ListingTable::try_new(config)?.with_definition(sql_definition);
+        let table = ListingTable::try_new(config)?
+            .with_definition(sql_definition)
+            
.with_cache(self.runtime_env().cache_manager.get_file_statistic_cache());

Review Comment:
   I noticed that only the registered-listing-table path was updated to call 
`.with_cache(...)`. Was it intentional that the generic reader path in 
`_read_type()` still uses `ListingTable::try_new(config)?` without attaching 
the runtime cache?
   
   With `ListingTable` now defaulting `collected_statistics` to `None`, could 
this mean that `SessionContext::read_parquet`, `read_csv`, `read_json`, 
`read_avro`, and similar methods no longer benefit from the file statistics 
cache?
   
   If so, would this be considered a regression in the default-cache rollout? 
How do you see this affecting users who rely on those reader paths?



##########
datafusion/execution/src/runtime_env.rs:
##########
@@ -514,6 +535,7 @@ impl RuntimeEnvBuilder {
             Some("50M".to_owned()),
             Some("1M".to_owned()),
             None,
+            Some("1M".to_owned()),

Review Comment:
   I see that `DEFAULT_FILE_STATISTICS_MEMORY_LIMIT` was increased to 20 MiB, 
but `RuntimeEnvBuilder::entries()` still hard-codes `Some("1M".to_owned())` for 
`datafusion.runtime.file_statistics_cache_limit`.
   
   Should these two defaults be aligned? As it stands, could this create 
confusion where generated configuration docs and callers of `entries()` still 
reflect the old default while runtime behavior uses the new one?
   
   Would it make sense to update `entries()` to match the new default?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to