alamb commented on code in PR #7620:
URL: https://github.com/apache/arrow-datafusion/pull/7620#discussion_r1334963433


##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -61,6 +79,10 @@ pub struct CacheManagerConfig {
     /// Avoid get same file statistics repeatedly in same datafusion session.
     /// Default is disable. Fow now only supports Parquet files.
     pub table_files_statistics_cache: Option<FileStatisticsCache>,
+    /// Enable cache of files statistics when listing files.

Review Comment:
   ```suggestion
       /// Enable cache of files to list when listing files.
   ```



##########
datafusion/execution/src/cache/cache_unit.rs:
##########
@@ -94,9 +94,69 @@ impl CacheAccessor<Path, Arc<Statistics>> for 
DefaultFileStatisticsCache {
     }
 }
 
+/// Collected files metadata for listing files.
+/// Cache will not invalided until user call remove or clear.
+#[derive(Default)]
+pub struct DefaultListFilesCache {
+    statistics: DashMap<Path, Arc<Vec<ObjectMeta>>>,

Review Comment:
   the use of `statistics` as a name here seems confusing as this doesn't hold 
statistics, but rather paths.
   
   Perhaps a name like `paths` would be clearer 🤔 



##########
datafusion/core/src/execution/context.rs:
##########
@@ -1405,6 +1405,15 @@ pub fn default_session_builder(config: SessionConfig) -> 
SessionState {
     SessionState::with_config_rt(config, Arc::new(RuntimeEnv::default()))
 }
 
+impl Default for SessionState {

Review Comment:
   I am worried that providing a Default for SessionState might allow subtle 
bugs, specifically accidentally running queries or other operations without 
properly threading through the configuration or runtime. I believe that is why 
there is no `SessionState::new()` or `Default` 
   
   🤔  this context would be nice to add as a comment to SessonState
   
   



##########
datafusion/execution/src/cache/cache_unit.rs:
##########
@@ -94,9 +94,69 @@ impl CacheAccessor<Path, Arc<Statistics>> for 
DefaultFileStatisticsCache {
     }
 }
 
+/// Collected files metadata for listing files.
+/// Cache will not invalided until user call remove or clear.
+#[derive(Default)]
+pub struct DefaultListFilesCache {
+    statistics: DashMap<Path, Arc<Vec<ObjectMeta>>>,

Review Comment:
   the use of `statistics` as a name here seems confusing as this doesn't hold 
statistics, but rather paths.
   
   Perhaps a name like `paths` would be clearer 🤔 



##########
datafusion/core/src/execution/context.rs:
##########
@@ -1405,6 +1405,15 @@ pub fn default_session_builder(config: SessionConfig) -> 
SessionState {
     SessionState::with_config_rt(config, Arc::new(RuntimeEnv::default()))
 }
 
+impl Default for SessionState {

Review Comment:
   I am worried that providing a Default for SessionState might allow subtle 
bugs, specifically accidentally running queries or other operations without 
properly threading through the configuration or runtime. I believe that is why 
there is no `SessionState::new()` or `Default` 
   
   🤔  this context would be nice to add as a comment to SessonState
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to