yjshen commented on a change in pull request #2029:
URL: https://github.com/apache/arrow-datafusion/pull/2029#discussion_r830439765
##########
File path: datafusion/src/execution/context.rs
##########
@@ -816,17 +732,28 @@ impl QueryPlanner for DefaultQueryPlanner {
}
}
-/// Configuration options for execution context
+/// Session Configuration entry name
+pub const BATCH_SIZE: &str = "batch_size";
+/// Session Configuration entry name
+pub const TARGET_PARTITIONS: &str = "target_partitions";
+/// Session Configuration entry name
+pub const REPARTITION_JOINS: &str = "repartition_joins";
+/// Session Configuration entry name
+pub const REPARTITION_AGGREGATIONS: &str = "repartition_aggregations";
+/// Session Configuration entry name
+pub const REPARTITION_WINDOWS: &str = "repartition_windows";
+/// Session Configuration entry name
Review comment:
Please update these docs
##########
File path: datafusion/src/execution/context.rs
##########
@@ -846,39 +773,14 @@ pub struct SessionConfig {
/// parallel using the provided `target_partitions` level
pub repartition_windows: bool,
/// Should DataFusion parquet reader using the predicate to prune data
- parquet_pruning: bool,
- /// Runtime configurations such as memory threshold and local disk for spill
- pub runtime: RuntimeConfig,
+ pub parquet_pruning: bool,
Review comment:
My feelings are getting stronger that we should use a hashmap or an
existing [`config`](https://crates.io/crates/config) lib to store the growing
number of configuration options.
We could provide "helper" methods for fast typed reads/writes of confs,
e.g. for the existing ones:
```rust
pub fn get_string(&self, key: &str) -> Result<String> {
self.get(key).and_then(Value::into_string)
}
pub fn get_usize(&self, key: &str) -> Result<usize> {
self.get(key).and_then(Value::into_usize)
}
pub fn get_bool(&self, key: &str) -> Result<bool> {
self.get(key).and_then(Value::into_bool)
}
pub fn batch_size(&self) -> usize {
    self.get_usize(BATCH_SIZE).unwrap_or(DEFAULT_BATCH_SIZE)
}
```
This would also give DataFusion users more flexibility to pass
customized configurations down into `PhysicalExec` or `UDF`s.
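For illustration, a minimal sketch of such a HashMap-backed store (`ConfigStore`, `DEFAULT_BATCH_SIZE`, and the `Option`-based getters are illustrative names, not the actual DataFusion API):

```rust
use std::collections::HashMap;

// Illustrative constant names -- not the actual DataFusion API.
const BATCH_SIZE: &str = "batch_size";
const DEFAULT_BATCH_SIZE: usize = 8192;

/// A minimal HashMap-backed config store with typed accessors.
#[derive(Default)]
pub struct ConfigStore {
    entries: HashMap<String, String>,
}

impl ConfigStore {
    pub fn set(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_owned(), value.to_owned());
    }

    pub fn get_string(&self, key: &str) -> Option<String> {
        self.entries.get(key).cloned()
    }

    pub fn get_usize(&self, key: &str) -> Option<usize> {
        self.entries.get(key).and_then(|v| v.parse().ok())
    }

    pub fn get_bool(&self, key: &str) -> Option<bool> {
        self.entries.get(key).and_then(|v| v.parse().ok())
    }

    /// Typed helper with a fallback default, mirroring the snippet above.
    pub fn batch_size(&self) -> usize {
        self.get_usize(BATCH_SIZE).unwrap_or(DEFAULT_BATCH_SIZE)
    }
}

fn main() {
    let mut cfg = ConfigStore::default();
    assert_eq!(cfg.batch_size(), DEFAULT_BATCH_SIZE);
    cfg.set(BATCH_SIZE, "1024");
    assert_eq!(cfg.batch_size(), 1024);
}
```

A store like this keeps unknown keys around verbatim, which is what would let user-defined settings flow through to `PhysicalExec` or `UDF`s untouched.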
##########
File path: datafusion/src/execution/context.rs
##########
@@ -1289,14 +1245,45 @@ impl TaskContext {
runtime,
}
}
+
+ /// Return the SessionConfig associated with the Task
+ pub fn session_config(&self) -> SessionConfig {
+ let task_settings = &self.task_settings;
Review comment:
Rename `task_settings` to `properties` as suggested in the previous PR.
##########
File path: datafusion/src/execution/context.rs
##########
@@ -846,39 +773,14 @@ pub struct SessionConfig {
/// parallel using the provided `target_partitions` level
pub repartition_windows: bool,
/// Should DataFusion parquet reader using the predicate to prune data
- parquet_pruning: bool,
- /// Runtime configurations such as memory threshold and local disk for spill
- pub runtime: RuntimeConfig,
+ pub parquet_pruning: bool,
Review comment:
We could also unify `TaskProperties`:
```rust
pub enum TaskProperties {
    /// SessionConfig
SessionConfig(SessionConfig),
/// Name-value pairs of task properties
KVPairs(HashMap<String, String>),
}
```
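A rough sketch of how such a unified `TaskProperties` could be resolved back into a `SessionConfig` on the task side (the struct fields and the `session_config` helper here are illustrative stand-ins, not the real DataFusion types):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for DataFusion's SessionConfig, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct SessionConfig {
    pub batch_size: usize,
}

impl Default for SessionConfig {
    fn default() -> Self {
        Self { batch_size: 8192 }
    }
}

/// One carrier for task-level settings: a full config on the session side,
/// or plain name-value pairs when deserialized from a distributed task.
pub enum TaskProperties {
    /// SessionConfig
    SessionConfig(SessionConfig),
    /// Name-value pairs of task properties
    KVPairs(HashMap<String, String>),
}

impl TaskProperties {
    /// Resolve either variant into a usable SessionConfig.
    pub fn session_config(&self) -> SessionConfig {
        match self {
            TaskProperties::SessionConfig(cfg) => cfg.clone(),
            TaskProperties::KVPairs(kv) => {
                let mut cfg = SessionConfig::default();
                if let Some(n) = kv.get("batch_size").and_then(|v| v.parse().ok()) {
                    cfg.batch_size = n;
                }
                cfg
            }
        }
    }
}

fn main() {
    let mut kv = HashMap::new();
    kv.insert("batch_size".to_owned(), "1024".to_owned());
    assert_eq!(TaskProperties::KVPairs(kv).session_config().batch_size, 1024);
}
```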
##########
File path: datafusion/src/execution/context.rs
##########
@@ -208,6 +186,11 @@ impl SessionContext {
self.state.lock().runtime_env.clone()
}
+ /// Return a copied version of config for this Session
+ pub fn copied_config(&self) -> SessionConfig {
Review comment:
`copied_` is unnecessary.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]