yjshen commented on a change in pull request #2029:
URL: https://github.com/apache/arrow-datafusion/pull/2029#discussion_r830439765
##########
File path: datafusion/src/execution/context.rs
##########
@@ -816,17 +732,28 @@ impl QueryPlanner for DefaultQueryPlanner {
}
}
-/// Configuration options for execution context
+/// Session Configuration entry name
+pub const BATCH_SIZE: &str = "batch_size";
+/// Session Configuration entry name
+pub const TARGET_PARTITIONS: &str = "target_partitions";
+/// Session Configuration entry name
+pub const REPARTITION_JOINS: &str = "repartition_joins";
+/// Session Configuration entry name
+pub const REPARTITION_AGGREGATIONS: &str = "repartition_aggregations";
+/// Session Configuration entry name
+pub const REPARTITION_WINDOWS: &str = "repartition_windows";
+/// Session Configuration entry name
Review comment:
Please update these docs
##########
File path: datafusion/src/execution/context.rs
##########
@@ -846,39 +773,14 @@ pub struct SessionConfig {
/// parallel using the provided `target_partitions` level
pub repartition_windows: bool,
/// Should DataFusion parquet reader using the predicate to prune data
- parquet_pruning: bool,
- /// Runtime configurations such as memory threshold and local disk for spill
- pub runtime: RuntimeConfig,
+ pub parquet_pruning: bool,
Review comment:
My feelings are getting stronger that we should use a hashmap or an
existing [`config`](https://crates.io/crates/config) lib to store the growing
number of configuration options.
We could provide "helper" methods for fast typed reads/writes of confs,
e.g. for the existing ones:
```rust
pub fn get_string(&self, key: &str) -> Result<String> {
self.get(key).and_then(Value::into_string)
}
pub fn get_usize(&self, key: &str) -> Result<usize> {
self.get(key).and_then(Value::into_usize)
}
pub fn get_bool(&self, key: &str) -> Result<bool> {
self.get(key).and_then(Value::into_bool)
}
pub fn batch_size(&self) -> usize {
    self.get_usize(BATCH_SIZE).unwrap_or(DEFAULT_BATCH_SIZE)
}
```
This would also give DataFusion users more flexibility to pass
customized configurations down into `PhysicalExec` or `UDF`s.
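For illustration, a minimal sketch of such a HashMap-backed store (`ConfigStore`, `DEFAULT_BATCH_SIZE`, and the `Option`-based getters are illustrative names, not the actual DataFusion API):

```rust
use std::collections::HashMap;

// Illustrative constant names -- not the actual DataFusion API.
const BATCH_SIZE: &str = "batch_size";
const DEFAULT_BATCH_SIZE: usize = 8192;

/// A minimal HashMap-backed config store with typed accessors.
#[derive(Default)]
pub struct ConfigStore {
    entries: HashMap<String, String>,
}

impl ConfigStore {
    pub fn set(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_owned(), value.to_owned());
    }

    pub fn get_string(&self, key: &str) -> Option<String> {
        self.entries.get(key).cloned()
    }

    pub fn get_usize(&self, key: &str) -> Option<usize> {
        self.entries.get(key).and_then(|v| v.parse().ok())
    }

    pub fn get_bool(&self, key: &str) -> Option<bool> {
        self.entries.get(key).and_then(|v| v.parse().ok())
    }

    /// Typed helper with a fallback default, mirroring the snippet above.
    pub fn batch_size(&self) -> usize {
        self.get_usize(BATCH_SIZE).unwrap_or(DEFAULT_BATCH_SIZE)
    }
}

fn main() {
    let mut cfg = ConfigStore::default();
    assert_eq!(cfg.batch_size(), DEFAULT_BATCH_SIZE);
    cfg.set(BATCH_SIZE, "1024");
    assert_eq!(cfg.batch_size(), 1024);
}
```

A store like this keeps unknown keys around verbatim, which is what would let user-defined settings flow through to `PhysicalExec` or `UDF`s untouched.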
##########
File path: datafusion/src/execution/context.rs
##########
@@ -1289,14 +1245,45 @@ impl TaskContext {
runtime,
}
}
+
+ /// Return the SessionConfig associated with the Task
+ pub fn session_config(&self) -> SessionConfig {
+ let task_settings = &self.task_settings;
Review comment:
Rename `task_settings` to `properties` as suggested in the previous PR.
##########
File path: datafusion/src/execution/context.rs
##########
@@ -846,39 +773,14 @@ pub struct SessionConfig {
/// parallel using the provided `target_partitions` level
pub repartition_windows: bool,
/// Should DataFusion parquet reader using the predicate to prune data
- parquet_pruning: bool,
- /// Runtime configurations such as memory threshold and local disk for spill
- pub runtime: RuntimeConfig,
+ pub parquet_pruning: bool,
Review comment:
We could also unify `TaskProperties`:
```rust
pub enum TaskProperties {
    /// SessionConfig
SessionConfig(SessionConfig),
/// Name-value pairs of task properties
KVPairs(HashMap<String, String>),
}
```
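A rough sketch of how such a unified `TaskProperties` could be resolved back into a `SessionConfig` on the task side (the struct fields and the `session_config` helper here are illustrative stand-ins, not the real DataFusion types):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for DataFusion's SessionConfig, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct SessionConfig {
    pub batch_size: usize,
}

impl Default for SessionConfig {
    fn default() -> Self {
        Self { batch_size: 8192 }
    }
}

/// One carrier for task-level settings: a full config on the session side,
/// or plain name-value pairs when deserialized from a distributed task.
pub enum TaskProperties {
    /// SessionConfig
    SessionConfig(SessionConfig),
    /// Name-value pairs of task properties
    KVPairs(HashMap<String, String>),
}

impl TaskProperties {
    /// Resolve either variant into a usable SessionConfig.
    pub fn session_config(&self) -> SessionConfig {
        match self {
            TaskProperties::SessionConfig(cfg) => cfg.clone(),
            TaskProperties::KVPairs(kv) => {
                let mut cfg = SessionConfig::default();
                if let Some(n) = kv.get("batch_size").and_then(|v| v.parse().ok()) {
                    cfg.batch_size = n;
                }
                cfg
            }
        }
    }
}

fn main() {
    let mut kv = HashMap::new();
    kv.insert("batch_size".to_owned(), "1024".to_owned());
    assert_eq!(TaskProperties::KVPairs(kv).session_config().batch_size, 1024);
}
```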
##########
File path: datafusion/src/execution/context.rs
##########
@@ -208,6 +186,11 @@ impl SessionContext {
self.state.lock().runtime_env.clone()
}
+ /// Return a copied version of config for this Session
+ pub fn copied_config(&self) -> SessionConfig {
Review comment:
`copied_` is unnecessary.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]