tfeda commented on issue #479:
URL: https://github.com/apache/arrow-ballista/issues/479#issuecomment-1324451046

   I'm interested in picking this up, mostly because there are some other 
configurations I'd like to add, and those would only increase the work needed 
here. I have no strong preference between a `HashMap` and a `struct`, but I 
think the choice should be consistent across the components (the client uses 
the former, while the scheduler uses the latter). 
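
   To make the consistency point concrete, here is a minimal sketch of one way 
a typed `struct` and a string `HashMap` could be two views of the same 
settings. Everything in it is hypothetical and for illustration only: 
`ExampleConfig`, the `example.*` keys, and the default values are not Ballista 
names.

```rust
use std::collections::HashMap;

/// Hypothetical typed view over string settings (not a Ballista type).
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleConfig {
    pub shuffle_partitions: usize,
    pub information_schema: bool,
}

impl Default for ExampleConfig {
    fn default() -> Self {
        // Illustrative defaults only.
        Self {
            shuffle_partitions: 16,
            information_schema: true,
        }
    }
}

impl ExampleConfig {
    /// Build the struct from string key/value pairs. Missing keys keep
    /// their defaults; values that fail to parse are reported as errors.
    pub fn from_map(settings: &HashMap<String, String>) -> Result<Self, String> {
        let mut config = Self::default();
        if let Some(raw) = settings.get("example.shuffle.partitions") {
            config.shuffle_partitions = raw
                .parse::<usize>()
                .map_err(|e| format!("invalid example.shuffle.partitions {:?}: {}", raw, e))?;
        }
        if let Some(raw) = settings.get("example.information_schema") {
            config.information_schema = raw
                .parse::<bool>()
                .map_err(|e| format!("invalid example.information_schema {:?}: {}", raw, e))?;
        }
        Ok(config)
    }

    /// Convert back to string pairs, e.g. to send the settings to another component.
    pub fn to_map(&self) -> HashMap<String, String> {
        let mut settings = HashMap::new();
        settings.insert(
            "example.shuffle.partitions".to_string(),
            self.shuffle_partitions.to_string(),
        );
        settings.insert(
            "example.information_schema".to_string(),
            self.information_schema.to_string(),
        );
        settings
    }
}
```

   With something like this, either side could keep the representation it 
already has while validation happens in one place.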
   
   `configure_me` (the library currently in use) already supports environment 
variables, so those could be enabled without switching. 
[config-rs](https://github.com/mehcode/config-rs) is another popular library 
that supports both environment variables and configuration files. A downside of 
switching to envy or config-rs is losing the man page generation and argument 
parsing that `configure_me` provides for the scheduler/executor binaries, but 
maybe that's not needed.
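
   For reference, the basic shape of environment-variable overrides can be 
sketched with the standard library alone, independent of which crate ends up 
being used. The struct, the `EXAMPLE_*` variable names, and the defaults below 
are assumptions made up for this sketch; they are not what `configure_me`, 
envy, or config-rs generate.

```rust
use std::env;

/// Hypothetical settings struct (not a Ballista type).
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleSchedulerConfig {
    pub bind_host: String,
    pub bind_port: u16,
}

impl ExampleSchedulerConfig {
    /// Build the config from any key lookup, so the logic can be exercised
    /// without touching the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // An unset variable falls back to an illustrative default.
        let bind_host = lookup("EXAMPLE_BIND_HOST").unwrap_or_else(|| "0.0.0.0".to_string());
        let bind_port = match lookup("EXAMPLE_BIND_PORT") {
            // A set but malformed variable is an error, not a silent default.
            Some(raw) => raw
                .parse::<u16>()
                .map_err(|e| format!("invalid EXAMPLE_BIND_PORT {:?}: {}", raw, e))?,
            None => 50050,
        };
        Ok(Self { bind_host, bind_port })
    }

    /// Read overrides from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

   A hand-rolled version like this gives up the generated man pages and 
argument parsing mentioned above, which is the trade-off in question.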
   
   As for DataFusion, right now we just use its default configuration with a 
few custom settings: 
   ```rust
   pub fn create_df_ctx_with_ballista_query_planner() {
       ...
       let session_config = SessionConfig::new()
           .with_target_partitions(config.default_shuffle_partitions())
           .with_information_schema(true);
       ...
   }
   ```
   I think it would be good to let users supply their own `SessionConfig`, 
or to use its `from_env()` function alongside `BallistaConfig` when 
working with environment variables on the client side. 


