tfeda commented on issue #479: URL: https://github.com/apache/arrow-ballista/issues/479#issuecomment-1324451046
I'm interested in picking this up, mostly because there are some other configurations I'd like to add, and they would only add to the work needed here.

I'm not partial to keeping things in a `HashMap` vs a `struct`, but I think it should be consistent across the components (the client uses the former, while the scheduler uses the latter).

configure_me (the current library) supports environment variables, which could be added. [config-rs](https://github.com/mehcode/config-rs) is another popular library that supports both environment variables and configuration files. A downside of switching to envy or config-rs is losing the man page generation and argument parsing that configure_me provides to the scheduler/executor binaries, but maybe that's not needed.

As for DataFusion, right now we just use the default configuration with some custom inputs:

```
pub fn create_df_ctx_with_ballista_query_planner() {
    ...
    let session_config = SessionConfig::new()
        .with_target_partitions(config.default_shuffle_partitions())
        .with_information_schema(true);
    ...
}
```

I think it would be good to let users supply their own `SessionConfig`, or use its `from_env()` function in addition to the `BallistaConfig`, when working with environment variables on the client side.
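To make the `struct` option with environment overrides a bit more concrete, here is a rough, std-only sketch. Everything in it is hypothetical and for illustration only: `ClientConfig`, its fields, the default values, and the `BALLISTA_*` variable names are not taken from Ballista, configure_me, envy, or config-rs.

```rust
use std::env;

/// Hypothetical typed client configuration (names are illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub shuffle_partitions: usize,
    pub information_schema: bool,
}

impl Default for ClientConfig {
    fn default() -> Self {
        // Placeholder defaults chosen for the example, not Ballista's real ones.
        Self {
            shuffle_partitions: 16,
            information_schema: true,
        }
    }
}

impl ClientConfig {
    /// Build a config from any key lookup function.
    /// Keys that are missing or fail to parse keep their default value.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let defaults = Self::default();
        Self {
            shuffle_partitions: lookup("BALLISTA_SHUFFLE_PARTITIONS")
                .and_then(|v| v.parse().ok())
                .unwrap_or(defaults.shuffle_partitions),
            information_schema: lookup("BALLISTA_WITH_INFORMATION_SCHEMA")
                .and_then(|v| v.parse().ok())
                .unwrap_or(defaults.information_schema),
        }
    }

    /// Apply overrides from the process environment (hypothetical variable names).
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling `ClientConfig::from_env()` would then give a typed config whose fields could be passed on to something like `with_target_partitions`. Taking a lookup closure keeps the parsing logic testable without touching the real environment. Silently falling back to defaults on a parse failure is just one possible choice; returning an error may be preferable in practice.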
