A few years ago, when I was doing more deployment management, I kicked around the idea of having different types of configs, or different ways to specify them. One of the problems at the time was actually users specifying a properties file and then not picking up spark-defaults.conf, so I was thinking about creating something like a spark-admin.conf. I think there is benefit in it; it just comes down to how best to implement it. The other thing I don't think I saw addressed was the ability to prevent users from overriding configs. If you just do the defaults, I presume users could still override them. That gets a bit trickier, especially if they can override the entire spark-defaults.conf file.
Tom

On Thursday, August 11, 2022, 12:16:10 PM CDT, Mridul Muralidharan <mri...@gmail.com> wrote:

Hi,

Wenchen, would be great if you could chime in with your thoughts - given the feedback you originally had on the PR.

It would be great to hear feedback from others on this, particularly folks managing spark deployments - how this is mitigated/avoided in your case, any other pain points with configs in this context.

Regards,
Mridul

On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen <xkro...@apache.org> wrote:

I find there's substantial value in being able to set defaults, and I think we can see that the community finds value in it as well, given the handful of "default"-like configs that exist today as mentioned in Shardul's email. The mismatch of conventions used today (suffix with ".defaultList", change "extra" to "default", ...) is confusing and inconsistent, plus requires one-off additions for each config. My proposal here would be:

- Define a clear convention, e.g. a suffix of ".default" that enables a default to be set and merged
- Document this convention in configuration.md so that we can avoid separately documenting each default-config, and instead just add a note in the docs for the normal config.
- Adjust the withPrepended method added in #24804 to leverage this convention instead of each usage instance re-defining the additional config name
- Do a comprehensive review of applicable configs and enable them all to use the newly updated withPrepended method

Wenchen, you expressed some concerns with adding more default configs in #34856, would this proposal address those concerns?

Thanks,
Erik

On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <shardulsmaha...@gmail.com> wrote:

Hi Spark devs,

Spark contains a bunch of array-like configs (comma separated lists). Some examples include `spark.sql.extensions`, `spark.sql.queryExecutionListeners`, `spark.jars.repositories`, `spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are a dozen or so more).
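The ".default"-suffix convention Erik proposes can be illustrated with a small Python sketch of the merge semantics. This is not Spark's withPrepended implementation. `effective_list`, the `spark.sql.extensions.default` key and the `com.example.*` class names are all hypothetical. The sketch contrasts today's last-write-wins behavior, where a user setting `spark.sql.extensions` replaces the platform value, with a merge that prepends the admin-owned ".default" list and drops duplicates.

```python
from __future__ import annotations


def effective_list(conf: dict[str, str], key: str, sep: str = ",") -> list[str]:
    """Hypothetical helper: merge `key + ".default"` (admin) ahead of `key` (user)."""

    def split(value: str | None) -> list[str]:
        # Treat a missing or empty value as an empty list.
        if not value:
            return []
        return [item.strip() for item in value.split(sep) if item.strip()]

    merged: list[str] = []
    for item in split(conf.get(key + ".default")) + split(conf.get(key)):
        # Keep the first occurrence so admin entries stay first and repeats are harmless.
        if item not in merged:
            merged.append(item)
    return merged


# Today: both layers write the same key, so the later (user) value wins.
admin = {"spark.sql.extensions": "com.example.PlatformExtension"}
user = {"spark.sql.extensions": "com.example.MyExtension"}
print({**admin, **user}["spark.sql.extensions"])  # com.example.MyExtension

# Proposed convention: the admin sets the hypothetical ".default" companion key instead.
conf = {
    "spark.sql.extensions.default": "com.example.PlatformExtension",
    "spark.sql.extensions": "com.example.MyExtension",
}
print(effective_list(conf, "spark.sql.extensions"))
# ['com.example.PlatformExtension', 'com.example.MyExtension']
```

This only changes how values are merged. A user who also sets the ".default" key can still replace the platform entries, so on its own it does not enforce anything.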
As owners of the Spark platform in our organization, we would like to set platform-level defaults, e.g. custom SQL extensions and listeners, and we use some of the above-mentioned properties to do so. At the same time, we have power users writing their own listeners and setting the same Spark confs, thus unintentionally overriding our platform defaults. This leads to a loss of functionality within our platform.

Previously, Spark has introduced "default" confs for a few of these array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins` and `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`. These properties are meant to be set only by cluster admins, thus allowing separation between platform defaults and user configs. However, as discussed in https://github.com/apache/spark/pull/34856, these configs are still client-side and can still be overridden. They are also not a scalable solution, as we cannot introduce one new "default" config for every array-like config.

I wanted to know if others have experienced this issue and what systems have been implemented to tackle it. Are there any existing solutions for this, either client-side or server-side (e.g. at a job submission server)? Even though we cannot easily enforce this on the client side, the simplicity of a client-side solution may still make it appealing.

Thanks,
Shardul
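Tom's spark-admin.conf idea and Shardul's server-side question can be combined in a minimal sketch of layered resolution at a hypothetical job submission server. spark-defaults.conf (or a user properties file) sits at the bottom, user `--conf` values sit above it, and an admin layer always wins. `resolve_layers` and the admin layer are assumptions, not existing Spark features. Enforcement like this only holds where users cannot edit the highest layer, which is why it would run on the server rather than the client.

```python
from __future__ import annotations


def resolve_layers(
    defaults: dict[str, str],
    user: dict[str, str],
    admin: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Hypothetical resolver. Precedence, lowest to highest: defaults < user < admin.

    Returns the resolved config plus the keys where a user value was
    discarded because an admin value is pinned.
    """
    resolved = {**defaults, **user}
    overridden = sorted(
        key for key in admin if key in user and user[key] != admin[key]
    )
    # Admin values are applied last, so users cannot replace them.
    resolved.update(admin)
    return resolved, overridden


defaults = {"spark.eventLog.enabled": "false", "spark.executor.memory": "4g"}
user = {"spark.executor.memory": "8g", "spark.eventLog.enabled": "false"}
admin = {"spark.eventLog.enabled": "true"}  # hypothetical spark-admin.conf contents

conf, overridden = resolve_layers(defaults, user, admin)
print(conf)        # {'spark.eventLog.enabled': 'true', 'spark.executor.memory': '8g'}
print(overridden)  # ['spark.eventLog.enabled']
```

Returning the discarded keys lets the server warn the user instead of silently changing their job. For list-valued keys such as `spark.extraListeners`, pinning the whole value would also drop legitimate user entries, so a prepend-style merge along the lines of Erik's ".default" proposal fits those keys better.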