Thank you, Holden! Yes, having everything live in the ConfigEntry is attractive.

The main reason I proposed an alternative where the groups are defined in YAML is that if the config groups are defined in ConfigEntry, then altering the groupings, which is relevant only to the display of config documentation, requires rebuilding Spark. This feels a bit off to me in terms of design.

For example, on the SQL performance tuning page there is some narrative documentation about caching <https://spark.apache.org/docs/3.5.0/sql-performance-tuning.html#caching-data-in-memory>, plus a table of relevant configs. If I want an additional config to show up in this table, I need to add it to the config group that backs the table.

With the ConfigEntry approach in #44755 <https://github.com/apache/spark/pull/44755>, that means editing the appropriate ConfigEntry and rebuilding Spark before I can regenerate the config table.

    val SOME_CONFIG = buildConf("spark.sql.someCachingRelatedConfig")
      .doc("some documentation")
      .version("2.1.0")
      .withDocumentationGroup("sql-tuning-caching-data")  // assign group to the config

With the YAML approach in #44756 <https://github.com/apache/spark/pull/44756>, that means editing the config group defined in the YAML file and regenerating the config table. No Spark rebuild required.

    sql-tuning-caching-data:
      - spark.sql.inMemoryColumnarStorage.compressed
      - spark.sql.inMemoryColumnarStorage.batchSize
      - spark.sql.someCachingRelatedConfig  # add config to the group

In both cases the config names, descriptions, defaults, etc. will be pulled from the ConfigEntry when building the HTML tables.

I prefer the latter approach but I’m open to whatever committers are more comfortable with. If you prefer the former, then I’ll focus on that and ping you for reviews accordingly!

> On Feb 21, 2024, at 11:43 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>
> I think this is a good idea. I like having everything in one source of truth
> rather than two (so option 1 sounds like a good idea); but that’s just my
> opinion.
> I'd be happy to help with reviews though.
>
> On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>> I know config documentation is not the most exciting thing. If there is
>> anything I can do to make this as easy as possible for a committer to
>> shepherd, I’m all ears!
>>
>>
>>> On Feb 14, 2024, at 8:53 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>
>>> I’m interested in automating our config documentation and need input from a
>>> committer who is interested in shepherding this work.
>>>
>>> We have around 60 tables of configs across our documentation. Here’s a
>>> typical example.
>>> <https://github.com/apache/spark/blob/736d8ab3f00e7c5ba1b01c22f6398b636b8492ea/docs/sql-performance-tuning.md?plain=1#L65-L159>
>>>
>>> These tables span several thousand lines of manually maintained HTML, which
>>> poses a few problems:
>>> - The documentation for a given config is sometimes out of sync across the
>>>   HTML table and its source `ConfigEntry`.
>>> - Internal configs that are not supposed to be documented publicly sometimes
>>>   are.
>>> - Many config names and defaults are extremely long, posing formatting
>>>   problems.
>>>
>>> Contributors waste time dealing with these issues in a losing battle to
>>> keep everything up-to-date and consistent.
>>>
>>> I’d like to solve all these problems by generating HTML tables
>>> automatically from the `ConfigEntry` instances where the configs are
>>> defined.
>>>
>>> I’ve proposed two alternative solutions:
>>> - #44755 <https://github.com/apache/spark/pull/44755>: Enhance `ConfigEntry`
>>>   so a config can be associated with one or more groups, and use that new
>>>   metadata to generate the tables we need.
>>> - #44756 <https://github.com/apache/spark/pull/44756>: Add a standalone YAML
>>>   file where we define config groups, and use that to generate the tables we
>>>   need.
>>>
>>> If you’re a committer and are interested in this problem, please chime in
>>> on whatever approach appeals to you. If you think this is a bad idea, I’m
>>> also eager to hear your feedback.
>>>
>>> Nick
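
For illustration only, the YAML approach discussed at the top of this message could be sketched roughly as follows. This is not the code in #44756. The group mapping is written as a Python dict standing in for the parsed YAML file, `CONFIG_ENTRIES` is a hypothetical stand-in for the metadata that would be pulled from the `ConfigEntry` instances (the doc strings are placeholders), and `render_group_table` is a made-up helper name.

```python
from html import escape

# Stand-in for the parsed YAML file: group name -> ordered list of config names.
CONFIG_GROUPS = {
    "sql-tuning-caching-data": [
        "spark.sql.inMemoryColumnarStorage.compressed",
        "spark.sql.inMemoryColumnarStorage.batchSize",
    ],
}

# Hypothetical stand-in for metadata read from ConfigEntry (doc text is a placeholder).
CONFIG_ENTRIES = {
    "spark.sql.inMemoryColumnarStorage.compressed": {
        "default": "true",
        "doc": "(description taken from the ConfigEntry)",
    },
    "spark.sql.inMemoryColumnarStorage.batchSize": {
        "default": "10000",
        "doc": "(description taken from the ConfigEntry)",
    },
}


def render_group_table(group, groups, entries):
    """Render one config group as an HTML table, in the order the group lists its members."""
    rows = []
    for name in groups[group]:
        entry = entries[name]  # raises KeyError if the group names an unknown config
        rows.append(
            "<tr>"
            f"<td><code>{escape(name)}</code></td>"
            f"<td>{escape(entry['default'])}</td>"
            f"<td>{escape(entry['doc'])}</td>"
            "</tr>"
        )
    header = "<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>"
    return "<table>\n" + "\n".join([header, *rows]) + "\n</table>"


html_table = render_group_table("sql-tuning-caching-data", CONFIG_GROUPS, CONFIG_ENTRIES)
print(html_table)
```

In this sketch, adding a config to a table only means appending its name to the group's list. The entry metadata is looked up by name, so the grouping and the config definitions stay separate.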