I’m interested in automating our config documentation and need input from a committer who is willing to shepherd this work.

We have around 60 tables of configs across our documentation. Here’s a typical example:

<https://github.com/apache/spark/blob/736d8ab3f00e7c5ba1b01c22f6398b636b8492ea/docs/sql-performance-tuning.md?plain=1#L65-L159>

These tables span several thousand lines of manually maintained HTML, which poses a few problems:

- The documentation for a given config is sometimes out of sync across the HTML table and its source `ConfigEntry`.
- Internal configs that are not supposed to be documented publicly sometimes are.
- Many config names and defaults are extremely long, posing formatting problems.

Contributors waste time dealing with these issues in a losing battle to keep everything up-to-date and consistent.

I’d like to solve all these problems by generating HTML tables automatically from the `ConfigEntry` instances where the configs are defined.

I’ve proposed two alternative solutions:

- #44755 <https://github.com/apache/spark/pull/44755>: Enhance `ConfigEntry` so a config can be associated with one or more groups, and use that new metadata to generate the tables we need.
- #44756 <https://github.com/apache/spark/pull/44756>: Add a standalone YAML file where we define config groups, and use that to generate the tables we need.

If you’re a committer and are interested in this problem, please chime in on whatever approach appeals to you. If you think this is a bad idea, I’m also eager to hear your feedback.

Nick
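
P.S. To make the idea concrete, here is a rough Python sketch of what generating a table from config metadata could look like. It is not the code in either PR. The `ConfigEntry` dataclass below is a hypothetical stand-in for the Scala class, and its field names (`is_public`, `groups`), the `render_config_table` helper, and the column headings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ConfigEntry:
    # Hypothetical stand-in for Spark's Scala ConfigEntry.
    # Field names here are assumptions, not the real API.
    key: str
    default: str
    doc: str
    version: str
    is_public: bool = True
    groups: tuple = ()


def render_config_table(entries, group):
    """Render an HTML table of the public configs tagged with `group`."""
    rows = []
    for entry in sorted(entries, key=lambda e: e.key):
        # Internal configs and configs from other groups are left out.
        if not entry.is_public or group not in entry.groups:
            continue
        rows.append(
            "<tr>"
            f"<td><code>{escape(entry.key)}</code></td>"
            f"<td>{escape(entry.default)}</td>"
            f"<td>{escape(entry.doc)}</td>"
            f"<td>{escape(entry.version)}</td>"
            "</tr>"
        )
    header = (
        "<tr><th>Property Name</th><th>Default</th>"
        "<th>Meaning</th><th>Since Version</th></tr>"
    )
    return "\n".join(["<table>", header, *rows, "</table>"])
```

Because the table is built from the same objects that define the configs, the docs cannot drift from the source, and internal configs are filtered out in one place. Whether the group membership lives on `ConfigEntry` (as in #44755) or in a separate YAML file (as in #44756) only changes where `groups` comes from.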