I’d like to automate our config documentation and am looking for input from a 
committer who is interested in shepherding this work.

We have around 60 tables of configs across our documentation. Here’s a typical 
example. 
<https://github.com/apache/spark/blob/736d8ab3f00e7c5ba1b01c22f6398b636b8492ea/docs/sql-performance-tuning.md?plain=1#L65-L159>

These tables span several thousand lines of manually maintained HTML, which 
poses a few problems:
- The documentation for a given config sometimes falls out of sync between the 
  HTML table and its source `ConfigEntry`.
- Internal configs that are not supposed to be documented publicly sometimes are.
- Many config names and defaults are extremely long, which causes formatting 
  problems.

Contributors waste time dealing with these issues in a losing battle to keep 
everything up to date and consistent.

I’d like to solve all these problems by generating HTML tables automatically 
from the `ConfigEntry` instances where the configs are defined.

I’ve proposed two alternative solutions:
- #44755 <https://github.com/apache/spark/pull/44755>: Enhance `ConfigEntry` so a 
  config can be associated with one or more groups, and use that new metadata to 
  generate the tables we need.
- #44756 <https://github.com/apache/spark/pull/44756>: Add a standalone YAML file 
  where we define config groups, and use that to generate the tables we need.
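To make the group idea more concrete, here is a minimal Python sketch. It is not the code in either PR. It assumes a hypothetical `ConfigDoc` record standing in for the metadata a `ConfigEntry` carries (key, default, doc, version, an internal flag, and a set of group names), and renders one HTML table per group while skipping internal configs. The column headers are assumed to match the existing hand-written tables. Under the YAML approach, group membership would come from the file rather than from each entry, but the rendering step would look much the same.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ConfigDoc:
    """Hypothetical stand-in for the metadata a Spark ConfigEntry carries."""
    key: str
    default: str
    doc: str
    version: str
    internal: bool = False
    groups: tuple = ()  # hypothetical: group names this config belongs to


def render_group_table(entries, group):
    """Render an HTML table of the public configs tagged with `group`."""
    # Keep only configs in this group, and never emit internal ones.
    rows = [e for e in entries if group in e.groups and not e.internal]
    rows.sort(key=lambda e: e.key)  # deterministic, alphabetical output
    lines = [
        "<table>",
        "<thead><tr><th>Property Name</th><th>Default</th>"
        "<th>Meaning</th><th>Since Version</th></tr></thead>",
        "<tbody>",
    ]
    for e in rows:
        # Escape every field so long or unusual values can't break the markup.
        lines.append(
            "<tr>"
            f"<td><code>{escape(e.key)}</code></td>"
            f"<td>{escape(e.default)}</td>"
            f"<td>{escape(e.doc)}</td>"
            f"<td>{escape(e.version)}</td>"
            "</tr>"
        )
    lines += ["</tbody>", "</table>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    entries = [
        ConfigDoc("spark.example.b", "true", "Enables b.", "4.0.0", groups=("perf",)),
        ConfigDoc("spark.example.a", "200", "A tuning knob.", "4.0.0", groups=("perf",)),
        ConfigDoc("spark.example.hidden", "false", "Not for users.", "4.0.0",
                  internal=True, groups=("perf",)),
    ]
    print(render_group_table(entries, "perf"))
```

The sketch HTML-escapes every field; a real generator would also need to decide whether doc strings are allowed to carry markup.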

If you’re a committer and are interested in this problem, please chime in on 
whatever approach appeals to you. If you think this is a bad idea, I’m also 
eager to hear your feedback.

Nick
