I know config documentation is not the most exciting thing. If there is 
anything I can do to make this as easy as possible for a committer to shepherd, 
I’m all ears!


> On Feb 14, 2024, at 8:53 PM, Nicholas Chammas <nicholas.cham...@gmail.com> 
> wrote:
> 
> I’m interested in automating our config documentation and need input from a 
> committer who is interested in shepherding this work.
> 
> We have around 60 tables of configs across our documentation. Here’s a 
> typical example. 
> <https://github.com/apache/spark/blob/736d8ab3f00e7c5ba1b01c22f6398b636b8492ea/docs/sql-performance-tuning.md?plain=1#L65-L159>
> 
> These tables span several thousand lines of manually maintained HTML, which 
> poses a few problems:
> - The documentation for a given config is sometimes out of sync across the
>   HTML table and its source `ConfigEntry`.
> - Internal configs that are not supposed to be documented publicly sometimes
>   are.
> - Many config names and defaults are extremely long, posing formatting
>   problems.
> 
> Contributors waste time dealing with these issues in a losing battle to keep 
> everything up-to-date and consistent.
> 
> I’d like to solve all these problems by generating HTML tables automatically 
> from the `ConfigEntry` instances where the configs are defined.
> 
> I’ve proposed two alternative solutions:
> - #44755 <https://github.com/apache/spark/pull/44755>: Enhance `ConfigEntry`
>   so a config can be associated with one or more groups, and use that new
>   metadata to generate the tables we need.
> - #44756 <https://github.com/apache/spark/pull/44756>: Add a standalone YAML
>   file where we define config groups, and use that to generate the tables we
>   need.
> 
> If you’re a committer and are interested in this problem, please chime in on 
> whatever approach appeals to you. If you think this is a bad idea, I’m also 
> eager to hear your feedback.
> 
> Nick
> 
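
To make the idea concrete, here is a rough Python sketch of the generation step that both proposals share: take config definitions, drop the internal ones, pick the ones belonging to a group, and emit an HTML table. This is not the code in either PR. The `ConfigEntry` dataclass is a hypothetical stand-in for the real Scala class, and its field names, the `groups` tuple, the column headers, the example keys, and `render_config_table` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical stand-in for a config definition (not Spark's actual class)."""
    key: str
    default: str
    doc: str
    version: str
    is_public: bool = True
    groups: tuple = ()  # assumed metadata: which doc tables this config belongs to


def render_config_table(entries, group):
    """Render one HTML table for `group`, skipping internal configs."""
    rows = []
    for entry in sorted(entries, key=lambda e: e.key):
        # Internal configs and configs from other groups never reach the docs.
        if not entry.is_public or group not in entry.groups:
            continue
        rows.append(
            "<tr>"
            f"<td><code>{escape(entry.key)}</code></td>"
            f"<td>{escape(entry.default)}</td>"
            f"<td>{escape(entry.doc)}</td>"
            f"<td>{escape(entry.version)}</td>"
            "</tr>"
        )
    header = (
        "<tr><th>Property Name</th><th>Default</th>"
        "<th>Meaning</th><th>Since Version</th></tr>"
    )
    return "\n".join(["<table>", header, *rows, "</table>"])


# Made-up entries, purely for illustration.
example_entries = [
    ConfigEntry("example.cache.batchSize", "10000",
                "Rows per cached batch.", "1.0.0", groups=("caching",)),
    ConfigEntry("example.cache.internalFlag", "false",
                "Not meant for users.", "1.0.0",
                is_public=False, groups=("caching",)),
]

print(render_config_table(example_entries, "caching"))
```

Because the table is built from the same object that defines the config, the description cannot drift from the source, and the `is_public` check keeps internal configs out without anyone having to remember to do so. Whether the group membership lives on the entry itself or in a separate YAML file is exactly the difference between the two PRs.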
