I know config documentation is not the most exciting thing. If there is anything I can do to make this as easy as possible for a committer to shepherd, I’m all ears!
> On Feb 14, 2024, at 8:53 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> I’m interested in automating our config documentation and need input from a
> committer who is interested in shepherding this work.
>
> We have around 60 tables of configs across our documentation. Here’s a
> typical example.
> <https://github.com/apache/spark/blob/736d8ab3f00e7c5ba1b01c22f6398b636b8492ea/docs/sql-performance-tuning.md?plain=1#L65-L159>
>
> These tables span several thousand lines of manually maintained HTML, which
> poses a few problems:
> - The documentation for a given config is sometimes out of sync across the
>   HTML table and its source `ConfigEntry`.
> - Internal configs that are not supposed to be documented publicly sometimes
>   are.
> - Many config names and defaults are extremely long, posing formatting
>   problems.
>
> Contributors waste time dealing with these issues in a losing battle to keep
> everything up-to-date and consistent.
>
> I’d like to solve all these problems by generating HTML tables automatically
> from the `ConfigEntry` instances where the configs are defined.
>
> I’ve proposed two alternative solutions:
> - #44755 <https://github.com/apache/spark/pull/44755>: Enhance `ConfigEntry`
>   so a config can be associated with one or more groups, and use that new
>   metadata to generate the tables we need.
> - #44756 <https://github.com/apache/spark/pull/44756>: Add a standalone YAML
>   file where we define config groups, and use that to generate the tables we
>   need.
>
> If you’re a committer and are interested in this problem, please chime in on
> whatever approach appeals to you. If you think this is a bad idea, I’m also
> eager to hear your feedback.
>
> Nick
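
For a concrete picture of the generation step described in the quoted message, here is a minimal sketch in Python. It is not the code from either pull request. The `ConfigEntry` dataclass is a hypothetical stand-in for Spark's Scala class, the `groups` field and the `render_config_table` helper are assumptions made for illustration, the config keys are invented, and the column headers are assumed to match the existing docs tables.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical stand-in for Spark's ConfigEntry (not the real class)."""
    key: str
    default: str
    doc: str
    version: str
    groups: tuple = ()      # assumed metadata: which docs tables list this config
    internal: bool = False  # internal configs must never reach the public docs


def render_config_table(entries, group):
    """Build one HTML table from the public entries tagged with `group`."""
    header = (
        "<tr><th>Property Name</th><th>Default</th>"
        "<th>Meaning</th><th>Since Version</th></tr>"
    )
    rows = []
    for entry in sorted(entries, key=lambda e: e.key):
        if entry.internal or group not in entry.groups:
            continue  # skip internal configs and configs from other tables
        rows.append(
            "<tr>"
            f"<td><code>{escape(entry.key)}</code></td>"
            f"<td>{escape(entry.default)}</td>"
            f"<td>{escape(entry.doc)}</td>"
            f"<td>{escape(entry.version)}</td>"
            "</tr>"
        )
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


# Invented entries, for illustration only.
entries = [
    ConfigEntry("spark.example.cache.enabled", "true",
                "Enables the example cache.", "1.0.0", groups=("caching",)),
    ConfigEntry("spark.example.cache.debugDump", "false",
                "Internal debugging switch.", "1.0.0",
                groups=("caching",), internal=True),
    ConfigEntry("spark.example.join.threshold", "10MB",
                "Size limit for the example join.", "1.1.0", groups=("joins",)),
]

table_html = render_config_table(entries, "caching")
print(table_html)
```

Because each table is derived from the entry itself, the documented text cannot drift from the source definition, and the `internal` check keeps private configs out by construction. With a standalone YAML file instead, the group membership would be read from that file rather than from a `groups` field, and the rendering step would stay the same.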