Hi all,

I'm proposing a new approach in https://github.com/apache/iceberg/pull/13977: lint the markdown files under the `site` directory with the PyMarkdown linter <https://pymarkdown.readthedocs.io/en/latest/> as part of the site build CI. Since a symlink to the `docs/docs` directory is created during the site build, the nightly markdown docs get linted as well. I'd prefer not to lint the docs of previous versions. The rules are configurable, and I've enabled only the subset of them that can be fixed automatically. I've also added a `make lint` command that fixes style issues automatically.
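The file selection described above can be illustrated with a small Python sketch. It lints everything under `site`, follows the symlink so the nightly docs are included, and skips the docs of previous versions. This is a minimal sketch under stated assumptions, not what the PR or PyMarkdown actually does: the helper `collect_markdown_files`, its `skip_dirs` parameter, and the `1.9.0` / `nightly` directory names are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def collect_markdown_files(site_root, skip_dirs=frozenset()):
    """Return the sorted relative paths of all *.md files under site_root.

    Hypothetical helper: symlinked directories are followed, so a symlink to
    another docs tree (e.g. nightly docs) is included, while directories whose
    name is in skip_dirs (e.g. older release versions) are pruned.
    """
    site_root = Path(site_root)
    found = []
    # os.walk does not descend into directory symlinks unless followlinks=True.
    for dirpath, dirnames, filenames in os.walk(site_root, followlinks=True):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(".md"):
                found.append((Path(dirpath) / name).relative_to(site_root).as_posix())
    return sorted(found)


if __name__ == "__main__":
    # Build a tiny hypothetical layout: site/ with an old version directory
    # and a "nightly" symlink pointing at a docs tree outside site/.
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp, "site")
        (site / "docs" / "1.9.0").mkdir(parents=True)
        (site / "index.md").write_text("# Home\n")
        (site / "docs" / "1.9.0" / "old.md").write_text("# Old\n")
        nightly_src = Path(tmp, "nightly-src")
        nightly_src.mkdir()
        (nightly_src / "spark.md").write_text("# Spark\n")
        os.symlink(nightly_src, site / "docs" / "nightly", target_is_directory=True)
        print(collect_markdown_files(site, skip_dirs={"1.9.0"}))
```

Pruning `dirnames` in place is what keeps `os.walk` out of the excluded version directories. Without `followlinks=True`, the nightly docs reached through the symlink would be skipped silently.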
Please let me know what you think.

Thanks,
Manu

On Tue, Aug 26, 2025 at 11:25 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
> Hi Kevin,
>
> You are correct that the docs are not rendered correctly. The major issue is that the Python-Markdown extensions[1] are not supported by flexmark, which follows the CommonMark spec[2]. Let me explore other approaches and get back to the discussion.
>
> 1. https://github.com/apache/iceberg/blob/main/site/mkdocs.yml#L57
> 2. https://spec.commonmark.org/0.28/
>
> Thanks,
> Manu
>
> On Tue, Aug 26, 2025 at 1:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>> Awesome to see that spotless works here!
>>
>> Looks like the `flink` and `kafka-connect` folders are missing from `settings.gradle`; there are markdown files in those folders too.
>> We should spot-check a few of these changes to make sure the docs are rendered correctly. For example, this change <https://github.com/apache/iceberg/pull/13908/files#diff-691d93e2f9eb8070c1170b38a401e6005a6fd7957f2e67aa3a2a89922046848fR76-R81> might mess up the formatting; I've seen this issue when dealing with pyiceberg's docs.
>>
>> Best,
>> Kevin Liu
>>
>> On Mon, Aug 25, 2025 at 10:28 AM Fokko Driesprong <fo...@apache.org> wrote:
>>> Hey Manu,
>>>
>>> Thanks for creating the draft PR. If we go for it, we should add the command to auto-fix the violations to the top README.md <https://github.com/apache/iceberg?tab=readme-ov-file#building>. I like it a lot, and I'm curious to learn what others think.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Mon, Aug 25, 2025 at 18:20, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> Thanks @Eduard, it's working now after including the `docker`, `docs`, `format`, and `site` folders in settings.gradle[1]. All style issues in markdown files under these folders can be spotted and fixed by gradlew commands. I agree it's the best approach. Please help double-check.
>>>>
>>>> 1.
>>>> https://github.com/apache/iceberg/pull/13908/files#diff-7f825392aa37acd1cee0c2e7b9bb7366ad6eac64f3e6cdd816e156bcb69d30de
>>>>
>>>> On Mon, Aug 25, 2025 at 3:12 PM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>> @Manu my guess is that it only found the markdown file that is inside a Gradle project folder, whereas other markdown files under *site* or *format* haven't been found. Maybe check whether there's a way to apply the formatting to folders like *site* or *format*.
>>>>>
>>>>> On Sat, Aug 23, 2025 at 5:41 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>> I'm not sure I've configured it correctly, but the spotless flexmark plugin is only able to fix one markdown file[1]. Also, this plugin doesn't support any of the options provided by flexmark.
>>>>>>
>>>>>> Hi Fokko, does pre-commit require Python, and would we need a Gradle task to integrate it?
>>>>>>
>>>>>> 1. https://github.com/apache/iceberg/pull/13908
>>>>>>
>>>>>> On Fri, Aug 22, 2025 at 12:22 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Great suggestion, Manu! Indeed, if spotless can support it, it's probably better to use it for consistency.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Thu, Aug 21, 2025 at 6:12 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>> Hey Manu,
>>>>>>>>
>>>>>>>> Thanks for suggesting this, and I strongly support using a linter. Recently I noticed that we use different flavors of Markdown in the table, and the linter would take care of that.
>>>>>>>>
>>>>>>>> I have a similar remark to Eduard's. If Spotless supports this, I think that would be the easiest. Otherwise, I think pre-commit would also be a good option within the Java repo, as it is also easy to run locally.
>>>>>>>> Using pre-commit, we can also add other linters (shell, end-of-line, debug-statement detection, credential detection, spell-checking, etc.).
>>>>>>>>
>>>>>>>> The biggest downside is that we might lose some version history due to pure reformatting. For example, if you widen a column in a table, I think the linter will realign the whole table. However, GitHub makes it easy to track down the lineage.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Fokko
>>>>>>>>
>>>>>>>> Off-topic: At some point, we can replace pre-commit with prek once it is mature enough. As Atwood's law states: any application that can be written in Rust will eventually be written in Rust (slightly adapted).
>>>>>>>>
>>>>>>>> On Thu, Aug 21, 2025 at 17:59, Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>>> We're already using spotless to format Java code, and spotless also supports markdown files, so it may be worth exploring how we could achieve this through spotless.
>>>>>>>>> The main advantage would be that people could catch linting errors locally, before CI runs.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 21, 2025 at 5:38 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> What do you think of adding a GitHub Action to lint markdown files? It can catch markdown rendering issues early and ensure a consistent style across markdown files. iceberg-python already includes markdown lint[1] in its pre-commit hook. (Thanks, Fokko, for the suggestion!)
>>>>>>>>>>
>>>>>>>>>> I have a draft PR[2] that adds a Docs CI triggered on changes to any markdown file. The lint rules are highly customizable via a config file[3]. While I fix the existing issues spotted by the CI, I'd like to get early feedback from the community.
>>>>>>>>>>
>>>>>>>>>> 1. https://github.com/apache/iceberg-python/blob/main/.pre-commit-config.yaml#L41
>>>>>>>>>> 2. https://github.com/apache/iceberg/pull/13826
>>>>>>>>>> 3. https://github.com/manuzhang/iceberg/blob/markdownlint/.markdownlint.jsonc
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Manu
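Fokko's concern that reformatting can cost version history can be made concrete with a small sketch. It assumes a formatter that pads every cell to the width of its column's widest entry. The helper `align_table` is hypothetical and is not the actual algorithm of PyMarkdown, markdownlint, or flexmark. Under that assumption, lengthening a single cell changes every line of the table, so line-based history such as `git blame` attributes the whole table to the reformatting commit.

```python
def align_table(rows):
    """Render rows of cell strings as a Markdown pipe table.

    Hypothetical formatter: the first row is the header, a delimiter row is
    inserted after it, and every cell is padded to its column's widest entry.
    """
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

    def render(cells):
        # Left-justify each cell to the width of its column.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    delimiter = "| " + " | ".join("-" * w for w in widths) + " |"
    return [render(rows[0]), delimiter] + [render(r) for r in rows[1:]]


if __name__ == "__main__":
    before = align_table([["Option", "Default"], ["a", "1"], ["b", "2"]])
    after = align_table([["Option", "Default"], ["a", "1"], ["b", "a much longer default"]])
    # Only one cell changed, yet every rendered line differs.
    changed = [i for i, (old, new) in enumerate(zip(before, after)) if old != new]
    print(changed)
```

A formatter that does not pad cells would avoid this churn at the cost of less readable raw tables. As Fokko notes, GitHub can still be used to trace the lineage.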