Hi all,

I'm proposing a new solution in https://github.com/apache/iceberg/pull/13977,
which lints markdown files under the `site` directory as part of the site
build CI, using PyMarkdownLinter <https://pymarkdown.readthedocs.io/en/latest/>.
Since a symlink to the `docs/docs` directory is created during the site build,
we can lint the nightly markdown files as well. I'd prefer not to lint the
docs of previous versions.
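
To make the intended scope concrete, here is a rough Python sketch of the file
selection. It assumes a hypothetical layout in which the nightly docs are
reachable through a symlink somewhere under the site root, and each older
release lives in a directory named after its version (e.g. `1.9.2`). The
function name and the version pattern are illustrative assumptions, not the
actual site build code or PyMarkdown configuration.

```python
import os
import re

# Hypothetical pattern for versioned doc directories such as "1.9.2".
VERSION_DIR = re.compile(r"^\d+\.\d+(\.\d+)?$")


def markdown_files_to_lint(site_root: str) -> list[str]:
    """Collect .md files under site_root, skipping versioned doc directories.

    followlinks=True makes os.walk descend into symlinked directories, so a
    nightly-docs symlink is linted like any other folder.
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(site_root, followlinks=True):
        # Prune directories that look like released versions (assumed naming).
        dirnames[:] = sorted(d for d in dirnames if not VERSION_DIR.match(d))
        for name in sorted(filenames):
            if name.endswith(".md"):
                full_path = os.path.join(dirpath, name)
                results.append(os.path.relpath(full_path, site_root))
    return results
```

Pruning `dirnames` in place is what keeps `os.walk` from descending into the
versioned directories at all, rather than filtering their files afterwards.
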
The rules are configurable, and I've enabled a subset of them, all of which
are auto-fixable. I've also added a `make lint` command that fixes these
style issues automatically.
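
To illustrate what "auto-fixable" means here: a violation is fixable when it
can be corrected mechanically, without human judgment. Below is a simplified
Python sketch of two such fixes, loosely modeled on markdownlint-style rules
MD009 (no trailing spaces) and MD012 (no multiple consecutive blank lines).
The `fix_markdown` helper is hypothetical and is not PyMarkdown's
implementation; it ignores edge cases a real linter handles, such as fenced
code blocks and two-space hard line breaks.

```python
def fix_markdown(text: str) -> str:
    """Strip trailing whitespace (cf. MD009) and collapse runs of blank
    lines into a single blank line (cf. MD012).

    Simplified: does not special-case fenced code blocks or two-space
    hard line breaks.
    """
    fixed_lines = []
    previous_blank = False
    for line in text.splitlines():
        line = line.rstrip()
        is_blank = line == ""
        if is_blank and previous_blank:
            continue  # drop the extra blank line
        fixed_lines.append(line)
        previous_blank = is_blank
    # Re-join the lines and end the file with a newline (empty input stays empty).
    return "\n".join(fixed_lines) + "\n" if fixed_lines else ""


sample = "# Title   \n\n\n\nSome text.  \n"
print(repr(fix_markdown(sample)))  # prints '# Title\n\nSome text.\n'
```

A useful property of such fixes is idempotence: running the fixer twice gives
the same result as running it once, which is what makes an automatic
`make lint` step safe to re-run.
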

Please let me know what you think.

Thanks,
Manu

On Tue, Aug 26, 2025 at 11:25 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Kevin,
>
> You are right that the docs are not rendered correctly. The major issue
> is that the python-markdown extensions[1] are not supported by flexmark,
> which follows the CommonMark spec[2]. Let me explore other approaches and
> get back to the discussion.
>
> 1. https://github.com/apache/iceberg/blob/main/site/mkdocs.yml#L57
> 2. https://spec.commonmark.org/0.28/
>
> Thanks,
> Manu
>
> On Tue, Aug 26, 2025 at 1:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Awesome to see that spotless works here!
>>
>> Looks like the `flink` and `kafka-connect` folders are missing from
>> `settings.gradle`; there are markdown files in those folders too.
>> We should spot-check a few of these changes to make sure the docs are
>> rendered correctly. For example, this change
>> <https://github.com/apache/iceberg/pull/13908/files#diff-691d93e2f9eb8070c1170b38a401e6005a6fd7957f2e67aa3a2a89922046848fR76-R81>
>> might mess up the formatting, I've seen this issue when dealing with
>> pyiceberg's docs.
>>
>> Best,
>> Kevin Liu
>>
>> On Mon, Aug 25, 2025 at 10:28 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>>> Hey Manu,
>>>
>>> Thanks for creating the draft PR. If we go for it, we should add the
>>> command to auto-fix the violations to the top-level README.md
>>> <https://github.com/apache/iceberg?tab=readme-ov-file#building>. I like
>>> it a lot; curious to learn what others think.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Mon, Aug 25, 2025 at 18:20 Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> Thanks @Eduard, it's working now after including the `docker`, `docs`,
>>>> `format` and `site` folders in settings.gradle[1]. All style issues in
>>>> markdown files under these folders can now be spotted and fixed with
>>>> gradlew commands. I agree it's the best approach. Please help double-check.
>>>>
>>>>
>>>> 1. https://github.com/apache/iceberg/pull/13908/files#diff-7f825392aa37acd1cee0c2e7b9bb7366ad6eac64f3e6cdd816e156bcb69d30de
>>>>
>>>> On Mon, Aug 25, 2025 at 3:12 PM Eduard Tudenhöfner <
>>>> etudenhoef...@apache.org> wrote:
>>>>
>>>>> @Manu my guess is that it only found the markdown file that is inside
>>>>> a gradle project folder, whereas other markdown files under *site* or
>>>>> *format* haven't been found. Maybe check whether there's a way to
>>>>> apply the formatting to folders like *site* or *format*.
>>>>>
>>>>> On Sat, Aug 23, 2025 at 5:41 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Not sure I've configured it correctly, but the spotless flexmark plugin
>>>>>> is only able to fix one markdown file[1]. Meanwhile, this plugin doesn't
>>>>>> support any of the options provided by flexmark.
>>>>>>
>>>>>> Hi Fokko, does pre-commit require Python, and would we need a gradle
>>>>>> task to integrate it?
>>>>>>
>>>>>> 1. https://github.com/apache/iceberg/pull/13908
>>>>>>
>>>>>> On Fri, Aug 22, 2025 at 12:22 PM Jean-Baptiste Onofré <
>>>>>> j...@nanthrax.net> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Great suggestion, Manu! Indeed, if spotless can support it, it's
>>>>>>> probably better to use it for consistency.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Thu, Aug 21, 2025 at 6:12 PM Fokko Driesprong <fo...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hey Manu,
>>>>>>> >
>>>>>>> > Thanks for suggesting this, and I strongly support using a linter.
>>>>>>> > Recently I noticed that we use different flavors of Markdown in the
>>>>>>> > table, and the linter would take care of that.
>>>>>>> >
>>>>>>> > I have a similar remark to Eduard's. If Spotless supports this, I
>>>>>>> > think that would be the easiest. Otherwise, I think pre-commit would
>>>>>>> > also be a good option within the Java repo, as it is also easy to run
>>>>>>> > locally. Using pre-commit, we can also add other linters (shell,
>>>>>>> > end-of-line, detecting debug statements, credential detection,
>>>>>>> > spell-checker, etc.).
>>>>>>> >
>>>>>>> > The biggest downside is that we might lose some version history due
>>>>>>> > to reformatting-only changes. For example, if you widen a column in a
>>>>>>> > table, I think the linter will realign the whole table. However, we
>>>>>>> > can easily track down the lineage through GitHub.
>>>>>>> >
>>>>>>> > Kind regards,
>>>>>>> > Fokko
>>>>>>> >
>>>>>>> > Off-topic: At some point, we can replace pre-commit with prek when
>>>>>>> > it is mature enough. As Atwood's law states: any application that can
>>>>>>> > be written in Rust will eventually be written in Rust (slightly
>>>>>>> > adapted).
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Aug 21, 2025 at 17:59 Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>> >>
>>>>>>> >> We're already using Spotless to format Java code, and Spotless also
>>>>>>> >> supports markdown files, so it may be worth exploring how we could
>>>>>>> >> achieve this through Spotless.
>>>>>>> >> The main advantage would be that people could catch linting errors
>>>>>>> >> locally before CI runs.
>>>>>>> >>
>>>>>>> >> On Thu, Aug 21, 2025 at 5:38 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi all,
>>>>>>> >>>
>>>>>>> >>> What do you think of adding a GitHub action to lint markdown files?
>>>>>>> >>> It can catch markdown rendering issues early and ensure a
>>>>>>> >>> consistent style across markdown files. iceberg-python already
>>>>>>> >>> includes markdown lint[1] in its pre-commit hooks. (Thanks Fokko
>>>>>>> >>> for the suggestion!)
>>>>>>> >>>
>>>>>>> >>> I have a draft PR[2] that adds a Docs CI workflow triggered by
>>>>>>> >>> changes to any markdown file. The lint rules are highly
>>>>>>> >>> customizable via a config file[3]. While I fix the existing issues
>>>>>>> >>> spotted by the CI, I'd like to get early feedback from the community.
>>>>>>> >>>
>>>>>>> >>> 1. https://github.com/apache/iceberg-python/blob/main/.pre-commit-config.yaml#L41
>>>>>>> >>> 2. https://github.com/apache/iceberg/pull/13826
>>>>>>> >>> 3. https://github.com/manuzhang/iceberg/blob/markdownlint/.markdownlint.jsonc
>>>>>>> >>>
>>>>>>> >>> Regards,
>>>>>>> >>> Manu
>>>>>>>
>>>>>>
