Package: release.debian.org
Severity: normal
Tags: bookworm
X-Debbugs-Cc: [email protected]
Control: affects -1 + src:python-markdown
User: [email protected]
Usertags: pu

[ Reason ]
This upload fixes two issues:
1. CVE-2025-69534: parser crash on malformed <![ sequences. There are two
patches for this bug: bogus_comments.diff (backported from 3.5.2, pre-CVE)
and incomplete_markup_declaration.diff (backported from 3.8.1).
2. Bug #1137043: fix test failures with python3.11 >= 3.11.2-6+deb12u7,
where html.parser was changed to address CVE-2025-6069. Those changes
broke Python-Markdown, which relies heavily on html.parser internals.
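
For illustration only, the guard that the backported patches add can be
sketched on top of the standard library alone. This is a minimal sketch, not
the code from the patches: the class name TolerantParser and its events list
are hypothetical, and it assumes that html.parser still exposes the internal
methods parse_html_declaration() and parse_bogus_comment(), which
Python-Markdown's HTMLExtractor also overrides.

```python
from html.parser import HTMLParser


class TolerantParser(HTMLParser):
    """Hypothetical stand-in for Python-Markdown's HTMLExtractor."""

    def __init__(self):
        super().__init__()
        self.events = []  # (kind, text) tuples, for inspection only

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def parse_html_declaration(self, i):
        rawdata = self.rawdata
        if rawdata[i:i + 3] == '<![' and rawdata[i:i + 9] != '<![CDATA[':
            # Treat a non-CDATA "<![" as a bogus comment instead of handing
            # it to the marked-section parser.
            result = self.parse_bogus_comment(i)
            if result == -1:
                # No closing ">" found: emit "<" as plain text and move on
                # by one character rather than stalling or raising.
                self.handle_data(rawdata[i:i + 1])
                return i + 1
            return result
        return super().parse_html_declaration(i)


def data_seen(text):
    """Feed text through the sketch parser and join all data events."""
    parser = TolerantParser()
    parser.feed(text)
    parser.close()
    return "".join(d for kind, d in parser.events if kind == "data")


if __name__ == "__main__":
    # An unterminated "<![" comes back as ordinary text.
    print(data_seen("<![oops"))
```

In this sketch the unterminated sequence falls through as plain data, which
mirrors the "result == -1" branch in fixes_for_new_python.diff. The real
patches route bogus comments through handle_empty_tag() instead of a plain
comment handler.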
[ Impact ]
CVE-2025-69534 enables remote, unauthenticated Denial of Service in web
applications, documentation systems, CI/CD pipelines, and any service that
renders untrusted Markdown.
[ Tests ]
All changes are covered by automated tests, which are run during build.
[ Risks ]
The changes have been part of upstream Python-Markdown for a while and are
well covered by tests, so they should be safe.
[ Checklist ]
[x] *all* changes are documented in the d/changelog
[x] I reviewed all changes and I approve them
[x] attach debdiff against the package in (old)stable
[x] the issue is verified as fixed in unstable
[ Changes ]
* Backport upstream fixes for parsing bogus HTML markup (CVE-2025-69534).
* Adapt to changes in html.parser module in the new Python, backported
to Bookworm as part of CVE fixes (closes: #1137043).
There are also branch changes in debian/gbp.conf and debian/gitlab-ci.yml,
which are needed for the CI, but those files do not affect the built package
in any way.
[ Other info ]
See also #1137180: similar upload to Trixie.
See also #1131896: discussion about whether CVE-2025-69534 needs to be
addressed in Python 3.11 itself rather than in Python-Markdown. If that
happens at some point, Python-Markdown should not break. I am not waiting
for a fix in Python; instead I am including a workaround in Python-Markdown
itself, since it is needed for the #1137043 patch to apply cleanly anyway.
--
Dmitry Shachnev
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+python-markdown (3.4.1-2+deb12u1) bookworm; urgency=medium
+
+  * Backport upstream fixes for parsing bogus HTML markup (CVE-2025-69534).
+  * Adapt to changes in html.parser module in the new Python, backported
+    to Bookworm as part of CVE fixes (closes: #1137043).
+
+ -- Dmitry Shachnev <[email protected]>  Wed, 20 May 2026 14:32:50 +0300
+
 python-markdown (3.4.1-2) unstable; urgency=medium
 
   * Team upload.
--- a/debian/gbp.conf
+++ b/debian/gbp.conf
@@ -1,2 +1,2 @@
 [DEFAULT]
-debian-branch=debian/master
+debian-branch=debian/bookworm
--- a/debian/gitlab-ci.yml
+++ b/debian/gitlab-ci.yml
@@ -3,4 +3,4 @@ include:
   - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
 
 variables:
-  RELEASE: 'unstable'
+  RELEASE: 'bookworm'
--- /dev/null
+++ b/debian/patches/bogus_comments.diff
@@ -0,0 +1,66 @@
+From: Waylan Limberg <[email protected]>
+Date: Wed, 3 Jan 2024 13:24:33 -0500
+Subject: Fix handling of bogus comments.
+
+As with most implementations, we now pass through bogus comments (as
+defined by the HTML Spec) unaltered except that they are HTML escaped.
+This deviates from the reference implementation which completely ignores
+them. As the reference implementation seems to not have even contemplated
+their existence, it is not being used as a reference in this instance.
+Fixes #1425.
+
+(cherry picked from commit e466f381d09692f484f8ff022273e2ac8cea0b16)
+---
+ markdown/htmlparser.py                       |  9 +++++++++
+ tests/test_syntax/blocks/test_html_blocks.py | 16 ++++++++--------
+ 2 files changed, 17 insertions(+), 8 deletions(-)
+
+diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
+index 3512d1a..586bddd 100644
+--- a/markdown/htmlparser.py
++++ b/markdown/htmlparser.py
+@@ -262,6 +262,15 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         self.handle_data('<!')
+         return i + 2
+ 
++    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
++        # Override the default behavior so that bogus comments get passed
++        # through unaltered by setting `report` to `0` (see #1425).
++        pos = super().parse_bogus_comment(i, report)
++        if pos == -1:  # pragma: no cover
++            return -1
++        self.handle_empty_tag(self.rawdata[i:pos], is_block=False)
++        return pos
++
+     # The rest has been copied from base class in standard lib to address #1036.
+     # As __startag_text is private, all references to it must be in this subclass.
+     # The last few lines of parse_starttag are reversed so that handle_starttag
+diff --git a/tests/test_syntax/blocks/test_html_blocks.py b/tests/test_syntax/blocks/test_html_blocks.py
+index 9ec0668..4a4a06e 100644
+--- a/tests/test_syntax/blocks/test_html_blocks.py
++++ b/tests/test_syntax/blocks/test_html_blocks.py
+@@ -782,16 +782,16 @@ class TestHTMLBlocks(TestCase):
+             '<!-- *foo* -->'
+         )
+ 
+-    # Note: this is a change in behavior for Python-Markdown, which does *not* match the reference
+-    # implementation. However, it does match the HTML5 spec. Declarations must start with either
+-    # `<!DOCTYPE` or `<![`. Anything else that starts with `<!` is a comment. According to the
+-    # HTML5 spec, a comment without the hyphens is a "bogus comment", but a comment nonetheless.
+-    # See https://www.w3.org/TR/html52/syntax.html#markup-declaration-open-state.
+-    # If we wanted to change this behavior, we could override `HTMLParser.parse_bogus_comment()`.
+     def test_bogus_comment(self):
+         self.assertMarkdownRenders(
+-            '<!*foo*>',
+-            '<!--*foo*-->'
++            '<!invalid>',
++            '<p>&lt;!invalid&gt;</p>'
++        )
++
++    def test_bogus_comment_endtag(self):
++        self.assertMarkdownRenders(
++            '</#invalid>',
++            '<p>&lt;/#invalid&gt;</p>'
+         )
+ 
+     def test_raw_multiline_comment(self):
--- /dev/null
+++ b/debian/patches/fixes_for_new_python.diff
@@ -0,0 +1,91 @@
+From: Isaac Muse <[email protected]>
+Date: Thu, 19 Jun 2025 09:46:13 -0600
+Subject: Fixes for Python 3.14
+
+- Fix issue with unclosed `<![`
+- Fix issue with unclosed HTML tag `<foo`
+- Fix issue with unclosed comments
+
+Fixes #1537
+
+(cherry picked from commit 9980cb5b27b07ff48283178d98213e41543701ec)
+---
+ markdown/extensions/md_in_html.py |  6 +++++-
+ markdown/htmlparser.py            | 24 ++++++++++++++++++++++--
+ 2 files changed, 27 insertions(+), 3 deletions(-)
+
+diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py
+index 16f0ef6..9b8f8bf 100644
+--- a/markdown/extensions/md_in_html.py
++++ b/markdown/extensions/md_in_html.py
+@@ -222,7 +222,11 @@ class HTMLExtractorExtra(HTMLExtractor):
+             if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
+                 # We have encountered the bug in #1534 (Python bug `gh-77057`).
+                 # Provide an override until we drop support for Python < 3.13.
+-                return self.parse_bogus_comment(i)
++                result = self.parse_bogus_comment(i)
++                if result == -1:
++                    self.handle_data(self.rawdata[i:i + 1])
++                    return i + 1
++                return result
+             # The same override exists in HTMLExtractor without the check
+             # for mdstack. Therefore, use HTMLExtractor's parent instead.
+             return super(HTMLExtractor, self).parse_html_declaration(i)
+diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
+index b6e3247..ecb92a5 100644
+--- a/markdown/htmlparser.py
++++ b/markdown/htmlparser.py
+@@ -76,6 +76,8 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         # Block tags that should contain no content (self closing)
+         self.empty_tags = set(['hr'])
+ 
++        self.override_comment_update = False
++
+         # This calls self.reset
+         super().__init__(*args, **kwargs)
+         self.md = md
+@@ -234,8 +236,21 @@ class HTMLExtractor(htmlparser.HTMLParser):
+             self.handle_empty_tag('&{};'.format(name), is_block=False)
+ 
+     def handle_comment(self, data):
++        # Check if the comment is unclosed, if so, we need to override position
++        i = self.line_offset + self.offset + len(data) + 4
++        if self.rawdata[i:i + 3] != '-->':
++            self.handle_data('<')
++            self.override_comment_update = True
++            return
+         self.handle_empty_tag('<!--{}-->'.format(data), is_block=True)
+ 
++    def updatepos(self, i: int, j: int) -> int:
++        if self.override_comment_update:
++            self.override_comment_update = False
++            i = 0
++            j = 1
++        return super().updatepos(i, j)
++
+     def handle_decl(self, data):
+         self.handle_empty_tag('<!{}>'.format(data), is_block=True)
+ 
+@@ -259,7 +274,11 @@ class HTMLExtractor(htmlparser.HTMLParser):
+             if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
+                 # We have encountered the bug in #1534 (Python bug `gh-77057`).
+                 # Provide an override until we drop support for Python < 3.13.
+-                return self.parse_bogus_comment(i)
++                result = self.parse_bogus_comment(i)
++                if result == -1:
++                    self.handle_data(self.rawdata[i:i + 1])
++                    return i + 1
++                return result
+             return super().parse_html_declaration(i)
+         # This is not the beginning of a raw block so treat as plain data
+         # and avoid consuming any tags which may follow (see #1066).
+@@ -289,7 +308,8 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         self.__starttag_text = None
+         endpos = self.check_for_whole_start_tag(i)
+         if endpos < 0:
+-            return endpos
++            self.handle_data(self.rawdata[i:i + 1])
++            return i + 1
+         rawdata = self.rawdata
+         self.__starttag_text = rawdata[i:endpos]
+ 
--- /dev/null
+++ b/debian/patches/incomplete_markup_declaration.diff
@@ -0,0 +1,64 @@
+From: Waylan Limberg <[email protected]>
+Date: Wed, 18 Jun 2025 10:29:03 -0400
+Subject: Ensure incomplete markup declaration in raw HTML doesn't crash
+ parser.
+
+See Python bug report at gh-77057 for details. Until we drop support for
+Python < 3.13 (where this was fixed upstream), we need to avoid the
+unwanted error by checking for it explicitly. Fixes #1534.
+
+(cherry picked from commit 820721485c928c6f97f3d74f37afb6d2450aef9e)
+---
+ markdown/extensions/md_in_html.py            | 4 ++++
+ markdown/htmlparser.py                       | 4 ++++
+ tests/test_syntax/blocks/test_html_blocks.py | 7 +++++++
+ 3 files changed, 15 insertions(+)
+
+diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py
+index ec7dcba..16f0ef6 100644
+--- a/markdown/extensions/md_in_html.py
++++ b/markdown/extensions/md_in_html.py
+@@ -219,6 +219,10 @@ class HTMLExtractorExtra(HTMLExtractor):
+ 
+     def parse_html_declaration(self, i):
+         if self.at_line_start() or self.intail or self.mdstack:
++            if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
++                # We have encountered the bug in #1534 (Python bug `gh-77057`).
++                # Provide an override until we drop support for Python < 3.13.
++                return self.parse_bogus_comment(i)
+             # The same override exists in HTMLExtractor without the check
+             # for mdstack. Therefore, use HTMLExtractor's parent instead.
+             return super(HTMLExtractor, self).parse_html_declaration(i)
+diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
+index 586bddd..b6e3247 100644
+--- a/markdown/htmlparser.py
++++ b/markdown/htmlparser.py
+@@ -256,6 +256,10 @@ class HTMLExtractor(htmlparser.HTMLParser):
+ 
+     def parse_html_declaration(self, i):
+         if self.at_line_start() or self.intail:
++            if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<
