Bug#1137181: bookworm-pu: package python-markdown/3.4.1-2+deb12u1

Dmitry Shachnev Wed, 20 May 2026 05:23:25 -0700

Package: release.debian.org
Severity: normal
Tags: bookworm
X-Debbugs-Cc: [email protected]
Control: affects -1 + src:python-markdown
User: [email protected]
Usertags: pu


[ Reason ]
This upload fixes two issues:

1. CVE-2025-69534: parser crash on malformed <![ sequences. There are two
   patches for this bug: bogus_comments.diff (backported from 3.5.2, pre-CVE)
   and incomplete_markup_declaration.diff (backported from 3.8.1).

2. Bug #1137043: Fix test failures with python3.11 >= 3.11.2-6+deb12u7,
   which changed html.parser to address CVE-2025-6069. Those changes broke
   Python-Markdown because it relies heavily on html.parser internals.
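
For item 2, one of the adaptations in fixes_for_new_python.diff is a check in
HTMLExtractor.handle_comment() that a reported comment is really terminated.
The opening `<!--` is 4 characters long, so for a closed comment the `-->`
must start at the comment's start offset + 4 + len(data). The sketch below
models only that arithmetic. comment_is_closed() is a hypothetical helper for
illustration, not part of Python-Markdown; in the actual patch the start
offset is computed as self.line_offset + self.offset.

```python
def comment_is_closed(rawdata: str, start: int, data: str) -> bool:
    """Return True if the comment starting at rawdata[start] ('<' of '<!--')
    with body `data` is followed by a closing '-->'.

    Hypothetical helper mirroring the offset arithmetic of the patched
    handle_comment(); not part of Python-Markdown's API.
    """
    # '<!--' is 4 characters, so the body ends at start + 4 + len(data).
    end = start + len('<!--') + len(data)
    return rawdata[end:end + 3] == '-->'


print(comment_is_closed('<!-- note -->', 0, ' note '))  # True
print(comment_is_closed('text <!-- note', 5, ' note'))  # False
```

When the check fails, the patch emits only `<` as text and makes the next
updatepos() call advance by a single character, so the rest of the input is
parsed again as ordinary content.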

[ Impact ]
CVE-2025-69534 enables remote, unauthenticated Denial of Service in web
applications, documentation systems, CI/CD pipelines, and any service that
renders untrusted Markdown.
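
The crash is triggered by markup that starts with `<![` but is not a
`<![CDATA[` section (Python bug gh-77057, fixed upstream in Python 3.13).
The backported patches detect this case in parse_html_declaration() and route
it to parse_bogus_comment(). If no closing `>` follows, fixes_for_new_python.diff
emits just the `<` as text and resumes at the next character. The sketch below
is a rough standalone model of that control flow. It assumes plain strings
instead of the real parser state, and all function names in it are
hypothetical.

```python
def is_non_cdata_marked_section(rawdata: str, i: int) -> bool:
    # Same test as the backported patches: '<![' not followed by 'CDATA['.
    return rawdata[i:i + 3] == '<![' and rawdata[i:i + 9] != '<![CDATA['


def bogus_comment_end(rawdata: str, i: int) -> int:
    # Model of html.parser's bogus-comment scan: index just past the next
    # '>', or -1 if the markup is not terminated.
    pos = rawdata.find('>', i + 2)
    return -1 if pos == -1 else pos + 1


def step_over_declaration(rawdata: str, i: int) -> tuple[str, int]:
    """Return (text to emit, next index) for a non-CDATA '<![' at rawdata[i]."""
    if not is_non_cdata_marked_section(rawdata, i):
        raise ValueError("only the non-CDATA '<![' case is modelled here")
    end = bogus_comment_end(rawdata, i)
    if end == -1:
        # Unterminated: emit '<' as plain text and continue after it,
        # rather than failing inside the parser.
        return rawdata[i:i + 1], i + 1
    # Terminated: pass the raw markup through unchanged.
    return rawdata[i:end], end


print(step_over_declaration('<![', 0))            # ('<', 1)
print(step_over_declaration('<![if x]>rest', 0))  # ('<![if x]>', 9)
```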

[ Tests ]
All changes are covered by automated tests, which are run during build.

[ Risks ]
The changes have been part of upstream Python-Markdown for a while and are
well covered by tests, so they should be safe.

[ Checklist ]
  [x] *all* changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in (old)stable
  [x] the issue is verified as fixed in unstable

[ Changes ]
  * Backport upstream fixes for parsing bogus HTML markup (CVE-2025-69534).
  * Adapt to changes in html.parser module in the new Python, backported
    to Bookworm as part of CVE fixes (closes: #1137043).
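
With bogus_comments.diff, markup such as `<!invalid>` or `</#invalid>` is
passed through (and later HTML-escaped) instead of being treated as a comment.
The patch does this by overriding parse_bogus_comment(), an undocumented
internal method of html.parser.HTMLParser, with `report` defaulting to 0 so
that handle_comment() is not called. Below is a minimal sketch of the same
override on a plain HTMLParser subclass that only records events. The class
name and the events list are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class BogusPassThroughParser(HTMLParser):
    """Hypothetical demo parser that records events instead of rendering."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_comment(self, data: str) -> None:
        self.events.append(('comment', data))

    def handle_data(self, data: str) -> None:
        self.events.append(('data', data))

    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
        # Let the base class find the end of the bogus comment; with the
        # default report=0 it does not call handle_comment(). Then pass the
        # raw markup through as data.
        pos = super().parse_bogus_comment(i, report)
        if pos == -1:
            return -1
        self.handle_data(self.rawdata[i:pos])
        return pos


parser = BogusPassThroughParser()
parser.feed('<!invalid>')
parser.close()
print(parser.events)  # [('data', '<!invalid>')]
```

A plain HTMLParser would instead report the same input through
handle_comment('invalid').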

There are also branch changes in debian/gbp.conf and debian/gitlab-ci.yml,
which are needed for the CI, but those files do not affect the built package
in any way.

[ Other info ]
See also #1137180: similar upload to Trixie.

See also #1131896: discussion about whether CVE-2025-69534 should be
addressed in Python 3.11 itself rather than in Python-Markdown. If that
happens at some point, Python-Markdown should not break. Rather than waiting
for a fix in Python, I am including a workaround in Python-Markdown itself,
since it is needed for the #1137043 patch to apply cleanly anyway.

--
Dmitry Shachnev

--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+python-markdown (3.4.1-2+deb12u1) bookworm; urgency=medium
+
+  * Backport upstream fixes for parsing bogus HTML markup (CVE-2025-69534).
+  * Adapt to changes in html.parser module in the new Python, backported
+    to Bookworm as part of CVE fixes (closes: #1137043).
+
+ -- Dmitry Shachnev <[email protected]>  Wed, 20 May 2026 14:32:50 +0300
+
 python-markdown (3.4.1-2) unstable; urgency=medium
 
   * Team upload.
--- a/debian/gbp.conf
+++ b/debian/gbp.conf
@@ -1,2 +1,2 @@
 [DEFAULT]
-debian-branch=debian/master
+debian-branch=debian/bookworm
--- a/debian/gitlab-ci.yml
+++ b/debian/gitlab-ci.yml
@@ -3,4 +3,4 @@ include:
   - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
 
 variables:
-  RELEASE: 'unstable'
+  RELEASE: 'bookworm'
--- /dev/null
+++ b/debian/patches/bogus_comments.diff
@@ -0,0 +1,66 @@
+From: Waylan Limberg <[email protected]>
+Date: Wed, 3 Jan 2024 13:24:33 -0500
+Subject: Fix handling of bogus comments.
+
+As with most implementations, we now pass through bogus comments (as
+defined by the HTML Spec) unaltered except that they are HTML escaped.
+This deviates from the reference implementation which completely ignores
+them. As the reference implementation seems to not have even contemplated
+their existence, it is not being used as a reference in this instance.
+Fixes #1425.
+
+(cherry picked from commit e466f381d09692f484f8ff022273e2ac8cea0b16)
+---
+ markdown/htmlparser.py                       |  9 +++++++++
+ tests/test_syntax/blocks/test_html_blocks.py | 16 ++++++++--------
+ 2 files changed, 17 insertions(+), 8 deletions(-)
+
+diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
+index 3512d1a..586bddd 100644
+--- a/markdown/htmlparser.py
++++ b/markdown/htmlparser.py
+@@ -262,6 +262,15 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         self.handle_data('<!')
+         return i + 2
+ 
++    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
++        # Override the default behavior so that bogus comments get passed
++        # through unaltered by setting `report` to `0` (see #1425).
++        pos = super().parse_bogus_comment(i, report)
++        if pos == -1:  # pragma: no cover
++            return -1
++        self.handle_empty_tag(self.rawdata[i:pos], is_block=False)
++        return pos
++
+     # The rest has been copied from base class in standard lib to address #1036.
+     # As __startag_text is private, all references to it must be in this subclass.
+     # The last few lines of parse_starttag are reversed so that handle_starttag
+diff --git a/tests/test_syntax/blocks/test_html_blocks.py b/tests/test_syntax/blocks/test_html_blocks.py
+index 9ec0668..4a4a06e 100644
+--- a/tests/test_syntax/blocks/test_html_blocks.py
++++ b/tests/test_syntax/blocks/test_html_blocks.py
+@@ -782,16 +782,16 @@ class TestHTMLBlocks(TestCase):
+             '<!-- *foo* -->'
+         )
+ 
+-    # Note: this is a change in behavior for Python-Markdown, which does *not* match the reference
+-    # implementation. However, it does match the HTML5 spec. Declarations must start with either
+-    # `<!DOCTYPE` or `<![`. Anything else that starts with `<!` is a comment. According to the
+-    # HTML5 spec, a comment without the hyphens is a "bogus comment", but a comment nonetheless.
+-    # See https://www.w3.org/TR/html52/syntax.html#markup-declaration-open-state.
+-    # If we wanted to change this behavior, we could override `HTMLParser.parse_bogus_comment()`.
+     def test_bogus_comment(self):
+         self.assertMarkdownRenders(
+-            '<!*foo*>',
+-            '<!--*foo*-->'
++            '<!invalid>',
++            '<p>&lt;!invalid&gt;</p>'
++        )
++
++    def test_bogus_comment_endtag(self):
++        self.assertMarkdownRenders(
++            '</#invalid>',
++            '<p>&lt;/#invalid&gt;</p>'
+         )
+ 
+     def test_raw_multiline_comment(self):
--- /dev/null
+++ b/debian/patches/fixes_for_new_python.diff
@@ -0,0 +1,91 @@
+From: Isaac Muse <[email protected]>
+Date: Thu, 19 Jun 2025 09:46:13 -0600
+Subject: Fixes for Python 3.14
+
+- Fix issue with unclosed `<![`
+- Fix issue with unclosed HTML tag `<foo`
+- Fix issue with unclosed comments
+
+Fixes #1537
+
+(cherry picked from commit 9980cb5b27b07ff48283178d98213e41543701ec)
+---
+ markdown/extensions/md_in_html.py |  6 +++++-
+ markdown/htmlparser.py            | 24 ++++++++++++++++++++++--
+ 2 files changed, 27 insertions(+), 3 deletions(-)
+
+diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py
+index 16f0ef6..9b8f8bf 100644
+--- a/markdown/extensions/md_in_html.py
++++ b/markdown/extensions/md_in_html.py
+@@ -222,7 +222,11 @@ class HTMLExtractorExtra(HTMLExtractor):
+             if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
+                 # We have encountered the bug in #1534 (Python bug `gh-77057`).
+                 # Provide an override until we drop support for Python < 3.13.
+-                return self.parse_bogus_comment(i)
++                result = self.parse_bogus_comment(i)
++                if result == -1:
++                    self.handle_data(self.rawdata[i:i + 1])
++                    return i + 1
++                return result
+             # The same override exists in HTMLExtractor without the check
+             # for mdstack. Therefore, use HTMLExtractor's parent instead.
+             return super(HTMLExtractor, self).parse_html_declaration(i)
+diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
+index b6e3247..ecb92a5 100644
+--- a/markdown/htmlparser.py
++++ b/markdown/htmlparser.py
+@@ -76,6 +76,8 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         # Block tags that should contain no content (self closing)
+         self.empty_tags = set(['hr'])
+ 
++        self.override_comment_update = False
++
+         # This calls self.reset
+         super().__init__(*args, **kwargs)
+         self.md = md
+@@ -234,8 +236,21 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         self.handle_empty_tag('&{};'.format(name), is_block=False)
+ 
+     def handle_comment(self, data):
++        # Check if the comment is unclosed, if so, we need to override position
++        i = self.line_offset + self.offset + len(data) + 4
++        if self.rawdata[i:i + 3] != '-->':
++            self.handle_data('<')
++            self.override_comment_update = True
++            return
+         self.handle_empty_tag('<!--{}-->'.format(data), is_block=True)
+ 
++    def updatepos(self, i: int, j: int) -> int:
++        if self.override_comment_update:
++            self.override_comment_update = False
++            i = 0
++            j = 1
++        return super().updatepos(i, j)
++
+     def handle_decl(self, data):
+         self.handle_empty_tag('<!{}>'.format(data), is_block=True)
+ 
+@@ -259,7 +274,11 @@ class HTMLExtractor(htmlparser.HTMLParser):
+             if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
+                 # We have encountered the bug in #1534 (Python bug `gh-77057`).
+                 # Provide an override until we drop support for Python < 3.13.
+-                return self.parse_bogus_comment(i)
++                result = self.parse_bogus_comment(i)
++                if result == -1:
++                    self.handle_data(self.rawdata[i:i + 1])
++                    return i + 1
++                return result
+             return super().parse_html_declaration(i)
+         # This is not the beginning of a raw block so treat as plain data
+         # and avoid consuming any tags which may follow (see #1066).
+@@ -289,7 +308,8 @@ class HTMLExtractor(htmlparser.HTMLParser):
+         self.__starttag_text = None
+         endpos = self.check_for_whole_start_tag(i)
+         if endpos < 0:
+-            return endpos
++            self.handle_data(self.rawdata[i:i + 1])
++            return i + 1
+         rawdata = self.rawdata
+         self.__starttag_text = rawdata[i:endpos]
+ 
--- /dev/null
+++ b/debian/patches/incomplete_markup_declaration.diff
@@ -0,0 +1,64 @@
+From: Waylan Limberg <[email protected]>
+Date: Wed, 18 Jun 2025 10:29:03 -0400
+Subject: Ensure incomplete markup declaration in raw HTML doesn't crash
+ parser.
+
+See Python bug report at gh-77057 for details. Until we drop support for
+Python < 3.13 (where this was fixed upstream), we need to avoid the
+unwanted error by checking for it explicitly. Fixes #1534.
+
+(cherry picked from commit 820721485c928c6f97f3d74f37afb6d2450aef9e)
+---
+ markdown/extensions/md_in_html.py            | 4 ++++
+ markdown/htmlparser.py                       | 4 ++++
+ tests/test_syntax/blocks/test_html_blocks.py | 7 +++++++
+ 3 files changed, 15 insertions(+)
+
+diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py
+index ec7dcba..16f0ef6 100644
+--- a/markdown/extensions/md_in_html.py
++++ b/markdown/extensions/md_in_html.py
+@@ -219,6 +219,10 @@ class HTMLExtractorExtra(HTMLExtractor):
+ 
+     def parse_html_declaration(self, i):
+         if self.at_line_start() or self.intail or self.mdstack:
++            if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
++                # We have encountered the bug in #1534 (Python bug `gh-77057`).
++                # Provide an override until we drop support for Python < 3.13.
++                return self.parse_bogus_comment(i)
+             # The same override exists in HTMLExtractor without the check
+             # for mdstack. Therefore, use HTMLExtractor's parent instead.
+             return super(HTMLExtractor, self).parse_html_declaration(i)
+diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
+index 586bddd..b6e3247 100644
+--- a/markdown/htmlparser.py
++++ b/markdown/htmlparser.py
+@@ -256,6 +256,10 @@ class HTMLExtractor(htmlparser.HTMLParser):
+ 
+     def parse_html_declaration(self, i):
+         if self.at_line_start() or self.intail:
++            if self.rawdata[i:i+3] == '<![' and not self.rawdata[i:i+9] == '<![CDATA[':
++                # We have encountered the bug in #1534 (Python bug `gh-77057`).
++                # Provide an override until we drop support for Python < 3.13.
++                return self.parse_bogus_comment(i)
+             return super().parse_html_declaration(i)
+         # This is not the beginning of a raw block so treat as plain data
+         # and avoid consuming any tags which may follow (see #1066).
+diff --git a/tests/test_syntax/blocks/test_html_blocks.py b/tests/test_syntax/blocks/test_html_blocks.py
+index 4a4a06e..5545062 100644
+--- a/tests/test_syntax/blocks/test_html_blocks.py
++++ b/tests/test_syntax/blocks/test_html_blocks.py
+@@ -1275,6 +1275,13 @@ class TestHTMLBlocks(TestCase):
+             )
+         )
+ 
++    def test_not_actually_cdata(self):
++        # Ensure bug reported in #1534 is avoided.
++        self.assertMarkdownRenders(
++            '<![',
++            '<p>&lt;![</p>'
++        )
++
+     def test_raw_cdata_code_span(self):
+         self.assertMarkdownRenders(
+             self.dedent(
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1 +1,4 @@
 disable_directory_urls.diff
+bogus_comments.diff
+incomplete_markup_declaration.diff
+fixes_for_new_python.diff
