Script 'mail_helper' called by obssrc
Hello community,
here is the log from the commit of package python-lxml_html_clean for
openSUSE:Factory checked in at 2026-03-10 17:58:05
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-lxml_html_clean (Old)
and /work/SRC/openSUSE:Factory/.python-lxml_html_clean.new.8177 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-lxml_html_clean"
Tue Mar 10 17:58:05 2026 rev:5 rq:1337971 version:0.4.4
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-lxml_html_clean/python-lxml_html_clean.changes 2025-10-10 17:11:38.847040704 +0200
+++ /work/SRC/openSUSE:Factory/.python-lxml_html_clean.new.8177/python-lxml_html_clean.changes 2026-03-10 18:58:40.485846414 +0100
@@ -1,0 +2,13 @@
+Tue Mar 10 09:58:53 UTC 2026 - Nico Krapp <[email protected]>
+
+- Update to 0.4.4
+ * Fixed a bug where Unicode escapes in CSS were not properly decoded before
+ security checks. This prevents attackers from bypassing filters using
+ escape sequences. (CVE-2026-28348) (bsc#1259378)
+ * Fixed a security issue where <base> tags could be used for URL hijacking
+ attacks. The <base> tag is now automatically removed whenever the <head>
+ tag is removed (via page_structure=True or manual configuration), as <base>
+ must be inside <head> according to HTML specifications. (CVE-2026-28350)
+ (bsc#1259379)
+
+-------------------------------------------------------------------
Old:
----
lxml_html_clean-0.4.3.tar.gz
New:
----
lxml_html_clean-0.4.4.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-lxml_html_clean.spec ++++++
--- /var/tmp/diff_new_pack.IxY1qz/_old 2026-03-10 18:58:44.846014383 +0100
+++ /var/tmp/diff_new_pack.IxY1qz/_new 2026-03-10 18:58:44.858014845 +0100
@@ -1,7 +1,7 @@
#
# spec file for package python-lxml_html_clean
#
-# Copyright (c) 2025 SUSE LLC and contributors
+# Copyright (c) 2026 SUSE LLC and contributors
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -18,7 +18,7 @@
%{?sle15_python_module_pythons}
Name: python-lxml_html_clean
-Version: 0.4.3
+Version: 0.4.4
Release: 0
Summary: HTML cleaner from lxml project
License: BSD-3-Clause
++++++ lxml_html_clean-0.4.3.tar.gz -> lxml_html_clean-0.4.4.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/CHANGES.rst new/lxml_html_clean-0.4.4/CHANGES.rst
--- old/lxml_html_clean-0.4.3/CHANGES.rst 2025-10-02 22:46:25.000000000 +0200
+++ new/lxml_html_clean-0.4.4/CHANGES.rst 2026-02-27 10:32:54.000000000 +0100
@@ -6,6 +6,21 @@
Unreleased
==========
+0.4.4 (2026-02-26)
+==================
+
+Bugs fixed
+----------
+
+* Fixed a bug where Unicode escapes in CSS were not properly decoded
+ before security checks. This prevents attackers from bypassing filters
+ using escape sequences.
+* Fixed a security issue where ``<base>`` tags could be used for URL
+ hijacking attacks. The ``<base>`` tag is now automatically removed
+ whenever the ``<head>`` tag is removed (via ``page_structure=True``
+ or manual configuration), as ``<base>`` must be inside ``<head>``
+ according to HTML specifications.
+
0.4.3 (2025-10-02)
==================
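[Editor's note: the URL-hijacking risk that motivates the ``<base>`` fix above can be illustrated with the standard library alone. Browsers resolve every relative URL on a page against the effective ``<base href>``, so an injected tag silently redirects all relative links; the hostnames below are made up for illustration.]

```python
from urllib.parse import urljoin

# Where the page actually lives vs. a value injected via <base href="...">.
# Both URLs are illustrative, not from the package or its tests.
page_url = 'https://example.org/articles/'
injected_base = 'http://evil.example/'

# Browsers resolve relative links against the effective base URL:
normal = urljoin(page_url, 'page.html')        # -> https://example.org/articles/page.html
hijacked = urljoin(injected_base, 'page.html') # -> http://evil.example/page.html
print(normal)
print(hijacked)
```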
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/PKG-INFO new/lxml_html_clean-0.4.4/PKG-INFO
--- old/lxml_html_clean-0.4.3/PKG-INFO 2025-10-02 22:46:55.823964600 +0200
+++ new/lxml_html_clean-0.4.4/PKG-INFO 2026-02-27 10:33:33.721878000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 2.4
Name: lxml_html_clean
-Version: 0.4.3
+Version: 0.4.4
Summary: HTML cleaner from lxml project
Home-page: https://github.com/fedora-python/lxml_html_clean/
Author: Lumír Balhar
@@ -14,6 +14,7 @@
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: lxml
@@ -25,7 +26,7 @@
This project was initially a part of [lxml](https://github.com/lxml/lxml).
Because HTML cleaner is designed as blocklist-based, many reports about
possible security vulnerabilities were filed for lxml and that make the project
problematic for security-sensitive environments. Therefore we decided to
extract the problematic part to a separate project.
-**Important**: the HTML Cleaner in ``lxml_html_clean`` is **not** considered appropriate **for security sensitive environments**. See e.g. [bleach](https://pypi.org/project/bleach/) for an alternative.
+**Important**: the HTML Cleaner in ``lxml_html_clean`` is **not** considered appropriate **for security sensitive environments**. See e.g. [nh3](https://pypi.org/project/nh3/) for an alternative.
This project uses functions from Python's `urllib.parse` for URL parsing which
**do not validate inputs**. For more information on potential security risks,
refer to the [URL parsing
security](https://docs.python.org/3/library/urllib.parse.html#url-parsing-security)
documentation. A maliciously crafted URL could potentially bypass the allowed
hosts check in `Cleaner`.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/README.md new/lxml_html_clean-0.4.4/README.md
--- old/lxml_html_clean-0.4.3/README.md 2025-10-02 22:37:00.000000000 +0200
+++ new/lxml_html_clean-0.4.4/README.md 2026-02-27 10:32:41.000000000 +0100
@@ -4,7 +4,7 @@
This project was initially a part of [lxml](https://github.com/lxml/lxml).
Because HTML cleaner is designed as blocklist-based, many reports about
possible security vulnerabilities were filed for lxml and that make the project
problematic for security-sensitive environments. Therefore we decided to
extract the problematic part to a separate project.
-**Important**: the HTML Cleaner in ``lxml_html_clean`` is **not** considered appropriate **for security sensitive environments**. See e.g. [bleach](https://pypi.org/project/bleach/) for an alternative.
+**Important**: the HTML Cleaner in ``lxml_html_clean`` is **not** considered appropriate **for security sensitive environments**. See e.g. [nh3](https://pypi.org/project/nh3/) for an alternative.
This project uses functions from Python's `urllib.parse` for URL parsing which
**do not validate inputs**. For more information on potential security risks,
refer to the [URL parsing
security](https://docs.python.org/3/library/urllib.parse.html#url-parsing-security)
documentation. A maliciously crafted URL could potentially bypass the allowed
hosts check in `Cleaner`.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/lxml_html_clean/clean.py new/lxml_html_clean-0.4.4/lxml_html_clean/clean.py
--- old/lxml_html_clean-0.4.3/lxml_html_clean/clean.py 2025-10-02 22:37:00.000000000 +0200
+++ new/lxml_html_clean-0.4.4/lxml_html_clean/clean.py 2026-02-27 10:32:54.000000000 +0100
@@ -422,6 +422,12 @@
         if self.annoying_tags:
             remove_tags.update(('blink', 'marquee'))
+        # Remove <base> tags whenever <head> is being removed.
+        # According to HTML spec, <base> must be in <head>, but browsers
+        # may interpret it even when misplaced, allowing URL hijacking attacks.
+        if 'head' in kill_tags or 'head' in remove_tags:
+            kill_tags.add('base')
+
         _remove = deque()
         _kill = deque()
         for el in doc.iter():
@@ -578,6 +584,26 @@
     _comments_re = re.compile(r'/\*.*?\*/', re.S)
     _find_comments = _comments_re.finditer
     _substitute_comments = _comments_re.sub
+    _css_unicode_escape_re = re.compile(r'\\([0-9a-fA-F]{1,6})\s?')
+
+    def _decode_css_unicode_escapes(self, style):
+        """
+        Decode CSS Unicode escape sequences like \\69 or \\000069 to their
+        actual character values. This prevents bypassing security checks
+        using CSS escape sequences.
+
+        CSS escape syntax: backslash followed by 1-6 hex digits,
+        optionally followed by a whitespace character.
+        """
+        def replace_escape(match):
+            hex_value = match.group(1)
+            try:
+                return chr(int(hex_value, 16))
+            except (ValueError, OverflowError):
+                # Invalid unicode codepoint, keep original
+                return match.group(0)
+
+        return self._css_unicode_escape_re.sub(replace_escape, style)

     def _has_sneaky_javascript(self, style):
         """
@@ -591,6 +617,7 @@
         more sneaky attempts.
         """
         style = self._substitute_comments('', style)
+        style = self._decode_css_unicode_escapes(style)
         style = style.replace('\\', '')
         style = _substitute_whitespace('', style)
         style = style.lower()
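[Editor's note: the decoding step in the hunk above can be exercised on its own. The sketch below reuses the exact regular expression from the patch, but lifts the function out of the Cleaner class for illustration; it is not the packaged code.]

```python
import re

# Same pattern as the patch: backslash, 1-6 hex digits, then an optional
# whitespace character that terminates the escape.
_css_unicode_escape_re = re.compile(r'\\([0-9a-fA-F]{1,6})\s?')

def decode_css_unicode_escapes(style):
    """Replace CSS hex escapes with the characters they encode."""
    def replace_escape(match):
        try:
            return chr(int(match.group(1), 16))
        except (ValueError, OverflowError):
            return match.group(0)  # invalid codepoint: keep original text
    return _css_unicode_escape_re.sub(replace_escape, style)

# '\6a\61\76\61\73\63\72\69\70\74' spells "javascript", so the escaped
# payload becomes visible to the security checks after decoding:
decoded = decode_css_unicode_escapes(r'url(\6a\61\76\61\73\63\72\69\70\74:alert(1))')
print(decoded)  # url(javascript:alert(1))
```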
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/lxml_html_clean.egg-info/PKG-INFO new/lxml_html_clean-0.4.4/lxml_html_clean.egg-info/PKG-INFO
--- old/lxml_html_clean-0.4.3/lxml_html_clean.egg-info/PKG-INFO 2025-10-02 22:46:55.000000000 +0200
+++ new/lxml_html_clean-0.4.4/lxml_html_clean.egg-info/PKG-INFO 2026-02-27 10:33:33.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 2.4
Name: lxml_html_clean
-Version: 0.4.3
+Version: 0.4.4
Summary: HTML cleaner from lxml project
Home-page: https://github.com/fedora-python/lxml_html_clean/
Author: Lumír Balhar
@@ -14,6 +14,7 @@
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: lxml
@@ -25,7 +26,7 @@
This project was initially a part of [lxml](https://github.com/lxml/lxml).
Because HTML cleaner is designed as blocklist-based, many reports about
possible security vulnerabilities were filed for lxml and that make the project
problematic for security-sensitive environments. Therefore we decided to
extract the problematic part to a separate project.
-**Important**: the HTML Cleaner in ``lxml_html_clean`` is **not** considered appropriate **for security sensitive environments**. See e.g. [bleach](https://pypi.org/project/bleach/) for an alternative.
+**Important**: the HTML Cleaner in ``lxml_html_clean`` is **not** considered appropriate **for security sensitive environments**. See e.g. [nh3](https://pypi.org/project/nh3/) for an alternative.
This project uses functions from Python's `urllib.parse` for URL parsing which
**do not validate inputs**. For more information on potential security risks,
refer to the [URL parsing
security](https://docs.python.org/3/library/urllib.parse.html#url-parsing-security)
documentation. A maliciously crafted URL could potentially bypass the allowed
hosts check in `Cleaner`.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/setup.cfg new/lxml_html_clean-0.4.4/setup.cfg
--- old/lxml_html_clean-0.4.3/setup.cfg 2025-10-02 22:46:55.824597600 +0200
+++ new/lxml_html_clean-0.4.4/setup.cfg 2026-02-27 10:33:33.722296500 +0100
@@ -1,6 +1,6 @@
[metadata]
name = lxml_html_clean
-version = 0.4.3
+version = 0.4.4
description = HTML cleaner from lxml project
long_description = file:README.md
long_description_content_type = text/markdown
@@ -19,6 +19,7 @@
Programming Language :: Python :: 3.11
Programming Language :: Python :: 3.12
Programming Language :: Python :: 3.13
+ Programming Language :: Python :: 3.14
[options]
packages =
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/tests/test_clean.py new/lxml_html_clean-0.4.4/tests/test_clean.py
--- old/lxml_html_clean-0.4.3/tests/test_clean.py 2025-10-02 22:37:07.000000000 +0200
+++ new/lxml_html_clean-0.4.4/tests/test_clean.py 2026-02-27 10:32:54.000000000 +0100
@@ -393,3 +393,195 @@
self.assertEqual(len(w), 0)
self.assertNotIn("google.com", result)
self.assertNotIn("example.com", result)
+
+    def test_base_tag_removed_with_page_structure(self):
+        # Test that <base> tags are removed when page_structure=True (default)
+        # This prevents URL hijacking attacks where <base> redirects all relative URLs
+
+        test_cases = [
+            # <base> in proper location (inside <head>)
+            '<html><head><base href="http://evil.com/"></head><body><a href="page.html">link</a></body></html>',
+            # <base> outside <head>
+            '<div><base href="http://evil.com/"><a href="page.html">link</a></div>',
+            # Multiple <base> tags
+            '<base href="http://evil.com/"><div><base href="http://evil2.com/"></div>',
+            # <base> with target attribute
+            '<base target="_blank"><div>content</div>',
+            # <base> at various positions
+            '<html><base href="http://evil.com/"><body>test</body></html>',
+        ]
+
+        for html in test_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                # Verify <base> tag is completely removed
+                self.assertNotIn('base', cleaned.lower())
+                self.assertNotIn('evil.com', cleaned)
+                self.assertNotIn('evil2.com', cleaned)
+
+    def test_base_tag_kept_when_page_structure_false(self):
+        # When page_structure=False and head is not removed, <base> should be kept
+        cleaner = Cleaner(page_structure=False)
+        html = '<html><head><base href="http://example.com/"></head><body>test</body></html>'
+        cleaned = cleaner.clean_html(html)
+        self.assertIn('<base href="http://example.com/">', cleaned)
+
+    def test_base_tag_removed_when_head_in_remove_tags(self):
+        # Even with page_structure=False, <base> should be removed if head is manually removed
+        cleaner = Cleaner(page_structure=False, remove_tags=['head'])
+        html = '<html><head><base href="http://evil.com/"></head><body>test</body></html>'
+        cleaned = cleaner.clean_html(html)
+        self.assertNotIn('base', cleaned.lower())
+        self.assertNotIn('evil.com', cleaned)
+
+    def test_base_tag_removed_when_head_in_kill_tags(self):
+        # Even with page_structure=False, <base> should be removed if head is in kill_tags
+        cleaner = Cleaner(page_structure=False, kill_tags=['head'])
+        html = '<html><head><base href="http://evil.com/"></head><body>test</body></html>'
+        cleaned = cleaner.clean_html(html)
+        self.assertNotIn('base', cleaned.lower())
+        self.assertNotIn('evil.com', cleaned)
+
+    def test_unicode_escape_in_style(self):
+        # Test that CSS Unicode escapes are properly decoded before security checks
+        # This prevents attackers from bypassing filters using escape sequences
+        # CSS escape syntax: \HHHHHH where H is a hex digit (1-6 digits)
+
+        # Test inline style attributes (requires safe_attrs_only=False)
+        cleaner = Cleaner(safe_attrs_only=False)
+        inline_style_cases = [
+            # \6a\61\76\61\73\63\72\69\70\74 = "javascript"
+            ('<div style="background: url(\\6a\\61\\76\\61\\73\\63\\72\\69\\70\\74:alert(1))">test</div>', '<div>test</div>'),
+            # \69 = 'i', so \69mport = "import"
+            ('<div style="@\\69mport url(evil.css)">test</div>', '<div>test</div>'),
+            # \69 with space after = 'i', space consumed as part of escape
+            ('<div style="@\\69 mport url(evil.css)">test</div>', '<div>test</div>'),
+            # \65\78\70\72\65\73\73\69\6f\6e = "expression"
+            ('<div style="\\65\\78\\70\\72\\65\\73\\73\\69\\6f\\6e(alert(1))">test</div>', '<div>test</div>'),
+        ]
+
+        for html, expected in inline_style_cases:
+            with self.subTest(html=html):
+                cleaned = cleaner.clean_html(html)
+                self.assertEqual(expected, cleaned)
+
+        # Test <style> tag content (uses default clean_html)
+        style_tag_cases = [
+            # Unicode-escaped "javascript:" in url()
+            '<style>url(\\6a\\61\\76\\61\\73\\63\\72\\69\\70\\74:alert(1))</style>',
+            # Unicode-escaped "javascript:" without url()
+            '<style>\\6a\\61\\76\\61\\73\\63\\72\\69\\70\\74:alert(1)</style>',
+            # Unicode-escaped "expression"
+            '<style>\\65\\78\\70\\72\\65\\73\\73\\69\\6f\\6e(alert(1))</style>',
+            # Unicode-escaped @import with 'i'
+            '<style>@\\69mport url(evil.css)</style>',
+            # Unicode-escaped "data:" scheme
+            '<style>url(\\64\\61\\74\\61:image/svg+xml;base64,PHN2ZyBvbmxvYWQ9YWxlcnQoMSk+)</style>',
+            # Space after escape is consumed: \69 mport = "import"
+            '<style>@\\69 mport url(evil.css)</style>',
+            # 6-digit escape: \000069 = 'i'
+            '<style>@\\000069mport url(evil.css)</style>',
+            # 6-digit escape with space
+            '<style>@\\000069 mport url(evil.css)</style>',
+        ]
+
+        for html in style_tag_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                self.assertEqual('<div><style>/* deleted */</style></div>', cleaned)
+
+    def test_unicode_escape_mixed_with_comments(self):
+        # Unicode escapes mixed with CSS comments should still be caught
+        test_cases = [
+            # \69 = 'i' with comment before
+            '<style>@/*comment*/\\69mport url(evil.css)</style>',
+            # \69 = 'i' with comment after
+            '<style>@\\69mport/*comment*/ url(evil.css)</style>',
+            # Multiple escapes with comments
+            '<style>\\65\\78/*comment*/\\70\\72\\65\\73\\73\\69\\6f\\6e(alert(1))</style>',
+        ]
+
+        for html in test_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                self.assertEqual('<div><style>/* deleted */</style></div>', cleaned)
+
+    def test_unicode_escape_case_insensitive(self):
+        # CSS hex escapes should work with both uppercase and lowercase hex digits
+        # \69 = 'i', \6D = 'm', etc.
+        test_cases = [
+            # @import with uppercase hex digits: \69\6D\70\6F\72\74
+            '<style>@\\69\\6D\\70\\6F\\72\\74 url(evil.css)</style>',
+            # @import with some uppercase
+            '<style>@\\69\\6D\\70\\6f\\72\\74 url(evil.css)</style>',
+        ]
+
+        for html in test_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                self.assertEqual('<div><style>/* deleted */</style></div>', cleaned)
+
+    def test_unicode_escape_various_schemes(self):
+        # Test Unicode escapes for various malicious schemes
+        test_cases = [
+            # \76\62\73\63\72\69\70\74 = "vbscript"
+            '<style>url(\\76\\62\\73\\63\\72\\69\\70\\74:alert(1))</style>',
+            # \6a\73\63\72\69\70\74 = "jscript"
+            '<style>url(\\6a\\73\\63\\72\\69\\70\\74:alert(1))</style>',
+            # \6c\69\76\65\73\63\72\69\70\74 = "livescript"
+            '<style>url(\\6c\\69\\76\\65\\73\\63\\72\\69\\70\\74:alert(1))</style>',
+            # \6d\6f\63\68\61 = "mocha"
+            '<style>url(\\6d\\6f\\63\\68\\61:alert(1))</style>',
+        ]
+
+        for html in test_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                self.assertEqual('<div><style>/* deleted */</style></div>', cleaned)
+
+    def test_unicode_escape_with_whitespace_variations(self):
+        # Test different whitespace characters after Unicode escapes
+        cleaner = Cleaner(safe_attrs_only=False)
+        test_cases = [
+            # Tab after escape
+            ('<div style="@\\69\tmport url(evil.css)">test</div>', '<div>test</div>'),
+            # Newline after escape (note: actual newline, not \n)
+            ('<div style="@\\69\nmport url(evil.css)">test</div>', '<div>test</div>'),
+            # Form feed after escape
+            ('<div style="@\\69\fmport url(evil.css)">test</div>', '<div>test</div>'),
+        ]
+
+        for html, expected in test_cases:
+            with self.subTest(html=html):
+                cleaned = cleaner.clean_html(html)
+                self.assertEqual(expected, cleaned)
+
+    def test_backslash_removal_after_unicode_decode(self):
+        # After decoding Unicode escapes, remaining backslashes are removed
+        # This ensures double-obfuscation (unicode + backslashes) is caught
+        test_cases = [
+            # Step 1: \69 → 'i', Step 2: remove \, Result: @import
+            '<style>@\\69\\m\\p\\o\\r\\t url(evil.css)</style>',
+            # Multiple unicode escapes with backslashes mixed in
+            '<style>@\\69\\6d\\p\\6f\\r\\t url(evil.css)</style>',
+        ]
+
+        for html in test_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                self.assertEqual('<div><style>/* deleted */</style></div>', cleaned)
+
+    def test_backslash_obfuscation_without_unicode(self):
+        # Test that patterns using ONLY backslash obfuscation (no unicode) are caught
+        # Step 1: No unicode escapes, Step 2: remove \, Result: malicious pattern
+        test_cases = [
+            # @\i\m\p\o\r\t → @import (caught by '@import' check)
+            '<style>@\\i\\m\\p\\o\\r\\t url(evil.css)</style>',
+            # Can also test combinations that create javascript schemes
+            '<style>@\\import url(evil.css)</style>',
+        ]
+
+        for html in test_cases:
+            with self.subTest(html=html):
+                cleaned = clean_html(html)
+                self.assertEqual('<div><style>/* deleted */</style></div>', cleaned)
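[Editor's note: the layered normalisation these tests exercise can be approximated standalone. The sketch below is a simplified stand-in for clean.py's `_has_sneaky_javascript` after the patch; the real method checks more patterns, and `looks_sneaky` is an illustrative name, not the library API.]

```python
import re

# Two normalisation passes from clean.py, reduced to their essentials:
_comments_re = re.compile(r'/\*.*?\*/', re.S)
_escape_re = re.compile(r'\\([0-9a-fA-F]{1,6})\s?')

def looks_sneaky(style):
    style = _comments_re.sub('', style)                 # drop /* comments */
    style = _escape_re.sub(
        lambda m: chr(int(m.group(1), 16)), style)      # decode \69 -> 'i' (new in 0.4.4)
    style = style.replace('\\', '')                     # strip leftover backslashes
    style = re.sub(r'\s+', '', style).lower()           # collapse whitespace
    return ('javascript:' in style or 'expression(' in style
            or '@import' in style)

assert looks_sneaky(r'@\69mport url(evil.css)')         # unicode-escaped 'i'
assert looks_sneaky(r'@\i\m\p\o\r\t url(evil.css)')     # backslash obfuscation
assert looks_sneaky('@/*x*/\\69mport url(a.css)')       # comment + escape combined
assert not looks_sneaky('color: red')
```

Each obfuscation layer is removed in order, so payloads that survive one pass (comments, escapes, bare backslashes) are exposed by the next before the substring checks run.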
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lxml_html_clean-0.4.3/tox.ini new/lxml_html_clean-0.4.4/tox.ini
--- old/lxml_html_clean-0.4.3/tox.ini 2025-10-02 22:37:07.000000000 +0200
+++ new/lxml_html_clean-0.4.4/tox.ini 2026-02-27 10:32:41.000000000 +0100
@@ -1,5 +1,5 @@
[tox]
-envlist = py38,py39,py310,py311,py312,py313,mypy
+envlist = py39,py310,py311,py312,py313,py314,mypy
skipsdist = True
[testenv]