Bug#1063723: bullseye-pu: package pypdf2/1.26.0-4

Scott Kitterman Sun, 11 Feb 2024 11:15:23 -0800

Package: release.debian.org
Severity: normal
Tags: bullseye
User: release.debian....@packages.debian.org
Usertags: pu


[ Reason ]
Fixes two no-DSA CVE issues.

[ Impact ]
Continued low level security risk.

[ Tests ]
None that are run in the Debian package or autopkgtest.

[ Risks ]
Risk is low.  Code changes are relatively simple and were provided by
upstream.  These same two patches were previously released for LTS with
no known issues.

[ Checklist ]
  [X] *all* changes are documented in the d/changelog
  [X] I reviewed all changes and I approve them
  [X] attach debdiff against the package in (old)stable
  [X] the issue is verified as fixed in unstable

[ Changes ]
  * Forward-port CVE fixes by LTS team
    - CVE-2023-36810: Quadratic runtime with malformed PDF missing xref marker.
    - Fix CVE-2022-24859:
        Sebastian Krause discovered that manipulated inline images can force
        PyPDF2, a pure Python PDF library, into an infinite loop, if a
        maliciously crafted PDF file is processed.

[ Other info ]
This may show up as an NMU (lintian things so), but I'm the maintainer
now for Testing/Unstable.  I chose not to update the maintainer fields
in this update to keep it minimal to address the issues.

Scott K

diff -Nru pypdf2-1.26.0/debian/changelog pypdf2-1.26.0/debian/changelog
--- pypdf2-1.26.0/debian/changelog      2020-01-19 03:08:58.000000000 -0500
+++ pypdf2-1.26.0/debian/changelog      2024-02-11 13:50:22.000000000 -0500
@@ -1,3 +1,14 @@
+pypdf2 (1.26.0-4+deb11u1) bullseye; urgency=medium
+
+  * Forward-port CVE fixes by LTS team
+    - CVE-2023-36810: Quadratic runtime with malformed PDF missing xref marker.
+    - Fix CVE-2022-24859:
+        Sebastian Krause discovered that manipulated inline images can force
+        PyPDF2, a pure Python PDF library, into an infinite loop, if a
+        maliciously crafted PDF file is processed.
+
+ -- Scott Kitterman <sc...@kitterman.com>  Sun, 11 Feb 2024 13:50:22 -0500
+
 pypdf2 (1.26.0-4) unstable; urgency=medium
 
   * Remove Python 2 from build dependencies (closes: #937505).
diff -Nru 
pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch
 
pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch
--- 
pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch
        1969-12-31 19:00:00.000000000 -0500
+++ 
pypdf2-1.26.0/debian/patches/0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch
        2024-02-11 13:49:50.000000000 -0500
@@ -0,0 +1,50 @@
+From 82ee233ea82a40c626e95a191fe2d52c745db870 Mon Sep 17 00:00:00 2001
+From: dsk7 <je...@posteo.de>
+Date: Sat, 23 Apr 2022 19:12:13 +0200
+Subject: MAINT: Quadratic runtime while parsing reduced to linear  (#808)
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When the PdfFileReader tries to find the xref marker, the readNextEndLine 
methods builds a so called line by reading byte-for-byte. Every time a new byte 
is read, it is concatenated with the currently read line. This leads to 
quadratic runtime O(n²) behavior as Python strings (also byte-strings) are 
immutable and have to be copied where n is the size of the file.
+For files where the xref marker can not be found at the end this takes a 
enormous amount of time:
+
+* 1mb of zeros at the end: 45.54 seconds
+* 2mb of zeros at the end: 357.04 seconds
+(measured on a laptop made in 2015)
+
+This pull request changes the relevant section of the code to become linear 
runtime O(n), leading to a run time of less then a second for both cases 
mentioned above. Furthermore this PR adds a regression test.
+---
+ PyPDF2/pdf.py | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/PyPDF2/pdf.py b/PyPDF2/pdf.py
+index 9979414..8b355e0 100644
+--- a/PyPDF2/pdf.py
++++ b/PyPDF2/pdf.py
+@@ -1930,7 +1930,7 @@ class PdfFileReader(object):
+     def readNextEndLine(self, stream):
+         debug = False
+         if debug: print(">>readNextEndLine")
+-        line = b_("")
++        line_parts = []
+         while True:
+             # Prevent infinite loops in malformed PDFs
+             if stream.tell() == 0:
+@@ -1957,10 +1957,10 @@ class PdfFileReader(object):
+                 break
+             else:
+                 if debug: print("  x is neither")
+-                line = x + line
+-                if debug: print(("  RNEL line:", line))
++                line_parts.append(x)
+         if debug: print("leaving RNEL")
+-        return line
++        line_parts.reverse()
++        return b"".join(line_parts)
+ 
+     def decrypt(self, password):
+         """
+-- 
+2.30.2
+
diff -Nru pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch 
pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch
--- pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch   1969-12-31 
19:00:00.000000000 -0500
+++ pypdf2-1.26.0/debian/patches/CVE-2022-24859.patch   2024-02-11 
13:49:50.000000000 -0500
@@ -0,0 +1,63 @@
+From: Markus Koschany <a...@debian.org>
+Date: Fri, 3 Jun 2022 08:12:01 +0200
+Subject: CVE-2022-24859
+
+Bug-Debian: https://bugs.debian.org/1009879
+Origin: https://github.com/py-pdf/PyPDF2/pull/740
+---
+ PyPDF2/pdf.py | 32 ++++++++++++++++++++++----------
+ 1 file changed, 22 insertions(+), 10 deletions(-)
+
+diff --git a/PyPDF2/pdf.py b/PyPDF2/pdf.py
+index 9979414..b55dfba 100644
+--- a/PyPDF2/pdf.py
++++ b/PyPDF2/pdf.py
+@@ -2723,11 +2723,25 @@ class ContentStream(DecodedStreamObject):
+         # left at beginning of ID
+         tmp = stream.read(3)
+         assert tmp[:2] == b_("ID")
+-        data = b_("")
++        data = BytesIO()
++        # Read the inline image, while checking for EI (End Image) operator.
+         while True:
+-            # Read the inline image, while checking for EI (End Image) 
operator.
+-            tok = stream.read(1)
+-            if tok == b_("E"):
++            # Read 8 kB at a time and check if the chunk contains the E 
operator.
++            buf = stream.read(8192)
++            # We have reached the end of the stream, but haven't found the EI 
operator.
++            if not buf:
++                raise utils.PdfReadError("Unexpected end of stream")
++            loc = buf.find(b_("E"))
++
++            if loc == -1:
++                data.write(buf)
++            else:
++                # Write out everything before the E.
++                data.write(buf[0:loc])
++
++                # Seek back in the stream to read the E next.
++                stream.seek(loc - len(buf), 1)
++                tok = stream.read(1)
+                 # Check for End Image
+                 tok2 = stream.read(1)
+                 if tok2 == b_("I"):
+@@ -2744,14 +2758,12 @@ class ContentStream(DecodedStreamObject):
+                         stream.seek(-1, 1)
+                         break
+                     else:
+-                        stream.seek(-1,1)
+-                        data += info
++                        stream.seek(-1, 1)
++                        data.write(info)
+                 else:
+                     stream.seek(-1, 1)
+-                    data += tok
+-            else:
+-                data += tok
+-        return {"settings": settings, "data": data}
++                    data.write(tok)
++        return {"settings": settings, "data": data.getvalue()}
+ 
+     def _getData(self):
+         newdata = BytesIO()
diff -Nru pypdf2-1.26.0/debian/patches/series 
pypdf2-1.26.0/debian/patches/series
--- pypdf2-1.26.0/debian/patches/series 2016-09-05 13:14:14.000000000 -0400
+++ pypdf2-1.26.0/debian/patches/series 2024-02-11 13:49:50.000000000 -0500
@@ -1 +1,3 @@
 Prevent_infinite_loop_in_readObject.patch
+CVE-2022-24859.patch
+0001-MAINT-Quadratic-runtime-while-parsing-reduced-to-lin.patch

Bug#1063723: bullseye-pu: package pypdf2/1.26.0-4

Reply via email to