https://bugs.kde.org/show_bug.cgi?id=450597
Bug ID: 450597
Summary: Incorrect handling of zip files with data descriptors
Product: frameworks-karchive
Version: unspecified
Platform: Compiled Sources
OS: Linux
Status: REPORTED
Severity: normal
Priority: NOR
Component: general
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Target Milestone: ---
Created attachment 146961
--> https://bugs.kde.org/attachment.cgi?id=146961&action=edit
A zip file that karchive cannot read
SUMMARY
KZip in karchive does not correctly handle zip files that use data descriptors
(archives whose local file headers omit the sizes and CRC, so these values must
be read from the data descriptor that follows the file contents).
STEPS TO REPRODUCE
1. Download the attached test.zip file
2. Try to open the zip in Dolphin, or run `kziptest list test.zip`
OBSERVED RESULT
Reading the zip file fails
EXPECTED RESULT
Opening the file in Dolphin should work
SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Kubuntu 21.10
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.91.0 (karchive compiled from sources)
Qt Version: 5.15.2
ADDITIONAL INFORMATION
I stumbled into this when trying to open a zip generated by Google Drive in
Dolphin. Trying to open the zip resulted in this error: "Could not open the
file, probably due to an unsupported file format." Ark and other tools that I
tried opened the zip just fine.
My problematic zip included multiple epub files that are zips themselves. This
seems to confuse karchive. Running `kziptest list` on this file prints the
following: "Invalid ZIP file. Unrecognized header at offset 22988782". Right
before that offset in my file there's a data descriptor block from _inside_ an
epub file in the zip.
When parsing a zip, after reading a local file header, karchive seeks forward
to the next occurrence of the data descriptor magic value (PK\x07\x08). This is
incorrect, because the compressed file contents may themselves contain that
magic value. This apparently happens with nested zips, and likely with other
data, too. A zip can be altered to trigger this issue (the change breaks the
CRC, but other tools will still read the file):
$ head -c 1000 /dev/random | zip -fd -fz- test.zip -
$ printf 'PK\x07\x08' | dd of=test.zip seek=300 bs=1 count=4 conv=notrunc
$ unzip -v test.zip
Archive: test.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
1000 Defl:N 1005 -1% 2022-02-19 19:17 edfd80a2 -
-------- ------- --- -------
1000 1005 -1% 1 file
$ ./kziptest list test.zip
Could not open "test.zip" for reading. ZIP file doesn't exist or is invalid:
"Invalid ZIP file. Unrecognized header at offset 316"
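The failure mode above can be illustrated with a minimal Python sketch. It is
not karchive's code: `build_entry` and `naive_payload_length` are hypothetical
helpers that hand-build one stored entry with a data descriptor and then guess
the payload length by scanning for the first PK\x07\x08, which is the approach
described above.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"
DD_SIG = b"PK\x07\x08"  # data descriptor magic value
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30-byte fixed part of a local header


def build_entry(name: bytes, payload: bytes) -> bytes:
    """Build one stored entry whose CRC and sizes live only in the data descriptor."""
    header = LOCAL_HEADER.pack(
        LOCAL_SIG,
        20,        # version needed to extract
        0x0008,    # general purpose flags: bit 3 = data descriptor follows
        0,         # compression method 0 = stored
        0, 0,      # modification time and date (unused in this sketch)
        0, 0, 0,   # CRC-32, compressed size, uncompressed size: left at zero
        len(name),
        0,         # extra field length
    )
    descriptor = DD_SIG + struct.pack(
        "<III", zlib.crc32(payload), len(payload), len(payload)
    )
    return header + name + payload + descriptor


def naive_payload_length(entry: bytes) -> int:
    """Guess the payload length by scanning for the first descriptor signature."""
    *_, name_len, extra_len = LOCAL_HEADER.unpack_from(entry)
    data_start = LOCAL_HEADER.size + name_len + extra_len
    return entry.find(DD_SIG, data_start) - data_start


# A payload that happens to contain the magic value, as a nested zip would.
payload = b"x" * 10 + DD_SIG + b"y" * 10
entry = build_entry(b"inner.bin", payload)
print(len(payload), naive_payload_length(entry))  # prints: 24 10
```

The scan stops at the signature embedded in the payload, so the guessed length
(10) is shorter than the real one (24) and the parser would then look for the
next header in the middle of the file data.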
Other similar files do not cause errors but can cause incorrect decompression
results:
$ echo "aaaa" > a; echo "bbbb" > b; echo "cccc" > c
$ zip -fd -fz- inner.zip a b c
adding: a (deflated -39%)
adding: b (deflated -39%)
adding: c (deflated -39%)
$ zip -fd -fz- -0 outer.zip inner.zip
adding: inner.zip (stored 0%)
$ unzip -v outer.zip
Archive: outer.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
481 Stored 481 0% 2022-02-20 06:40 72170cd6 inner.zip
-------- ------- --- -------
481 481 0% 1 file
$ ./kziptest list outer.zip
mode=0100664 luryus luryus "c" size: 5 pos: 223 isdir=0
mode=0100664 luryus luryus "a" size: 5 pos: 31 isdir=0
mode=0100664 luryus luryus "b" size: 5 pos: 141 isdir=0
$ ./kziptest print-all foo/outer.zip
Opening zip file
Listing toplevel of zip file
Printing "b"
SIZE=0
DATA=
Printing "c"
SIZE=0
DATA=
Printing "a"
SIZE=0
DATA=
Here KZip incorrectly jumps into the contents of the inner zip while reading
the outer file: the listing shows a, b and c instead of inner.zip. Even then it
cannot correctly read the files inside the inner zip (as the empty `print-all`
output shows).
Wikipedia's ZIP file page
(https://en.wikipedia.org/wiki/ZIP_(file_format)#Structure) has this note on
parsing zip files:
"Tools that correctly read ZIP archives must scan for the end of central
directory record signature, and then, as appropriate, the other, indicated,
central directory records. They must not scan for entries from the top of the
ZIP file, because (as previously mentioned in this section) only the central
directory specifies where a file chunk starts and that it has not been deleted.
Scanning could lead to false positives, as the format does not forbid other
data to be between chunks, nor file data streams from containing such
signatures."
karchive seems to use the scanning method that this passage warns against,
which results in this issue.
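For comparison, here is a rough Python sketch of the central-directory approach
the quote describes. It is an illustration only, not a proposed patch:
`list_entries` and `make_nested_zip` are hypothetical names, and the sketch
assumes a single-disk archive without zip64 and without an archive comment that
contains the end of central directory signature.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header
EOCD = struct.Struct("<4sHHHHIIH")           # 22-byte fixed part
CDH = struct.Struct("<4sHHHHHHIIIHHHHHII")   # 46-byte fixed part


def list_entries(blob: bytes):
    """Return (name, compressed size, size, local header offset) per entry."""
    eocd_pos = blob.rfind(EOCD_SIG)  # search backwards from the end of the file
    if eocd_pos < 0:
        raise ValueError("end of central directory record not found")
    *_, total, _cd_size, cd_offset, _comment_len = EOCD.unpack_from(blob, eocd_pos)

    entries = []
    pos = cd_offset
    for _ in range(total):
        fields = CDH.unpack_from(blob, pos)
        if fields[0] != CDH_SIG:
            raise ValueError(f"bad central directory header at offset {pos}")
        comp_size, size = fields[8], fields[9]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        name_start = pos + CDH.size
        name = blob[name_start:name_start + name_len].decode("utf-8", "replace")
        entries.append((name, comp_size, size, local_offset))
        # Skip the variable-length name, extra field and comment.
        pos = name_start + name_len + extra_len + comment_len
    return entries


def make_nested_zip() -> bytes:
    """Build an outer zip that stores an inner zip uncompressed."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as z:
        for name in ("a", "b", "c"):
            z.writestr(name, name * 4 + "\n")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_STORED) as z:
        z.writestr("inner.zip", inner.getvalue())
    return outer.getvalue()


print([name for name, *_ in list_entries(make_nested_zip())])  # prints: ['inner.zip']
```

Because the sizes and offsets come from the central directory, the signatures
embedded in the stored inner zip are never interpreted as headers of the outer
archive, and only inner.zip is listed.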