https://bugs.kde.org/show_bug.cgi?id=450597

            Bug ID: 450597
           Summary: Incorrect handling of zip files with data descriptors
           Product: frameworks-karchive
           Version: unspecified
          Platform: Compiled Sources
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
  Target Milestone: ---

Created attachment 146961
  --> https://bugs.kde.org/attachment.cgi?id=146961&action=edit
A zip file that karchive cannot read

SUMMARY
KZip in karchive does not correctly handle zip files with data descriptors
(ones where the local file headers do not include the lengths and CRCs, so
these must be read from the data descriptor that follows the file contents).


STEPS TO REPRODUCE
1. Download the attached test.zip file
2. Try to open the zip in Dolphin, or run `kziptest list test.zip`

OBSERVED RESULT
Reading the zip file fails

EXPECTED RESULT
Opening the file in Dolphin should work

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Kubuntu 21.10
(available in About System)
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.91.0 (karchive compiled from sources) 
Qt Version: 5.15.2

ADDITIONAL INFORMATION

I stumbled into this when trying to open a zip generated by Google Drive in
Dolphin. Trying to open the zip resulted in this error: "Could not open the
file, probably due to an unsupported file format." Ark and other tools that I
tried opened the zip just fine.

My problematic zip included multiple epub files that are zips themselves. This
seems to confuse karchive. Running `kziptest list` on this file prints the
following: "Invalid ZIP file. Unrecognized header at offset 22988782". Right
before that offset in my file there's a data descriptor block from _inside_ an
epub file in the zip. 

When parsing a zip, after reading a file header, karchive seeks forward to the
next data descriptor magic value (`PK\x07\x08`). This is incorrect, because the
compressed file contents may themselves include the magic value. This can
apparently happen with nested zips, and likely with other data, too. A zip can
be altered to trigger this issue (the alteration breaks the CRC, but other
tools will still read the file):

$ head -c 1000 /dev/random | zip -fd -fz- test.zip -
$ printf 'PK\x07\x08' | dd of=test.zip seek=300 bs=1 count=4 conv=notrunc
$ unzip -v test.zip                                             
Archive:  test.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    1000  Defl:N     1005  -1% 2022-02-19 19:17 edfd80a2  -
--------          -------  ---                            -------
    1000             1005  -1%                            1 file

$ ./kziptest list test.zip
Could not open "test.zip" for reading. ZIP file doesn't exist or is invalid:
"Invalid ZIP file. Unrecognized header at offset 316"

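To illustrate why scanning for the magic value is unreliable, here is a minimal
Python sketch. It is not karchive code; the helper names
`naive_descriptor_offset` and `build_entry_body` are made up for this example.
It builds a stored payload that happens to contain the data descriptor
signature, appends a real data descriptor, and compares where a forward scan
stops with where the descriptor actually is.

```python
import struct
import zlib

DD_SIG = b"PK\x07\x08"  # data descriptor signature


def naive_descriptor_offset(stream: bytes) -> int:
    """Hypothetical scanner: return the offset of the first signature match."""
    return stream.find(DD_SIG)


def build_entry_body(payload: bytes) -> bytes:
    """Stored (uncompressed) payload followed by its 32-bit data descriptor."""
    descriptor = DD_SIG + struct.pack(
        "<LLL", zlib.crc32(payload), len(payload), len(payload)
    )
    return payload + descriptor


# The payload itself contains the four signature bytes.
payload = b"hello" + DD_SIG + b"world"
body = build_entry_body(payload)

guessed = naive_descriptor_offset(body)  # where a forward scan stops
actual = len(payload)                    # where the descriptor really starts
print(guessed, actual)                   # prints "5 14"
```

The scan stops inside the payload, so anything parsed after that point is read
from the wrong offset.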

Other similar files do not cause errors but can cause incorrect decompression
results:

$ echo "aaaa" > a; echo "bbbb" > b; echo "cccc" > c
$ zip -fd -fz- inner.zip a b c
  adding: a (deflated -39%)
  adding: b (deflated -39%)
  adding: c (deflated -39%)
$ zip -fd -fz- -0 outer.zip inner.zip
  adding: inner.zip (stored 0%)
$ unzip -v outer.zip 
Archive:  outer.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     481  Stored      481   0% 2022-02-20 06:40 72170cd6  inner.zip
--------          -------  ---                            -------
     481              481   0%                            1 file

$ ./kziptest list outer.zip 
mode=0100664 luryus luryus "c" size: 5 pos: 223 isdir=0
mode=0100664 luryus luryus "a" size: 5 pos: 31 isdir=0
mode=0100664 luryus luryus "b" size: 5 pos: 141 isdir=0
$ ./kziptest print-all foo/outer.zip
Opening zip file
Listing toplevel of zip file
Printing "b"
SIZE=0
DATA=
Printing "c"
SIZE=0
DATA=
Printing "a"
SIZE=0
DATA=

Here KZip incorrectly jumps into the inner zip's contents while reading the
outer file, so it lists the inner entries instead of inner.zip. Even then it
cannot correctly read the files inside the inner zip (this can be seen in the
`print-all` output, where every entry has SIZE=0).


Wikipedia's ZIP file page
(https://en.wikipedia.org/wiki/ZIP_(file_format)#Structure) has this note
about parsing the files:
"Tools that correctly read ZIP archives must scan for the end of central
directory record signature, and then, as appropriate, the other, indicated,
central directory records. They must not scan for entries from the top of the
ZIP file, because (as previously mentioned in this section) only the central
directory specifies where a file chunk starts and that it has not been deleted.
Scanning could lead to false positives, as the format does not forbid other
data to be between chunks, nor file data streams from containing such
signatures."

karchive seems to use the bad method (scanning from the top of the file),
resulting in this issue.
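
For comparison, a rough Python sketch of the central-directory-first approach
described in the quote. This is only an illustration under simplifying
assumptions (no ZIP64, no multi-disk archives, and no end of central directory
signature inside the archive comment), not a proposed patch for KZip. The
function names `list_entries` and `build_nested_example` are hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDFH_SIG = b"PK\x01\x02"  # central directory file header
CDFH_SIZE = 46            # fixed part of a central directory file header


def list_entries(data: bytes):
    """Return (name, compressed size, uncompressed size, local header offset)."""
    # Search backwards: the record sits at the end of the archive.
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end of central directory record found")
    # sig, disk, cd disk, entries on disk, total entries, cd size, cd offset,
    # comment length
    fields = struct.unpack_from("<4s4H2LH", data, pos)
    total, cd_offset = fields[4], fields[6]

    entries = []
    p = cd_offset
    for _ in range(total):
        hdr = struct.unpack_from("<4s6H3L5H2L", data, p)
        if hdr[0] != CDFH_SIG:
            raise ValueError(f"bad central directory header at offset {p}")
        flags, csize, usize = hdr[3], hdr[8], hdr[9]
        name_len, extra_len, comment_len = hdr[10], hdr[11], hdr[12]
        local_offset = hdr[16]
        raw_name = data[p + CDFH_SIZE : p + CDFH_SIZE + name_len]
        # General purpose flag bit 11 marks UTF-8 names; otherwise CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append((name, csize, usize, local_offset))
        p += CDFH_SIZE + name_len + extra_len + comment_len
    return entries


def build_nested_example() -> bytes:
    """Build an outer zip that stores an inner zip uncompressed."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", zipfile.ZIP_DEFLATED) as z:
        for name in ("a", "b", "c"):
            z.writestr(name, name * 4 + "\n")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_STORED) as z:
        z.writestr("inner.zip", inner.getvalue())
    return outer.getvalue()


for entry in list_entries(build_nested_example()):
    print(entry[0])
```

Because sizes and offsets come from the central directory, the signatures
embedded in the stored inner.zip are never interpreted, and only the outer
entry is listed.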
