Re: "loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)

Dawid Weiss Sat, 28 Nov 2020 06:35:56 -0800

I know what the problem is and I know what the fix should be (and tried to
convey it) but I don't have a patch. Writing the test case is not a problem
but it'll be a lengthy test on slower drives (has to create a zip file that
exceeds 4gb of disk space). I don't think there is a way to fake the file
system by using a sparse file, sadly. Would this be acceptable? If so then
sure - I can provide a reproducible test case that will fail for the
current implementation.


D.

On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <[email protected]>
wrote:

> Hi Dawid,
>
> If you believe you have a a fix, please submit a patch along with a JTREG
> test case.  Ideally the test case would create zip file as part of the test.
>
> Best
> Lance
>
> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <[email protected]> wrote:
>
> Just for the archives - I'm not sure if this ended up being filed to
> the bug system - the repro below demonstrates the bug on all JDKs up
> to the newest one.
>
> # create a single random file of 250 megabytes
> head -c 250M < /dev/urandom > rnd.bin
> # create a bunch of hardlinks to the same file.
> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
> zip -0 archive.zip rnd*.bin
> # show the content of the archive - for all entries beyond 4gb this should
> # show extra data blocks like below (note the order of extra data
> blocks - this is important).
> #
> #  The central-directory extra field contains:
> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
> #   The local extra field has UTC/GMT modification/access times.
> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
> #   01 04 ea 03 00 00 04 ea 03 00 00.
> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
> #  a4 06 a0 86 01 00 00 00.
> #
> zipinfo -v archive.zip
>
> # Bug repro code:
> cat > Test.java <<EOF
> import java.io.*;
> import java.nio.*;
> import java.nio.file.*;
> import java.nio.file.attribute.*;
>
> public class Test {
>  public static void main(String[]args) throws IOException {
>    try (FileSystem fs =
> FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>      for (Path root : fs.getRootDirectories()) {
>        Files.walkFileTree(root, new SimpleFileVisitor<>() {
>          public FileVisitResult visitFile(Path file,
> BasicFileAttributes attrs) throws IOException {
>            return FileVisitResult.CONTINUE;
>          }
>        });
>      }
>    }
>  }
> }
> EOF
> # and run it.
> java Test.java
>
> The above ends with:
>
> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig
> ->841cd111
>        at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>        at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>        at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>        at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>        at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>        at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>        at
> java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>        at
> java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>        at
> java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>        at Test.main(Test.java:10)
>
> The fix is relatively simple I think - the current code assumes entry
> offset is < 4GB when parsing extra data for
> ID 0x5455. This should be evaluated lazily after all extra data blocks
> are parsed to ensure the file offset is correctly
> updated based on block ID 0x0001, regardless of its ordering within
> other extra data blocks.
>
>
>
> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <[email protected]> wrote:
>
>
>
> This looks like a legitimate bug to me (the zip file system implementation
> is sensitive to the order of extra parameters and may throw an unexpected
> error
> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no
> permissions - but I think it'd be worth adding one?
>
> Dawid
>
> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <[email protected]> wrote:
>
>
> Hi Lance,
>
> Yes, this is exactly the point where the problem is in JDK code. It's
> directly
> related to the zip entry beyond max int offset. The code tries to read data
> from a local zip entry header at locoff, here:
>
> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>
> but the locoff is set to ~0 as per the spec for files exceeding
> 4 gigabytes, which says:
>
>   4.4.16 relative offset of local header: (4 bytes)
>
>       This is the offset from the start of the first disk on
>       which this file appears, to where the local header SHOULD
>       be found.  If an archive is in ZIP64 format and the value
>       in this field is 0xFFFFFFFF, the size will be in the
>       corresponding 8 byte zip64 extended information extra field.
>
> A proper fix would be to read the local header from zip64 extra data
> first. I don't know how this interferes with the rest of the code though -
> didn't have enough time to look at it. As it stands, zip entries
> beyond 4GB cause
> an unchecked exception while attribute-scanning. This simple snippet
> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>
> try (FileSystem fs =
> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>  for (Path root : fs.getRootDirectories()) {
>    Files.walkFileTree(root, new SimpleFileVisitor<>() {
>      @Override
>      public FileVisitResult visitFile(Path file, BasicFileAttributes
> attrs) throws IOException {
>        return FileVisitResult.CONTINUE;
>      }
>    });
>  }
> }
>
> The walkFileTree method is key here as it attempts to harvest
> attributes (why we use it is another story -- this saves a lot of time
> on large, network-mounted regular directories when you're collecting
> file metadata).
>
>
> Dawid
>
>
> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
> <[email protected]> wrote:
>
>
> Hi David,
>
> From  a quick look at ZipFileSystem this appears to be an optimization to
> avoid looking at the LOC extended Timestamp Extra field
>
> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>
>  If the "zipinfo-time" entry was set  to “false” in the Map specified when
> creating the Zip FileSystem,
>
> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>
> and the  data size of the CEN extended Timestamp is 5 (flag + mod time)
>
> The modified time is used from the CEN Extended Timestamp extra field
>
> Otherwise get the modified time, creation time, and access time from the
> LOC Extended Timestamp extra field.
>
>
> —————
>
> Extended Timestamp Extra Field:
>
>          ==============================
>
>          The following is the layout of the extended-timestamp extra block.
>          (Last Revision 19970118)
>
>          Local-header version:
>
>          Value         Size        Description
>          -----         ----        -----------
>  (time)  0x5455        Short       tag for this extra block type ("UT")
>          TSize         Short       total data size for this block
>          Flags         Byte        info bits
>          (ModTime)     Long        time of last modification (UTC/GMT)
>          (AcTime)      Long        time of last access (UTC/GMT)
>          (CrTime)      Long        time of original creation (UTC/GMT)
>
>          Central-header version:
>
>          Value         Size        Description
>          -----         ----        -----------
>  (time)  0x5455        Short       tag for this extra block type ("UT")
>          TSize         Short       total data size for this block
>          Flags         Byte        info bits (refers to local header!)
>          (ModTime)     Long        time of last modification (UTC/GMT)
>
>          The central-header extra field contains the modification time
> only,
>          or no timestamp at all.  TSize is used to flag its presence or
>          absence.  But note:
>
>              If "Flags" indicates that Modtime is present in the local
> header
>              field, it MUST be present in the central header field, too!
>              This correspondence is required because the modification time
>              value may be used to support trans-timezone freshening and
>              updating operations with zip archives.
>
>          The time values are in standard Unix signed-long format,
> indicating
>          the number of seconds since 1 January 1970 00:00:00.  The times
>          are relative to Coordinated Universal Time (UTC), also sometimes
>          referred to as Greenwich Mean Time (GMT).  To convert to local
> time,
>          the software must know the local timezone offset from UTC/GMT.
>
>          The lower three bits of Flags in both headers indicate which time-
>          stamps are present in the LOCAL extra field:
>
>                bit 0           if set, modification time is present
>                bit 1           if set, access time is present
>                bit 2           if set, creation time is present
>                bits 3-7        reserved for additional timestamps; not set
>
>          Those times that are present will appear in the order indicated,
> but
>          any combination of times may be omitted.  (Creation time may be
>          present without access time, for example.)  TSize should equal
>          (1 + 4*(number of set bits in Flags)), as the block is currently
>          defined.  Other timestamps may be added in the future.
>
>
> --------------
>
> It's hard to comment on why you received the error that you did but it is
> possible the tool that was used for writing the entry  did something
> unexpected.
>
> Out of curiosity, have you tried using ZipFile/ZipEntry to access the
> entry?
>
>
> Best,
> Lance
>
>
> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <[email protected]> wrote:
>
> Hello,
>
> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
> entries are intact) that causes a runtime exception when scanning its
> contents with ZipFileSystem. The exception indicates an invalid
> signature when parsing EXTID_EXTT. I don't quite understand this
> comment in the code:
>
> case EXTID_EXTT:
>   // spec says the Extened timestamp in cen only has mtime
>   // need to read the loc to get the extra a/ctime, if flag
>   // "zipinfo-time" is not specified to false;
>   // there is performance cost (move up to loc and read) to
>   // access the loc table foreach entry;
>   if (zipfs.noExtt) {
>       if (sz == 5)
>           mtime = unixToJavaTime(LG(extra, pos + 1));
>        break;
>   }
> ...
>
> but this ZIP file has the extra data block of exactly 5 bytes, as
> indicated by zipinfo:
>
> Central directory entry #6:
> ---------------------------
> ...
> file system or operating system of origin:      Unix
> version of encoding software:                   3.0
> minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
> minimum software version required to extract:   2.0
> compression method:                             deflated
> ...
> extended local header:                          no
> file last modified on (DOS date/time):          2018 Mar 1 04:56:20
> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
> ...
> Unix file attributes (100000 octal):            ----------
> MS-DOS file attributes (01 hex):                read-only
>
> The central-directory extra field contains:
> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>   The local extra field has UTC/GMT modification/access times.
>
> The above conditional block checking for length == 5 would have worked
> in ZipFileSystem but it's surrounded by a condition over an
> externally-provided property - zipfs.noExtt is only set to true if:
>
> this.noExtt = "false".equals(env.get("zipinfo-time"));
>
> I can't share this particular ZIP file with you and I don't know how
> it was created but it seems like that "zipinfo-time" flag could be
> omitted if the length of the extra data field is exactly 5?
>
> Dawid
>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> [email protected]
>
>
>
>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> [email protected]
>
>
>
>
>

Re: "loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)

Reply via email to