Just for the archives - I'm not sure if this ended up being filed to the bug system - the repro below demonstrates the bug on all JDKs up to the newest one.
# create a single random file of 250 megabytes head -c 250M < /dev/urandom > rnd.bin # create a bunch of hardlinks to the same file. for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done # create a zip archive exceeding 4gb (in stored mode so that it's faster) zip -0 archive.zip rnd*.bin # show the content of the archive - for all entries beyond 4gb this should # show extra data blocks like below (note the order of extra data blocks - this is important). # # The central-directory extra field contains: # - A subfield with ID 0x5455 (universal time) and 5 data bytes. # The local extra field has UTC/GMT modification/access times. # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes: # 01 04 ea 03 00 00 04 ea 03 00 00. # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes: # a4 06 a0 86 01 00 00 00. # zipinfo -v archive.zip # Bug repro code: cat > Test.java <<EOF import java.io.*; import java.nio.*; import java.nio.file.*; import java.nio.file.attribute.*; public class Test { public static void main(String[]args) throws IOException { try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) { for (Path root : fs.getRootDirectories()) { Files.walkFileTree(root, new SimpleFileVisitor<>() { public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException { return FileVisitResult.CONTINUE; } }); } } } } EOF # and run it. java Test.java The above ends with: Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111 at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899) at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600) at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536) at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532) at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767) at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777) at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276) at java.base/java.nio.file.Files.readAttributes(Files.java:1843) at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373) at java.base/java.nio.file.Files.walkFileTree(Files.java:2840) at java.base/java.nio.file.Files.walkFileTree(Files.java:2876) at Test.main(Test.java:10) The fix is relatively simple I think - the current code assumes entry offset is < 4GB when parsing extra data for ID 0x5455. This should be evaluated lazily after all extra data blocks are parsed to ensure the file offset is correctly updated based on block ID 0x0001, regardless of its ordering within other extra data blocks. On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.we...@gmail.com> wrote: > > > This looks like a legitimate bug to me (the zip file system implementation is > sensitive to the order of extra parameters and may throw an unexpected error > on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no > permissions - but I think it'd be worth adding one? > > Dawid > > On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.we...@gmail.com> wrote: >> >> Hi Lance, >> >> Yes, this is exactly the point where the problem is in JDK code. It's >> directly >> related to the zip entry beyond max int offset. The code tries to read data >> from a local zip entry header at locoff, here: >> >> if (zipfs.readFullyAt(buf, 0, buf.length , locoff) >> >> but the locoff is set to ~0 as per the spec for files exceeding >> 4 gigabytes, which says: >> >> 4.4.16 relative offset of local header: (4 bytes) >> >> This is the offset from the start of the first disk on >> which this file appears, to where the local header SHOULD >> be found. If an archive is in ZIP64 format and the value >> in this field is 0xFFFFFFFF, the size will be in the >> corresponding 8 byte zip64 extended information extra field. >> >> A proper fix would be to read the local header from zip64 extra data >> first. I don't know how this interferes with the rest of the code though - >> didn't have enough time to look at it. As it stands, zip entries >> beyond 4GB cause >> an unchecked exception while attribute-scanning. This simple snippet >> is enough to demonstrate it, given a ZIP entry beyond 4GB range: >> >> try (FileSystem fs = >> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) { >> for (Path root : fs.getRootDirectories()) { >> Files.walkFileTree(root, new SimpleFileVisitor<>() { >> @Override >> public FileVisitResult visitFile(Path file, BasicFileAttributes >> attrs) throws IOException { >> return FileVisitResult.CONTINUE; >> } >> }); >> } >> } >> >> The walkFileTree method is key here as it attempts to harvest >> attributes (why we use it is another story -- this saves a lot of time >> on large, network-mounted regular directories when you're collecting >> file metadata). >> >> >> Dawid >> >> >> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen >> <lance.ander...@oracle.com> wrote: >> > >> > Hi David, >> > >> > From a quick look at ZipFileSystem this appears to be an optimization to >> > avoid looking at the LOC extended Timestamp Extra field >> > >> > If a Info-ZIP Extended Timestamp (0x5455)is found then: >> > >> > If the "zipinfo-time" entry was set to “false” in the Map specified >> > when creating the Zip FileSystem, >> > >> > FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false") >> > >> > and the data size of the CEN extended Timestamp is 5 (flag + mod time) >> > >> > The modified time is used from the CEN Extended Timestamp extra field >> > >> > Otherwise get the modified time, creation time, and access time from the >> > LOC Extended Timestamp extra field. >> > >> > >> > ————— >> > >> > Extended Timestamp Extra Field: >> > >> > ============================== >> > >> > The following is the layout of the extended-timestamp extra >> > block. >> > (Last Revision 19970118) >> > >> > Local-header version: >> > >> > Value Size Description >> > ----- ---- ----------- >> > (time) 0x5455 Short tag for this extra block type ("UT") >> > TSize Short total data size for this block >> > Flags Byte info bits >> > (ModTime) Long time of last modification (UTC/GMT) >> > (AcTime) Long time of last access (UTC/GMT) >> > (CrTime) Long time of original creation (UTC/GMT) >> > >> > Central-header version: >> > >> > Value Size Description >> > ----- ---- ----------- >> > (time) 0x5455 Short tag for this extra block type ("UT") >> > TSize Short total data size for this block >> > Flags Byte info bits (refers to local header!) >> > (ModTime) Long time of last modification (UTC/GMT) >> > >> > The central-header extra field contains the modification time >> > only, >> > or no timestamp at all. TSize is used to flag its presence or >> > absence. But note: >> > >> > If "Flags" indicates that Modtime is present in the local >> > header >> > field, it MUST be present in the central header field, too! >> > This correspondence is required because the modification time >> > value may be used to support trans-timezone freshening and >> > updating operations with zip archives. >> > >> > The time values are in standard Unix signed-long format, >> > indicating >> > the number of seconds since 1 January 1970 00:00:00. The times >> > are relative to Coordinated Universal Time (UTC), also sometimes >> > referred to as Greenwich Mean Time (GMT). To convert to local >> > time, >> > the software must know the local timezone offset from UTC/GMT. >> > >> > The lower three bits of Flags in both headers indicate which >> > time- >> > stamps are present in the LOCAL extra field: >> > >> > bit 0 if set, modification time is present >> > bit 1 if set, access time is present >> > bit 2 if set, creation time is present >> > bits 3-7 reserved for additional timestamps; not set >> > >> > Those times that are present will appear in the order indicated, >> > but >> > any combination of times may be omitted. (Creation time may be >> > present without access time, for example.) TSize should equal >> > (1 + 4*(number of set bits in Flags)), as the block is currently >> > defined. Other timestamps may be added in the future. >> > >> > >> > -------------- >> > >> > It's hard to comment on why you received the error that you did but it is >> > possible the tool that was used for writing the entry did something >> > unexpected. >> > >> > Out of curiosity, have you tried using ZipFile/ZipEntry to access the >> > entry? >> > >> > >> > Best, >> > Lance >> > >> > >> > On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: >> > >> > Hello, >> > >> > We've encountered a seemingly valid ZIP file (zipinfo -v shows all its >> > entries are intact) that causes a runtime exception when scanning its >> > contents with ZipFileSystem. The exception indicates an invalid >> > signature when parsing EXTID_EXTT. I don't quite understand this >> > comment in the code: >> > >> > case EXTID_EXTT: >> > // spec says the Extened timestamp in cen only has mtime >> > // need to read the loc to get the extra a/ctime, if flag >> > // "zipinfo-time" is not specified to false; >> > // there is performance cost (move up to loc and read) to >> > // access the loc table foreach entry; >> > if (zipfs.noExtt) { >> > if (sz == 5) >> > mtime = unixToJavaTime(LG(extra, pos + 1)); >> > break; >> > } >> > ... >> > >> > but this ZIP file has the extra data block of exactly 5 bytes, as >> > indicated by zipinfo: >> > >> > Central directory entry #6: >> > --------------------------- >> > ... >> > file system or operating system of origin: Unix >> > version of encoding software: 3.0 >> > minimum file system compatibility required: MS-DOS, OS/2 or NT FAT >> > minimum software version required to extract: 2.0 >> > compression method: deflated >> > ... >> > extended local header: no >> > file last modified on (DOS date/time): 2018 Mar 1 04:56:20 >> > file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local >> > file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC >> > ... >> > Unix file attributes (100000 octal): ---------- >> > MS-DOS file attributes (01 hex): read-only >> > >> > The central-directory extra field contains: >> > - A subfield with ID 0x5455 (universal time) and 5 data bytes. >> > The local extra field has UTC/GMT modification/access times. >> > >> > The above conditional block checking for length == 5 would have worked >> > in ZipFileSystem but it's surrounded by a condition over an >> > externally-provided property - zipfs.noExtt is only set to true if: >> > >> > this.noExtt = "false".equals(env.get("zipinfo-time")); >> > >> > I can't share this particular ZIP file with you and I don't know how >> > it was created but it seems like that "zipinfo-time" flag could be >> > omitted if the length of the extra data field is exactly 5? >> > >> > Dawid >> > >> > >> > >> > Best >> > Lance >> > ------------------ >> > >> > >> > >> > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 >> > Oracle Java Engineering >> > 1 Network Drive >> > Burlington, MA 01803 >> > lance.ander...@oracle.com >> > >> > >> > >> >