I know what the problem is and I know what the fix should be (and tried to convey it) but I don't have a patch. Writing the test case is not a problem but it'll be a lengthy test on slower drives (has to create a zip file that exceeds 4gb of disk space). I don't think there is a way to fake the file system by using a sparse file, sadly. Would this be acceptable? If so then sure - I can provide a reproducible test case that will fail for the current implementation.
D. On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <lance.ander...@oracle.com> wrote: > Hi Dawid, > > If you believe you have a a fix, please submit a patch along with a JTREG > test case. Ideally the test case would create zip file as part of the test. > > Best > Lance > > On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > Just for the archives - I'm not sure if this ended up being filed to > the bug system - the repro below demonstrates the bug on all JDKs up > to the newest one. > > # create a single random file of 250 megabytes > head -c 250M < /dev/urandom > rnd.bin > # create a bunch of hardlinks to the same file. > for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done > # create a zip archive exceeding 4gb (in stored mode so that it's faster) > zip -0 archive.zip rnd*.bin > # show the content of the archive - for all entries beyond 4gb this should > # show extra data blocks like below (note the order of extra data > blocks - this is important). > # > # The central-directory extra field contains: > # - A subfield with ID 0x5455 (universal time) and 5 data bytes. > # The local extra field has UTC/GMT modification/access times. > # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes: > # 01 04 ea 03 00 00 04 ea 03 00 00. > # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes: > # a4 06 a0 86 01 00 00 00. > # > zipinfo -v archive.zip > > # Bug repro code: > cat > Test.java <<EOF > import java.io.*; > import java.nio.*; > import java.nio.file.*; > import java.nio.file.attribute.*; > > public class Test { > public static void main(String[]args) throws IOException { > try (FileSystem fs = > FileSystems.newFileSystem(Paths.get("archive.zip"))) { > for (Path root : fs.getRootDirectories()) { > Files.walkFileTree(root, new SimpleFileVisitor<>() { > public FileVisitResult visitFile(Path file, > BasicFileAttributes attrs) throws IOException { > return FileVisitResult.CONTINUE; > } > }); > } > } > } > } > EOF > # and run it. > java Test.java > > The above ends with: > > Exception in thread "main" java.util.zip.ZipException: loc: wrong sig > ->841cd111 > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532) > at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767) > at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276) > at java.base/java.nio.file.Files.readAttributes(Files.java:1843) > at > java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) > at > java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) > at > java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373) > at java.base/java.nio.file.Files.walkFileTree(Files.java:2840) > at java.base/java.nio.file.Files.walkFileTree(Files.java:2876) > at Test.main(Test.java:10) > > The fix is relatively simple I think - the current code assumes entry > offset is < 4GB when parsing extra data for > ID 0x5455. This should be evaluated lazily after all extra data blocks > are parsed to ensure the file offset is correctly > updated based on block ID 0x0001, regardless of its ordering within > other extra data blocks. > > > > On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.we...@gmail.com> wrote: > > > > This looks like a legitimate bug to me (the zip file system implementation > is sensitive to the order of extra parameters and may throw an unexpected > error > on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no > permissions - but I think it'd be worth adding one? > > Dawid > > On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.we...@gmail.com> wrote: > > > Hi Lance, > > Yes, this is exactly the point where the problem is in JDK code. It's > directly > related to the zip entry beyond max int offset. The code tries to read data > from a local zip entry header at locoff, here: > > if (zipfs.readFullyAt(buf, 0, buf.length , locoff) > > but the locoff is set to ~0 as per the spec for files exceeding > 4 gigabytes, which says: > > 4.4.16 relative offset of local header: (4 bytes) > > This is the offset from the start of the first disk on > which this file appears, to where the local header SHOULD > be found. If an archive is in ZIP64 format and the value > in this field is 0xFFFFFFFF, the size will be in the > corresponding 8 byte zip64 extended information extra field. > > A proper fix would be to read the local header from zip64 extra data > first. I don't know how this interferes with the rest of the code though - > didn't have enough time to look at it. As it stands, zip entries > beyond 4GB cause > an unchecked exception while attribute-scanning. This simple snippet > is enough to demonstrate it, given a ZIP entry beyond 4GB range: > > try (FileSystem fs = > FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) { > for (Path root : fs.getRootDirectories()) { > Files.walkFileTree(root, new SimpleFileVisitor<>() { > @Override > public FileVisitResult visitFile(Path file, BasicFileAttributes > attrs) throws IOException { > return FileVisitResult.CONTINUE; > } > }); > } > } > > The walkFileTree method is key here as it attempts to harvest > attributes (why we use it is another story -- this saves a lot of time > on large, network-mounted regular directories when you're collecting > file metadata). > > > Dawid > > > On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen > <lance.ander...@oracle.com> wrote: > > > Hi David, > > From a quick look at ZipFileSystem this appears to be an optimization to > avoid looking at the LOC extended Timestamp Extra field > > If a Info-ZIP Extended Timestamp (0x5455)is found then: > > If the "zipinfo-time" entry was set to “false” in the Map specified when > creating the Zip FileSystem, > > FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false") > > and the data size of the CEN extended Timestamp is 5 (flag + mod time) > > The modified time is used from the CEN Extended Timestamp extra field > > Otherwise get the modified time, creation time, and access time from the > LOC Extended Timestamp extra field. > > > ————— > > Extended Timestamp Extra Field: > > ============================== > > The following is the layout of the extended-timestamp extra block. > (Last Revision 19970118) > > Local-header version: > > Value Size Description > ----- ---- ----------- > (time) 0x5455 Short tag for this extra block type ("UT") > TSize Short total data size for this block > Flags Byte info bits > (ModTime) Long time of last modification (UTC/GMT) > (AcTime) Long time of last access (UTC/GMT) > (CrTime) Long time of original creation (UTC/GMT) > > Central-header version: > > Value Size Description > ----- ---- ----------- > (time) 0x5455 Short tag for this extra block type ("UT") > TSize Short total data size for this block > Flags Byte info bits (refers to local header!) > (ModTime) Long time of last modification (UTC/GMT) > > The central-header extra field contains the modification time > only, > or no timestamp at all. TSize is used to flag its presence or > absence. But note: > > If "Flags" indicates that Modtime is present in the local > header > field, it MUST be present in the central header field, too! > This correspondence is required because the modification time > value may be used to support trans-timezone freshening and > updating operations with zip archives. > > The time values are in standard Unix signed-long format, > indicating > the number of seconds since 1 January 1970 00:00:00. The times > are relative to Coordinated Universal Time (UTC), also sometimes > referred to as Greenwich Mean Time (GMT). To convert to local > time, > the software must know the local timezone offset from UTC/GMT. > > The lower three bits of Flags in both headers indicate which time- > stamps are present in the LOCAL extra field: > > bit 0 if set, modification time is present > bit 1 if set, access time is present > bit 2 if set, creation time is present > bits 3-7 reserved for additional timestamps; not set > > Those times that are present will appear in the order indicated, > but > any combination of times may be omitted. (Creation time may be > present without access time, for example.) TSize should equal > (1 + 4*(number of set bits in Flags)), as the block is currently > defined. Other timestamps may be added in the future. > > > -------------- > > It's hard to comment on why you received the error that you did but it is > possible the tool that was used for writing the entry did something > unexpected. > > Out of curiosity, have you tried using ZipFile/ZipEntry to access the > entry? > > > Best, > Lance > > > On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > Hello, > > We've encountered a seemingly valid ZIP file (zipinfo -v shows all its > entries are intact) that causes a runtime exception when scanning its > contents with ZipFileSystem. The exception indicates an invalid > signature when parsing EXTID_EXTT. I don't quite understand this > comment in the code: > > case EXTID_EXTT: > // spec says the Extened timestamp in cen only has mtime > // need to read the loc to get the extra a/ctime, if flag > // "zipinfo-time" is not specified to false; > // there is performance cost (move up to loc and read) to > // access the loc table foreach entry; > if (zipfs.noExtt) { > if (sz == 5) > mtime = unixToJavaTime(LG(extra, pos + 1)); > break; > } > ... > > but this ZIP file has the extra data block of exactly 5 bytes, as > indicated by zipinfo: > > Central directory entry #6: > --------------------------- > ... > file system or operating system of origin: Unix > version of encoding software: 3.0 > minimum file system compatibility required: MS-DOS, OS/2 or NT FAT > minimum software version required to extract: 2.0 > compression method: deflated > ... > extended local header: no > file last modified on (DOS date/time): 2018 Mar 1 04:56:20 > file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local > file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC > ... > Unix file attributes (100000 octal): ---------- > MS-DOS file attributes (01 hex): read-only > > The central-directory extra field contains: > - A subfield with ID 0x5455 (universal time) and 5 data bytes. > The local extra field has UTC/GMT modification/access times. > > The above conditional block checking for length == 5 would have worked > in ZipFileSystem but it's surrounded by a condition over an > externally-provided property - zipfs.noExtt is only set to true if: > > this.noExtt = "false".equals(env.get("zipinfo-time")); > > I can't share this particular ZIP file with you and I don't know how > it was created but it seems like that "zipinfo-time" flag could be > omitted if the length of the extra data field is exactly 5? > > Dawid > > > > Best > Lance > ------------------ > > > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 > Oracle Java Engineering > 1 Network Drive > Burlington, MA 01803 > lance.ander...@oracle.com > > > > > > > Best > Lance > ------------------ > > > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 > Oracle Java Engineering > 1 Network Drive > Burlington, MA 01803 > lance.ander...@oracle.com > > > > >