Hi Dawid Thank you for getting back to me.
> On Nov 29, 2020, at 5:00 PM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > > Hi Lance, > > I looked at the test you pointed to... This is exactly the test for the > problem I originally reported on October 21st... The bug has been filed > for this on October 25th (by you) -- > https://bugs.openjdk.java.net/browse/JDK-8255380 > <https://bugs.openjdk.java.net/browse/JDK-8255380> and the fix is in commit > 6606e0909cbda9. Weird. > > I think that test is slightly broken in that it passes "zipinfo-time=false" > -- this is exactly the workaround that makes it work everywhere. The > test should *not* have that flag set? The ZipFileSystem check is case sensitive: ——— this.noExtt = "false".equals(env.get("zipinfo-time")); ———— > > try (FileSystem fs = > FileSystems.newFileSystem(Paths.get(ZIP_FILE_NAME), > Map.of("zipinfo-time", "False"))) { > You are right though, it would be clearer if I change it to Map.of(); Thank you for pointing that out Best Lance > Anyway, the problem is fixed. I think it'd be good to backport to LTS > releases too. > > Dawid > > On Sat, Nov 28, 2020 at 4:32 PM Lance Andersen <lance.ander...@oracle.com > <mailto:lance.ander...@oracle.com>> wrote: > Hi Dawid, > > Thank you for the follow up. > >> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.we...@gmail.com >> <mailto:dawid.we...@gmail.com>> wrote: >> >> >> I know what the problem is and I know what the fix should be (and tried to >> convey it) but I don't have a patch. > > Given you believe you know what the fix would be, it would be great to > create a patch for ZipFileSystem and see if it addresses the issue and then > contribute the fix back to the OpenJDK community :-). > > >> Writing the test case is not a problem but it'll be a lengthy test on slower >> drives (has to create a zip file that exceeds 4gb of disk space). > > The test case would be configured as a manual test and would look similar to > test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java > <https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/987*diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9__;Iw!!GqivPVa7Brio!O7QI8ncxtRbHUjGhW-GmeXZnM3KYdut-r74YfTVxu6etbBcluUHLqDp6DaLsampu2w$> > which also creates a 4GB zip. > >> I don't think there is a way to fake the file system by using a sparse file, >> sadly. Would this be acceptable? If so then sure - I can provide a >> reproducible test case that will fail for the current implementation. > > The above test runs in a reasonable time so I would expect this test case > would as well. > > I will look to reproduce the issue over the weekend or early next week and > log a bug with what you provided in your earlier email. > > > Best > Lance >> >> D. >> >> On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <lance.ander...@oracle.com >> <mailto:lance.ander...@oracle.com>> wrote: >> Hi Dawid, >> >> If you believe you have a a fix, please submit a patch along with a JTREG >> test case. Ideally the test case would create zip file as part of the test. >> >> Best >> Lance >> >>> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.we...@gmail.com >>> <mailto:dawid.we...@gmail.com>> wrote: >>> >>> Just for the archives - I'm not sure if this ended up being filed to >>> the bug system - the repro below demonstrates the bug on all JDKs up >>> to the newest one. >>> >>> # create a single random file of 250 megabytes >>> head -c 250M < /dev/urandom > rnd.bin >>> # create a bunch of hardlinks to the same file. >>> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done >>> # create a zip archive exceeding 4gb (in stored mode so that it's faster) >>> zip -0 archive.zip rnd*.bin >>> # show the content of the archive - for all entries beyond 4gb this should >>> # show extra data blocks like below (note the order of extra data >>> blocks - this is important). >>> # >>> # The central-directory extra field contains: >>> # - A subfield with ID 0x5455 (universal time) and 5 data bytes. >>> # The local extra field has UTC/GMT modification/access times. >>> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes: >>> # 01 04 ea 03 00 00 04 ea 03 00 00. >>> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes: >>> # a4 06 a0 86 01 00 00 00. >>> # >>> zipinfo -v archive.zip >>> >>> # Bug repro code: >>> cat > Test.java <<EOF >>> import java.io >>> <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>.*; >>> import java.nio.*; >>> import java.nio.file.*; >>> import java.nio.file.attribute.*; >>> >>> public class Test { >>> public static void main(String[]args) throws IOException { >>> try (FileSystem fs = >>> FileSystems.newFileSystem(Paths.get("archive.zip"))) { >>> for (Path root : fs.getRootDirectories()) { >>> Files.walkFileTree(root, new SimpleFileVisitor<>() { >>> public FileVisitResult visitFile(Path file, >>> BasicFileAttributes attrs) throws IOException { >>> return FileVisitResult.CONTINUE; >>> } >>> }); >>> } >>> } >>> } >>> } >>> EOF >>> # and run it. >>> java Test.java >>> >>> The above ends with: >>> >>> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig >>> ->841cd111 >>> at >>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899) >>> at >>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600) >>> at >>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536) >>> at >>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532) >>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767) >>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777) >>> at >>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276) >>> at java.base/java.nio.file.Files.readAttributes(Files.java:1843) >>> at >>> java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) >>> at >>> java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) >>> at >>> java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373) >>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840) >>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876) >>> at Test.main(Test.java:10) >>> >>> The fix is relatively simple I think - the current code assumes entry >>> offset is < 4GB when parsing extra data for >>> ID 0x5455. This should be evaluated lazily after all extra data blocks >>> are parsed to ensure the file offset is correctly >>> updated based on block ID 0x0001, regardless of its ordering within >>> other extra data blocks. >>> >>> >>> >>> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.we...@gmail.com >>> <mailto:dawid.we...@gmail.com>> wrote: >>>> >>>> >>>> This looks like a legitimate bug to me (the zip file system implementation >>>> is sensitive to the order of extra parameters and may throw an unexpected >>>> error >>>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no >>>> permissions - but I think it'd be worth adding one? >>>> >>>> Dawid >>>> >>>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.we...@gmail.com >>>> <mailto:dawid.we...@gmail.com>> wrote: >>>>> >>>>> Hi Lance, >>>>> >>>>> Yes, this is exactly the point where the problem is in JDK code. It's >>>>> directly >>>>> related to the zip entry beyond max int offset. The code tries to read >>>>> data >>>>> from a local zip entry header at locoff, here: >>>>> >>>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff) >>>>> >>>>> but the locoff is set to ~0 as per the spec for files exceeding >>>>> 4 gigabytes, which says: >>>>> >>>>> 4.4.16 relative offset of local header: (4 bytes) >>>>> >>>>> This is the offset from the start of the first disk on >>>>> which this file appears, to where the local header SHOULD >>>>> be found. If an archive is in ZIP64 format and the value >>>>> in this field is 0xFFFFFFFF, the size will be in the >>>>> corresponding 8 byte zip64 extended information extra field. >>>>> >>>>> A proper fix would be to read the local header from zip64 extra data >>>>> first. I don't know how this interferes with the rest of the code though - >>>>> didn't have enough time to look at it. As it stands, zip entries >>>>> beyond 4GB cause >>>>> an unchecked exception while attribute-scanning. This simple snippet >>>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range: >>>>> >>>>> try (FileSystem fs = >>>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) { >>>>> for (Path root : fs.getRootDirectories()) { >>>>> Files.walkFileTree(root, new SimpleFileVisitor<>() { >>>>> @Override >>>>> public FileVisitResult visitFile(Path file, BasicFileAttributes >>>>> attrs) throws IOException { >>>>> return FileVisitResult.CONTINUE; >>>>> } >>>>> }); >>>>> } >>>>> } >>>>> >>>>> The walkFileTree method is key here as it attempts to harvest >>>>> attributes (why we use it is another story -- this saves a lot of time >>>>> on large, network-mounted regular directories when you're collecting >>>>> file metadata). >>>>> >>>>> >>>>> Dawid >>>>> >>>>> >>>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen >>>>> <lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>> wrote: >>>>>> >>>>>> Hi David, >>>>>> >>>>>> From a quick look at ZipFileSystem this appears to be an optimization >>>>>> to avoid looking at the LOC extended Timestamp Extra field >>>>>> >>>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then: >>>>>> >>>>>> If the "zipinfo-time" entry was set to “false” in the Map specified >>>>>> when creating the Zip FileSystem, >>>>>> >>>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false") >>>>>> >>>>>> and the data size of the CEN extended Timestamp is 5 (flag + mod time) >>>>>> >>>>>> The modified time is used from the CEN Extended Timestamp extra field >>>>>> >>>>>> Otherwise get the modified time, creation time, and access time from the >>>>>> LOC Extended Timestamp extra field. >>>>>> >>>>>> >>>>>> ————— >>>>>> >>>>>> Extended Timestamp Extra Field: >>>>>> >>>>>> ============================== >>>>>> >>>>>> The following is the layout of the extended-timestamp extra >>>>>> block. >>>>>> (Last Revision 19970118) >>>>>> >>>>>> Local-header version: >>>>>> >>>>>> Value Size Description >>>>>> ----- ---- ----------- >>>>>> (time) 0x5455 Short tag for this extra block type ("UT") >>>>>> TSize Short total data size for this block >>>>>> Flags Byte info bits >>>>>> (ModTime) Long time of last modification (UTC/GMT) >>>>>> (AcTime) Long time of last access (UTC/GMT) >>>>>> (CrTime) Long time of original creation (UTC/GMT) >>>>>> >>>>>> Central-header version: >>>>>> >>>>>> Value Size Description >>>>>> ----- ---- ----------- >>>>>> (time) 0x5455 Short tag for this extra block type ("UT") >>>>>> TSize Short total data size for this block >>>>>> Flags Byte info bits (refers to local header!) >>>>>> (ModTime) Long time of last modification (UTC/GMT) >>>>>> >>>>>> The central-header extra field contains the modification time >>>>>> only, >>>>>> or no timestamp at all. TSize is used to flag its presence or >>>>>> absence. But note: >>>>>> >>>>>> If "Flags" indicates that Modtime is present in the local >>>>>> header >>>>>> field, it MUST be present in the central header field, too! >>>>>> This correspondence is required because the modification >>>>>> time >>>>>> value may be used to support trans-timezone freshening and >>>>>> updating operations with zip archives. >>>>>> >>>>>> The time values are in standard Unix signed-long format, >>>>>> indicating >>>>>> the number of seconds since 1 January 1970 00:00:00. The times >>>>>> are relative to Coordinated Universal Time (UTC), also sometimes >>>>>> referred to as Greenwich Mean Time (GMT). To convert to local >>>>>> time, >>>>>> the software must know the local timezone offset from UTC/GMT. >>>>>> >>>>>> The lower three bits of Flags in both headers indicate which >>>>>> time- >>>>>> stamps are present in the LOCAL extra field: >>>>>> >>>>>> bit 0 if set, modification time is present >>>>>> bit 1 if set, access time is present >>>>>> bit 2 if set, creation time is present >>>>>> bits 3-7 reserved for additional timestamps; not >>>>>> set >>>>>> >>>>>> Those times that are present will appear in the order >>>>>> indicated, but >>>>>> any combination of times may be omitted. (Creation time may be >>>>>> present without access time, for example.) TSize should equal >>>>>> (1 + 4*(number of set bits in Flags)), as the block is currently >>>>>> defined. Other timestamps may be added in the future. >>>>>> >>>>>> >>>>>> -------------- >>>>>> >>>>>> It's hard to comment on why you received the error that you did but it >>>>>> is possible the tool that was used for writing the entry did something >>>>>> unexpected. >>>>>> >>>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the >>>>>> entry? >>>>>> >>>>>> >>>>>> Best, >>>>>> Lance >>>>>> >>>>>> >>>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.we...@gmail.com >>>>>> <mailto:dawid.we...@gmail.com>> wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its >>>>>> entries are intact) that causes a runtime exception when scanning its >>>>>> contents with ZipFileSystem. The exception indicates an invalid >>>>>> signature when parsing EXTID_EXTT. I don't quite understand this >>>>>> comment in the code: >>>>>> >>>>>> case EXTID_EXTT: >>>>>> // spec says the Extened timestamp in cen only has mtime >>>>>> // need to read the loc to get the extra a/ctime, if flag >>>>>> // "zipinfo-time" is not specified to false; >>>>>> // there is performance cost (move up to loc and read) to >>>>>> // access the loc table foreach entry; >>>>>> if (zipfs.noExtt) { >>>>>> if (sz == 5) >>>>>> mtime = unixToJavaTime(LG(extra, pos + 1)); >>>>>> break; >>>>>> } >>>>>> ... >>>>>> >>>>>> but this ZIP file has the extra data block of exactly 5 bytes, as >>>>>> indicated by zipinfo: >>>>>> >>>>>> Central directory entry #6: >>>>>> --------------------------- >>>>>> ... >>>>>> file system or operating system of origin: Unix >>>>>> version of encoding software: 3.0 >>>>>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT >>>>>> minimum software version required to extract: 2.0 >>>>>> compression method: deflated >>>>>> ... >>>>>> extended local header: no >>>>>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20 >>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local >>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC >>>>>> ... >>>>>> Unix file attributes (100000 octal): ---------- >>>>>> MS-DOS file attributes (01 hex): read-only >>>>>> >>>>>> The central-directory extra field contains: >>>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes. >>>>>> The local extra field has UTC/GMT modification/access times. >>>>>> >>>>>> The above conditional block checking for length == 5 would have worked >>>>>> in ZipFileSystem but it's surrounded by a condition over an >>>>>> externally-provided property - zipfs.noExtt is only set to true if: >>>>>> >>>>>> this.noExtt = "false".equals(env.get("zipinfo-time")); >>>>>> >>>>>> I can't share this particular ZIP file with you and I don't know how >>>>>> it was created but it seems like that "zipinfo-time" flag could be >>>>>> omitted if the length of the extra data field is exactly 5? >>>>>> >>>>>> Dawid >>>>>> >>>>>> >>>>>> >>>>>> Best >>>>>> Lance >>>>>> ------------------ >>>>>> >>>>>> >>>>>> >>>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 >>>>>> Oracle Java Engineering >>>>>> 1 Network Drive >>>>>> Burlington, MA 01803 >>>>>> lance.ander...@oracle.com <mailto:lance.ander...@oracle.com> >>>>>> >>>>>> >>>>>> >>>>>> >> >> >> Best >> Lance >> ------------------ >> >> <oracle_sig_logo.gif> >> >> >> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 >> Oracle Java Engineering >> 1 Network Drive >> Burlington, MA 01803 >> lance.ander...@oracle.com <mailto:lance.ander...@oracle.com> > > Best > Lance > ------------------ > > <oracle_sig_logo.gif> > > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 > Oracle Java Engineering > 1 Network Drive > Burlington, MA 01803 > lance.ander...@oracle.com <mailto:lance.ander...@oracle.com> > > > > Best Lance ------------------ Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com