On Sun, 12 Mar 2023 10:00:50 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> Please review this PR which speeds up TestTooManyEntries and clarifies its >> purpose: >> >> - The name 'TestTooManyEntries' does not clearly convey the purpose of the >> test. What is tested is the validation that the total CEN size fits in a >> Java byte array. Suggested rename: CenSizeTooLarge >> - The test creates DEFLATED entries which incurs zlib costs and File Data / >> Data Descriptors for no additional benefit. We can use STORED instead. >> - By creating a single LocalDateTime and setting it with >> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. >> - The name of entries is generated by calling UUID.randomUUID, we could use >> simple counter instead. >> - The produced file is unnecessarily large. We know how large a CEN entry >> is, let's take advantage of that to create a file with the minimal size. >> - The summary and comments of the test can be improved to help explain the >> purpose of the test and how we reach the limit being tested. >> >> These speedups reduced the runtime from 4 min 17 sec to 1 min 54 sec on my >> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 3.5 GB. > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Use Integer.toString instead of Long.toString Here's a wild idea: We could inject large extra fields on the CEN entries to inflate the size of each CEN entry. This means we get away with much fewer CEN entries which again means much less memory. Also, writing mostly empty extra data seems to be really fast, at least on my Macbook Pro. For me, the test now run without the `@requires' and completes in less than 5 seconds. The cost is slightly more complicated code. But perhaps still worth the added complexity? @BeforeTest public void setup() throws IOException { hugeZipFile = new File("cen-too-large.zip"); hugeZipFile.deleteOnExit(); // Length of generated entry names int nameLength = 10; // We use a large extra field to get away with fewer CEN headers byte[] extra = makeLargeExtra(); // The size of a single CEN header, including name and extra field int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length; // Determine the number of entries needed to exceed the MAX_CEN_SIZE int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1; ZipEntry[] entries = new ZipEntry[numEntries]; try (ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(hugeZipFile)))) { // Creating the LocalDataTime once allows faster processing LocalDateTime time = LocalDateTime.now(); // Add entries until MAX_CEN_SIZE is reached for (int i = 0; i < numEntries; i++) { String name = Integer.toString(i); name = "0".repeat(nameLength -name.length()) + name; ZipEntry entry = entries[i] = new ZipEntry(name); // Use STORED for faster processing entry.setMethod(ZipEntry.STORED); entry.setSize(0); entry.setCrc(0); entry.setTimeLocal(time); zip.putNextEntry(entry); } zip.closeEntry(); for (ZipEntry entry : entries) { entry.setExtra(extra); } } } /** * Make an extra field with an 'unknown' tag and the maximum possible data size * @return */ private static byte[] makeLargeExtra() { byte[] extra = new byte[Short.MAX_VALUE]; ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN); buffer.putShort((short) 0x9902); // 'unknown' tag buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size return extra; } ------------- PR: https://git.openjdk.org/jdk/pull/12991