TestTooManyEntries.java and clarify its purpose [v3]

Eirik Bjorsnos Sun, 12 Mar 2023 04:55:05 -0700

On Sun, 12 Mar 2023 10:00:50 GMT, Eirik Bjorsnos <[email protected]> wrote:


>> Please review this PR which speeds up TestTooManyEntries and clarifies its 
>> purpose:
>> 
>> - The name 'TestTooManyEntries' does not clearly convey the purpose of the 
>> test. What is tested is the validation that the total CEN size fits in a 
>> Java byte array. Suggested rename: CenSizeTooLarge
>> - The test creates DEFLATED entries which incurs zlib costs and File Data / 
>> Data Descriptors for no additional benefit. We can use STORED instead.
>> - By creating a single LocalDateTime and setting it with 
>> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. 
>> - The name of entries is generated by calling UUID.randomUUID, we could use 
>> simple counter instead.
>> - The produced file is unnecessarily large. We know how large a CEN entry 
>> is, let's take advantage of that to create a file with the minimal size.
>> - The summary and comments of the test can be improved to help explain the 
>> purpose of the test and how we reach the limit being tested.
>> 
>> These speedups reduced the runtime from 4 min 17 sec to 1 min 54 sec on my 
>> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 3.5 GB.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Use Integer.toString instead of Long.toString

Here's a wild idea:

We could inject  large extra fields on the CEN entries to inflate the size of 
each CEN entry. This means we get away with much fewer CEN entries which again 
means much less memory. Also, writing mostly empty extra data seems to be 
really fast, at least on my Macbook Pro. 

For me, the test now run without the `@requires' and completes in less than 5 
seconds.

The cost is slightly more complicated code. But perhaps still worth the added 
complexity?


@BeforeTest
public void setup() throws IOException {
    hugeZipFile = new File("cen-too-large.zip");
    hugeZipFile.deleteOnExit();

    // Length of generated entry names
    int nameLength = 10;

    // We use a large extra field to get away with fewer CEN headers
    byte[] extra = makeLargeExtra();

    // The size of a single CEN header, including name and extra field
    int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length;

    // Determine the number of entries needed to exceed the MAX_CEN_SIZE
    int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1;

    ZipEntry[] entries = new ZipEntry[numEntries];
    try (ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(new 
FileOutputStream(hugeZipFile)))) {
        // Creating the LocalDataTime once allows faster processing
        LocalDateTime time = LocalDateTime.now();
        // Add entries until MAX_CEN_SIZE is reached
        for (int i = 0; i < numEntries; i++) {
            String name = Integer.toString(i);
            name = "0".repeat(nameLength -name.length()) + name;
            ZipEntry entry = entries[i] = new ZipEntry(name);
            // Use STORED for faster processing
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(0);
            entry.setCrc(0);
            entry.setTimeLocal(time);
            zip.putNextEntry(entry);
        }
        zip.closeEntry();
        for (ZipEntry entry : entries) {
            entry.setExtra(extra);
        }
    }
}

/**
 * Make an extra field with an 'unknown' tag and the maximum possible data size
 * @return
 */
private static byte[] makeLargeExtra() {
    byte[] extra = new byte[Short.MAX_VALUE];
    ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
    buffer.putShort((short) 0x9902); // 'unknown' tag
    buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size
    return extra;
}

-------------

PR: https://git.openjdk.org/jdk/pull/12991

Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v3]

Reply via email to