On Sat, 30 Mar 2024 03:21:39 GMT, Jiangli Zhou <[email protected]> wrote:
>> The other loop uses `readAt` to read in additional data and advance through
>> the extra fields, this loop already has access to a buffer that contains all
>> of the data for the extra fields and doesn't need to do that.
>>
>> I considered having the other loop read in all of the data for the extra
>> fields so there could be a shared helper, but the data is variable length,
>> so that would have required having a large fixed-size buffer or allocating
>> memory, which this code seems to be trying to avoid.
>>
>> Do you see a good way to share the loop?
>
> How about something like the following (I haven't tried to compile or test,
> please double check if everything is correct)? The size of the extra fields
> is recorded in the 2-byte field, `extra field length`, so we can just read
> all extra fields in a temp buffer. It causes less overhead with just one
> read.
>
>
> jboolean read_zip64_ext(int fd, jlong censtart, jlong base_offset, Byte
> *cenhdr,
> jboolean check_offset_only,
> jboolean read_extra_fields) {
> jlong cenlen = CENLEN(cenhdr);
> jlong censiz = CENSIZ(cenhdr);
> jlong cenoff = CENOFF(cenhdr);
> if (cenoff == ZIP64_MAGICVAL ||
> (!check_offset_only && (censiz == ZIP64_MAGICVAL || cenlen ==
> ZIP64_MAGICVAL))) {
> jlong cenext = CENEXT(cenhdr);
> if (cenext == 0) {
> return JNI_FALSE;
> }
>
> jlong start = censtart + CENHDR + CENNAM(cenhdr);
> jlong offset = 0;
> Byte *p = start;
>
> // Read the extra fields if needed.
> if (read_extra_fields) {
> *p = (Byte*)malloc(cenext);
> if (p == NULL) {
> return JNI_FALSE;
> }
> if (!readAt(fd, start + offset, cenext, p)) {
> free(p);
> return JNI_FALSE;
> }
> }
>
> while (offset < cenext) {
> short headerId = ZIPEXT_HDR(p + offset);
> short headerSize = ZIPEXT_SIZ(p + offset);
> if (headerId == ZIP64_EXTID) {
> if (!read_zip64_ext(p, &cenlen, &censiz, &cenoff, CENDSK(cenhdr))) {
> if (p != start) {
> free(p);
> }
> return JNI_FALSE;
> }
> break;
> }
> offset += 4 + headerSize;
> }
> }
>
> if (p != start) {
> free(p);
> }
> return JNI_TRUE;
> }
I think that's similar idea to one of the alternatives I mentioned earlier,
won't that allocate for every central directory entry? This callsite has
already read the data we need into a buffer, if we end up doing something like
that it might be better to do the allocation only for the call site in
`validate_lochdr`, and have the shared helper take the pointer to the buffer
instead of having a duplicate read.
It seems like there are some tradeoffs here, I can definitely restructure this,
but I'd also like to get some feedback from a core-libs owner on the overall
approach before iterating too much more.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18479#discussion_r1545070043