Re: RFR 8080640: Reduce copying when reading JAR/ZIP entries

Staffan Friberg Fri, 05 Jun 2015 11:12:01 -0700

Hi Sherman,

I have a new webrev which reverts that part, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/


Summary of changes
    Reduce lock region in ZipFile.getInputStream
    Add private ZipFile.getBytes that can be used in select places in the JDK where all bytes will be read


Could you sponsor this change once it has been reviewed?

Thanks,
Staffan

On 06/03/2015 10:45 AM, Xueming Shen wrote:

Staffan,
I'm not convinced that the benefit here is significant enough to change getInputStream() to return a ByteArrayInputStream, given this can be easily achieved by wrapping the returned byte[] from getBytes(ZipEntry) at the user's site. I would suggest filing a separate RFE on this disagreement and moving on with the agreed getBytes() for now.

Thanks,
-Sherman

On 06/02/2015 10:27 AM, Staffan Friberg wrote:
On 05/22/2015 01:15 PM, Staffan Friberg wrote:
On 05/22/2015 11:51 AM, Xueming Shen wrote:
On 05/22/2015 11:41 AM, Staffan Friberg wrote:
On 05/21/2015 11:00 AM, Staffan Friberg wrote:
On 05/21/2015 09:48 AM, Staffan Friberg wrote:
On 05/20/2015 10:57 AM, Xueming Shen wrote:
On 05/18/2015 06:44 PM, Staffan Friberg wrote:
Hi,
Wanted to get reviews and feedback on this performance improvement for reading from JAR/ZIP files during classloading by reducing unnecessary copying and reading the entry in one go instead of in small portions. This shows a significant improvement when reading a single entry, and for a large application with 10k classes and 500+ JAR files it improved the startup time by 4%.
For more details on the background and performance results please see the RFE entry.
RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
Cheers,
Staffan
Hi Staffan,
If I did not miss something here, from your use scenario it appears to me the only thing you really need here to help boost your performance is

    byte[] ZipFile.getAllBytes(ZipEntry ze);
You are allocating a byte[] at the use side and wrapping it with a ByteBuffer if the size is small enough; otherwise, you are letting the ZipFile allocate a big enough one for you. It does not look like you can re-use that byte[] (it has to be wrapped by the ByteArrayInputStream and returned), so why do you need two different methods here? The logic would be much easier if you simply let the ZipFile allocate the needed buffer with the appropriate size, fill in the bytes and return, with an "OOME" if the entry size is bigger than 2g.
The only thing we use from the input ze is its name; we get the size/csize from the jzentry. I don't think jzentry.csize/size can be "unknown", they are from the "cen" table.
If the real/final use of the bytes is to wrap them with a ByteArrayInputStream, why bother using ByteBuffer here? Shouldn't a direct byte[] with exactly the size of the entry serve better?
-Sherman
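For readers following along, the "allocate exactly, fill, return" idea can be sketched at the user side with the public java.util.zip API. This is only an illustration, not the code in the webrev: the class ZipEntryBytes and its readAllBytes method are hypothetical names, and the real proposal is a private method inside ZipFile that works on the native jzentry.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical user-side helper; not part of the JDK or of the webrev.
final class ZipEntryBytes {
    private ZipEntryBytes() {}

    /** Reads a whole entry into a byte[] sized exactly to the entry. */
    static byte[] readAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        // Only the name of the caller's entry is used; the size comes from
        // the entry as recorded in the ZIP file's central directory.
        ZipEntry entry = zf.getEntry(ze.getName());
        if (entry == null) {
            throw new IOException("no such entry: " + ze.getName());
        }
        long size = entry.getSize();
        if (size < 0) {
            throw new IOException("unknown size for entry: " + ze.getName());
        }
        if (size > Integer.MAX_VALUE) {
            // A Java array cannot hold more than about 2g elements.
            throw new OutOfMemoryError("entry too large: " + ze.getName());
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(entry)) {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("entry shorter than declared size: " + ze.getName());
                }
                off += n;
            }
        }
        return buf;
    }
}
```

A caller that needs a stream can wrap the result itself with new ByteArrayInputStream(bytes), without any intermediate ByteBuffer.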
Hi Sherman,
Thanks for the comments. I agree, I was starting out with ByteBuffer because I was hoping to be able to cache things where the buffer was being used, but since the buffer is passed along further I couldn't figure out a clean way to do it. Will rewrite it to simply return a buffer, and only wrap it in the Resource class getByteBuffer.
What would be your thoughts on updating ZipFile.getInputStream to return a ByteArrayInputStream for small entries? Currently I do that work outside in two places, and moving it would potentially speed up others reading small entries as well.
Thanks,
Staffan
Just realized that my use of ByteArrayInputStream would miss JAR verification if enabled, so the way to go here would be to add it, if possible, to ZipFile.getInputStream.
//Staffan
Hi,
Here is an updated webrev which uses a byte[] directly and also uses ByteArrayInputStream in ZipFile for small entries below 128k.
I'm not sure about the benefit of doing the ByteArrayInputStream in ZipFile.getInputStream. It has the consequence of changing the "expected" behavior of getInputStream() (instead of returning an input stream waiting for reading, it now reads all bytes in advance), something we might not want to do in a performance tuning. Though it might be reasonable to guess that everyone who gets an input stream is going to read all bytes from it later.

-Sherman
http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1

//Staffan
Agree that it will change the behavior slightly, but as you said it is probably expected that someone will read the stream eventually. We could reduce the size further if that makes a difference; if the size is below 65k we would not use more memory than the buffer allocated for the InflaterStream today. The total allocation would be slightly larger for deflated entries as we would allocate a byte[] for the compressed bytes, but it would be GC:able and not kept alive. So from a memory perspective the difference is very limited.
//Staffan
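To make the behavior under discussion concrete, here is a rough user-side sketch of "eager for small entries, lazy otherwise". It is not the webrev code, which changes ZipFile.getInputStream itself; the class SmallEntryStreams, the method open and the constant SMALL_ENTRY_LIMIT are hypothetical, and the 128k cut-off is simply the figure mentioned above. The entry is assumed to come from zf.getEntry(...), so that it exists and its size is known.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical wrapper; illustrates the trade-off only.
final class SmallEntryStreams {
    /** Assumed cut-off, taken from the 128k figure in this thread. */
    static final int SMALL_ENTRY_LIMIT = 128 * 1024;

    private SmallEntryStreams() {}

    /**
     * Returns a stream for the entry. Small entries are read completely
     * before returning (eager); larger or unknown-size entries get the
     * regular, lazily read ZipFile stream.
     */
    static InputStream open(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size < 0 || size > SMALL_ENTRY_LIMIT) {
            return zf.getInputStream(ze); // lazy: bytes are read on demand
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("entry shorter than declared size: " + ze.getName());
                }
                off += n;
            }
        }
        // The underlying zip stream is already closed here; only the
        // heap array stays alive for as long as the caller holds the stream.
        return new ByteArrayInputStream(buf);
    }
}
```

The visible difference for callers is the one raised above: with the eager path any I/O error surfaces when the stream is opened rather than on the first read.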
Hi,
Bumping this thread to get some more comments on the concern about changing the ZipFile.getInputStream behavior. The benefit of doing this change is that any read of small entries from ZIP and JAR files will be much faster and fewer resources will be held, including native resources normally held by the ZipInputStream.
The behavior change that will occur is that the full entry will be read as part of creating the stream and not lazily as might be expected. However, when getting an InputStream today the zip file will be accessed to read information about the size of the entry, so the zip file is already touched when getting an InputStream, but not the compressed bytes.
I'm fine with removing this part of the change and just pushing the private getBytes function and the updates to the JDK libraries to use it.
Thanks,
Staffan
