[
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151659#comment-17151659
]
Robin Schimpf commented on COMPRESS-539:
----------------------------------------
I've rerun my test against the current master branch. This is the baseline on
master before the patch:
!image-2020-07-05-22-32-15-131.png!
After the patch, the estimated allocation is down to 127MiB, which is about the
same as with my original patches:
!image-2020-07-05-22-32-31-511.png!
[~bodewig] I also tested SkipShieldingInputStream, which has the same positive
effect on memory allocation. Since this was all an artificial test to reduce
memory allocation for the "default" usage of the API, I am not including my
measurements, but it is good to know that this class exists.
Sadly, the linked issue has no reference to benchmarks or cases where calling
the skip method performs better. I noticed no difference in my tests (remember
that those were artificial ones). But I can think of some special skip
implementations of streams that have to do much less work than the read
method.
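As a rough illustration of such a cheap skip (my own sketch, not taken from the
linked issue): java.io.ByteArrayInputStream only advances its internal position
when skip is called. CountingByteArrayStream below is a hypothetical helper that
counts the bytes handed out by read, to show that skip does not go through read
at all.

```java
import java.io.ByteArrayInputStream;

/** Hypothetical helper: counts how many bytes are delivered through read. */
class CountingByteArrayStream extends ByteArrayInputStream {
    long bytesRead; // total bytes returned by the read methods

    CountingByteArrayStream(byte[] data) {
        super(data);
    }

    @Override
    public synchronized int read() {
        int b = super.read();
        if (b >= 0) {
            bytesRead++;
        }
        return b;
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
        int n = super.read(b, off, len);
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }
}
```

Skipping most of a large array with this stream leaves bytesRead at zero, so for
a stream like this skip is clearly cheaper than reading and discarding.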
Maybe the only solution for streams that do not override the skip method would
be to change the implementation in the InputStream class to use a fixed buffer
instead of always allocating a new one. But there is a chance that the current
behavior is intended by the Java library developers.
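A minimal sketch of the fixed-buffer idea, written as a wrapper instead of a
change to InputStream itself. The class name ReusableBufferSkipStream and the
8192 byte buffer size are assumptions made up for illustration; this is not the
actual JDK or Commons Compress code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: skips by reading into one reused scratch buffer. */
class ReusableBufferSkipStream extends FilterInputStream {
    // Allocated once per stream instead of once per skip call.
    private final byte[] scratch = new byte[8192];

    ReusableBufferSkipStream(InputStream in) {
        super(in);
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        long remaining = n;
        while (remaining > 0) {
            int len = (int) Math.min(scratch.length, remaining);
            int read = in.read(scratch, 0, len);
            if (read < 0) {
                break; // end of stream reached before n bytes were skipped
            }
            remaining -= read;
        }
        return n - remaining;
    }
}
```

The trade-off is that the wrapper never calls the underlying skip, so a stream
with a cheap skip implementation would lose that advantage.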
To sum up: in my opinion, changes to the amount of memory IOUtils.skip allocates
would be appreciated but are not really necessary. I just wanted to raise
attention to this and see whether you had any ideas on how it could be improved
(as I did not really have one).
> TarArchiveInputStream allocates a lot of memory when iterating through an
> archive
> ---------------------------------------------------------------------------------
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
> Issue Type: Bug
> Affects Versions: 1.20
> Reporter: Robin Schimpf
> Assignee: Peter Lee
> Priority: Major
> Attachments: Don't_call_InputStream#skip.patch,
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png,
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png,
> image-2020-07-05-22-10-07-402.png, image-2020-07-05-22-11-25-526.png,
> image-2020-07-05-22-32-15-131.png, image-2020-07-05-22-32-31-511.png
>
>
> I iterated through the linux source tar and noticed some unneeded
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in =
>         new TarArchiveInputStream(Files.newInputStream(tarFile.toPath()))) {
>     TarArchiveEntry entry;
>     while ((entry = in.getNextTarEntry()) != null) {
>         // iterate over the entries only; no data is extracted
>     }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder.
> Options used:
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
> Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
> This delegates in my test scenario to the InputStream.skip implementation
> which allocates a new byte[] for every invocation. By simply commenting out
> the while loop which calls the skip method the estimated TLAB allocation
> drops to 164MiB (-129MiB).
> !image-2020-06-21-10-58-43-255.png!
> Commenting out the skip call does not seem to be the best solution, but it
> was a quick way for me to see how much memory can be saved. Also, no unit
> tests were failing for me.
> 2. TarArchiveInputStream.readRecord
> For every read of the record a new byte[] is created. Since the record size
> does not change the byte[] can be reused and created when instantiating the
> TarStream. This optimization is already present in the
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB
> allocations further to 128MiB (-36MiB).
> !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.