[
https://issues.apache.org/jira/browse/COMPRESS-540?focusedWorklogId=514468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514468
]
ASF GitHub Bot logged work on COMPRESS-540:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 20/Nov/20 06:21
Start Date: 20/Nov/20 06:21
Worklog Time Spent: 10m
Work Description: PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-730879784
I ran your benchmark on my computer and got this:
```
Benchmark                                         Mode  Cnt   Score    Error  Units
ReadLargeTarBenchmark.readAllEntries_tarFile      avgt   10   5.601 ±  0.061   s/op
ReadLargeTarBenchmark.readAllEntries_tarStream    avgt   10   6.728 ±  0.012   s/op
ReadLargeTarBenchmark.readFirstEntry_tarFile      avgt   10   2.279 ±  0.017   s/op
ReadLargeTarBenchmark.readFirstEntry_tarStream    avgt   10   0.001 ±  0.001   s/op
ReadLargeTarBenchmark.readLastEntry_tarFile       avgt   10   2.266 ±  0.020   s/op
ReadLargeTarBenchmark.readLastEntry_tarStream     avgt   10  13.068 ±  0.030   s/op
ReadLargeTarBenchmark.readSecondEntry_tarFile     avgt   10   2.257 ±  0.012   s/op
ReadLargeTarBenchmark.readSecondEntry_tarStream   avgt   10   0.001 ±  0.001   s/op
```
The score is the average time per operation, because `BenchmarkMode` is set
to `Mode.AverageTime`.
I'm testing with linux-4.4.tar, which is 618MB and contains 55708 entries.
My machine: Windows 10, SSD, Java 8.
The result is interesting. It seems `tarFile` needs an extra 2.65s to read
all entries. `readAllEntries_tarStream` is slower than
`readAllEntries_tarFile`, and `readLastEntry_tarStream` is slower than
either. I will try to figure out what's going on here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 514468)
Time Spent: 4h 40m (was: 4.5h)
> Random access on Tar archive
> ----------------------------
>
> Key: COMPRESS-540
> URL: https://issues.apache.org/jira/browse/COMPRESS-540
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Robin Schimpf
> Priority: Major
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> The TarArchiveInputStream only provides sequential access. If only a small
> number of files from the archive is needed, a large amount of data in the
> input stream needs to be skipped.
> Therefore I was working on an implementation to provide random access to
> TarFiles, similar to the ZipFile API. The basic idea behind the
> implementation is the following:
> * Random access is backed by a SeekableByteChannel
> * Read all headers of the tar file and save the position of the data for
> every header
> * User can request an input stream for any entry in the archive multiple
> times
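To illustrate the three points quoted above, here is a minimal sketch in plain Java. It is not the code from the pull request and does not use the Commons Compress API. The class `TarIndexSketch` and its methods `index`, `read`, `field` and `tarEntry` are hypothetical names made up for this example. It assumes simple ustar-style headers (name at offset 0, octal size at offset 124) and ignores checksums, long names, PAX headers and sparse files. Entry data is returned as a byte array rather than an input stream to keep the sketch short.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class TarIndexSketch {
    static final int BLOCK = 512; // tar headers and data are aligned to 512-byte blocks

    /** Where one entry's data starts in the archive, and how long it is. */
    static final class Entry {
        final long dataOffset;
        final long size;
        Entry(long dataOffset, long size) { this.dataOffset = dataOffset; this.size = size; }
    }

    /** Reads a NUL-terminated ASCII field from a header block. */
    static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.US_ASCII);
    }

    /** Walks all headers once and records the data position of every entry. */
    static Map<String, Entry> index(SeekableByteChannel ch) throws IOException {
        Map<String, Entry> entries = new LinkedHashMap<>();
        ByteBuffer header = ByteBuffer.allocate(BLOCK);
        long pos = 0;
        while (pos + BLOCK <= ch.size()) {
            header.clear();
            ch.position(pos);
            while (header.hasRemaining() && ch.read(header) >= 0) { /* fill the block */ }
            byte[] h = header.array();
            if (h[0] == 0) break; // an empty name is treated as the end-of-archive block
            long size = Long.parseLong(field(h, 124, 12).trim(), 8); // size is stored in octal
            long dataOffset = pos + BLOCK;
            entries.put(field(h, 0, 100), new Entry(dataOffset, size));
            pos = dataOffset + (size + BLOCK - 1) / BLOCK * BLOCK; // skip data plus padding
        }
        return entries;
    }

    /** Seeks straight to an entry's data; can be called repeatedly and in any order. */
    static byte[] read(SeekableByteChannel ch, Entry e) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) e.size);
        ch.position(e.dataOffset);
        while (buf.hasRemaining() && ch.read(buf) >= 0) { /* fill the buffer */ }
        return buf.array();
    }

    /** Builds a bare-bones header plus padded data (no checksum; demo input only). */
    static byte[] tarEntry(String name, byte[] data) {
        int padded = (data.length + BLOCK - 1) / BLOCK * BLOCK;
        byte[] out = new byte[BLOCK + padded];
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(n, 0, out, 0, n.length);
        byte[] s = String.format("%011o", data.length).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(s, 0, out, 124, s.length);
        System.arraycopy(data, 0, out, BLOCK, data.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tar = Files.createTempFile("sketch", ".tar");
        try {
            Files.write(tar, tarEntry("a.txt", "first".getBytes(StandardCharsets.US_ASCII)));
            Files.write(tar, tarEntry("b.txt", "second".getBytes(StandardCharsets.US_ASCII)),
                    StandardOpenOption.APPEND);
            Files.write(tar, new byte[2 * BLOCK], StandardOpenOption.APPEND); // end marker
            try (SeekableByteChannel ch = Files.newByteChannel(tar, StandardOpenOption.READ)) {
                Map<String, Entry> idx = index(ch);
                // Read the last entry first, then the first one: no sequential skipping.
                System.out.println(new String(read(ch, idx.get("b.txt")), StandardCharsets.US_ASCII));
                System.out.println(new String(read(ch, idx.get("a.txt")), StandardCharsets.US_ASCII));
            }
        } finally {
            Files.deleteIfExists(tar);
        }
    }
}
```

In this sketch the whole cost of walking the headers is paid once in `index`, after which any entry can be fetched with a single seek.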
--
This message was sent by Atlassian Jira
(v8.3.4#803005)