[ https://issues.apache.org/jira/browse/COMPRESS-540?focusedWorklogId=460909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-460909 ]
ASF GitHub Bot logged work on COMPRESS-540:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 20/Jul/20 05:39
Start Date: 20/Jul/20 05:39
Worklog Time Spent: 10m
Work Description: theobisproject commented on a change in pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#discussion_r457052857
##########
File path: src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveInputStream.java
##########
@@ -1106,35 +917,8 @@ public int compare(final TarArchiveStructSparse p, final TarArchiveStructSparse
}
}
- if (sparseInputStreams.size() > 0) {
+ if (!sparseInputStreams.isEmpty()) {
currentSparseInputStreamIndex = 0;
}
}
-
- /**
- * This is an inputstream that always return 0,
- * this is used when reading the "holes" of a sparse file
- */
- private static class TarArchiveSparseZeroInputStream extends InputStream {
Review comment:
I moved this class out of `TarArchiveInputStream` because I also
needed it in `TarFile`. Otherwise I would have had to introduce more duplicated
code. I made the class package-private to prevent accidental usage.
In my opinion, before the changes made here it was reasonable to keep this as an
inner class, since it was only used here.
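
For readers unfamiliar with the class being moved, here is a minimal sketch of a stream that yields only zero bytes, as used for the "holes" of a sparse file. It is an illustration and not the code from the PR: the name `ZeroInputStream` is hypothetical, and the sketch assumes the caller wraps the stream in some bounded stream, because it never signals end of stream by itself.

```java
import java.io.InputStream;

/**
 * Sketch of a stream that yields only zero bytes, standing in for the
 * "holes" of a sparse file. The class name is hypothetical.
 */
class ZeroInputStream extends InputStream {

    @Override
    public int read() {
        // A hole contains nothing but zero bytes and never ends on its own.
        return 0;
    }

    @Override
    public int read(final byte[] buf, final int off, final int len) {
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        // Fill the requested range directly instead of going byte by byte.
        for (int i = off; i < off + len; i++) {
            buf[i] = 0;
        }
        return len;
    }

    @Override
    public long skip(final long n) {
        // Skipping zeros costs nothing, so report everything as skipped.
        return Math.max(n, 0);
    }
}
```

Keeping such a class package-private, as described above, means only the tar classes can rely on its unbounded behaviour.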
##########
File path: src/main/java/org/apache/commons/compress/utils/BoundedNIOInputStream.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+package org.apache.commons.compress.utils;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * NIO backed bounded input stream for reading a predefined amount of data from.
+ * @since 1.21
+ */
+public abstract class BoundedNIOInputStream extends InputStream {
Review comment:
I agree that the name of the class is not perfect. I just want to share
the reason why I chose the name.
The newly defined read method reads the content into a
`java.nio.ByteBuffer`, and the `read` methods of `InputStream` call
it. So the name comes from the fact that a subclass only needs to
implement the new `read` method that fills the `ByteBuffer`, not from the actual
underlying implementation. In my experience, reading into a `ByteBuffer`
from a non-NIO source is not much fun, and therefore I put `NIO` in the
class name.
If someone has a better suggestion for a class name, it would be welcome.
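
To make the design described above concrete, here is a rough sketch of a bounded stream whose `InputStream` methods all delegate to a single abstract `ByteBuffer`-based read. It is not the code under review: the class names `BoundedBufferInputStream` and `ByteArrayBoundedStream`, the constructor parameters, and the in-memory data source are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/**
 * Sketch of a bounded stream whose subclasses implement only one
 * ByteBuffer-based read. All names here are hypothetical.
 */
abstract class BoundedBufferInputStream extends InputStream {
    private long pos;        // absolute position of the next byte to read
    private long remaining;  // bytes left before the bound is reached

    BoundedBufferInputStream(final long start, final long length) {
        this.pos = start;
        this.remaining = length;
    }

    /** Reads at least one byte into buf at the given position, or returns -1 at end of source. */
    protected abstract int read(long position, ByteBuffer buf) throws IOException;

    @Override
    public int read() throws IOException {
        final byte[] one = new byte[1];
        return read(one, 0, 1) <= 0 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(final byte[] b, final int off, final int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1; // the bound has been reached
        }
        // Never hand out more than the bound allows.
        final int toRead = (int) Math.min(len, remaining);
        final int n = read(pos, ByteBuffer.wrap(b, off, toRead));
        if (n > 0) {
            pos += n;
            remaining -= n;
        }
        return n;
    }
}

/** Hypothetical subclass backed by a byte array instead of a channel. */
class ByteArrayBoundedStream extends BoundedBufferInputStream {
    private final byte[] data;

    ByteArrayBoundedStream(final byte[] data, final long start, final long length) {
        super(start, length);
        this.data = data;
    }

    @Override
    protected int read(final long position, final ByteBuffer buf) {
        if (position >= data.length) {
            return -1;
        }
        final int n = (int) Math.min(buf.remaining(), data.length - position);
        buf.put(data, (int) position, n);
        return n;
    }
}
```

The byte-array subclass also shows why the `NIO` part of the name is debatable: the subclass contract is about the `ByteBuffer` parameter, not about where the bytes come from.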
##########
File path: src/main/java/org/apache/commons/compress/archivers/tar/TarFile.java
##########
@@ -0,0 +1,712 @@
+package org.apache.commons.compress.archivers.tar;
+
+import java.io.ByteArrayOutputStream;
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.channels.SeekableByteChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.compress.archivers.zip.ZipEncoding;
+import org.apache.commons.compress.archivers.zip.ZipEncodingHelper;
+import org.apache.commons.compress.utils.ArchiveUtils;
+import org.apache.commons.compress.utils.BoundedInputStream;
+import org.apache.commons.compress.utils.BoundedNIOInputStream;
+import org.apache.commons.compress.utils.BoundedSeekableByteChannelInputStream;
+import org.apache.commons.compress.utils.SeekableInMemoryByteChannel;
+
+/**
+ * The TarFile provides random access to UNIX to archives.
+ * @since 1.21
+ */
+public class TarFile implements Closeable {
Review comment:
The duplication is intentional on my side in the current state
of the PR. I didn't want to limit myself too early in making the changes
needed to get this working. I also wanted to limit changes to existing code to
avoid breaking it. Sharing some of the code can be quite a challenge because
the data sources are so different.
`buildSparseInputStream`: The difference here is that the sparse streams are
stored differently in the TarFile than in the stream. In the TarFile the streams
need to be stored on a per-entry basis to allow random access. I think this
could also be done in the stream, but for the first version I didn't want to
change too much existing code.
`getLongNameData`: The only difference is the source from which the data is
read (InputStream vs. SeekableByteChannel). A problem could be the call to
`getNext(Tar)Entry`, which has a lot of side effects.
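
As an illustration of what "per-entry" storage of sparse data could look like, the following sketch keeps a list of segments (stored data or a hole of zeros) for each entry name and builds a fresh stream on every request. It is only a sketch under assumptions: the class `SparseEntryIndex`, its methods, and the use of a byte array as the archive are hypothetical and do not reflect how `TarFile` in the PR stores its streams.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of per-entry bookkeeping for sparse data. All names are hypothetical. */
class SparseEntryIndex {

    /** One piece of a sparse entry: stored data or a hole of zeros. */
    static final class Segment {
        final boolean hole;
        final int offset;
        final int length;

        Segment(final boolean hole, final int offset, final int length) {
            this.hole = hole;
            this.offset = offset;
            this.length = length;
        }
    }

    private final byte[] archive; // stands in for the seekable data source
    private final Map<String, List<Segment>> segmentsByEntry = new HashMap<>();

    SparseEntryIndex(final byte[] archive) {
        this.archive = archive;
    }

    void addData(final String entry, final int offset, final int length) {
        segments(entry).add(new Segment(false, offset, length));
    }

    void addHole(final String entry, final int length) {
        segments(entry).add(new Segment(true, 0, length));
    }

    private List<Segment> segments(final String entry) {
        return segmentsByEntry.computeIfAbsent(entry, k -> new ArrayList<>());
    }

    /** Builds a new stream per call, so the same entry can be read repeatedly. */
    InputStream open(final String entry) {
        final List<InputStream> parts = new ArrayList<>();
        for (final Segment s : segmentsByEntry.getOrDefault(entry, Collections.<Segment>emptyList())) {
            // A zero-filled array is enough for a sketch; a real hole would not be allocated.
            parts.add(s.hole
                ? new ByteArrayInputStream(new byte[s.length])
                : new ByteArrayInputStream(archive, s.offset, s.length));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }
}
```

Because the bookkeeping is keyed by entry, reading one entry does not disturb the state of another, which a single shared list of streams could not guarantee.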
##########
File path: src/main/java/org/apache/commons/compress/utils/BoundedSeekableByteChannelInputStream.java
##########
@@ -0,0 +1,56 @@
+package org.apache.commons.compress.utils;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.SeekableByteChannel;
+
+/**
+ * InputStream that delegates requests to the underlying SeekableByteChannel, making sure that only bytes from a certain
+ * range can be read.
+ * @since 1.21
+ */
+public class BoundedSeekableByteChannelInputStream extends BoundedNIOInputStream {
Review comment:
You are correct about the origin of the code. I moved it out of the
`ZipFile` class because it is needed for reading sparse tar content (see
`TarFile` line 377).
Issue Time Tracking
-------------------
Worklog Id: (was: 460909)
Time Spent: 1h 20m (was: 1h 10m)
> Random access on Tar archive
> ----------------------------
>
> Key: COMPRESS-540
> URL: https://issues.apache.org/jira/browse/COMPRESS-540
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Robin Schimpf
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The TarArchiveInputStream only provides sequential access. If only a small
> number of files from the archive is needed, a large amount of data in the input
> stream needs to be skipped.
> Therefore I was working on an implementation to provide random access to
> TarFiles, similar to the ZipFile API. The basic idea behind the implementation
> is the following:
> * Random access is backed by a SeekableByteChannel
> * Read all headers of the tar file and save the position of the data of every
> header
> * The user can request an input stream for any entry in the archive multiple
> times
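
The three steps listed in the issue description can be sketched roughly as follows. This is a simplified illustration, not the implementation from the PR: the class `SimpleTarIndex` and its methods are hypothetical, and the sketch assumes plain ustar-style headers (name in the first 100 bytes, octal size at offset 124, 512-byte blocks). It ignores checksums, long names, PAX headers, sparse entries and binary-encoded sizes, and it buffers each entry in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a random-access tar index. All names are hypothetical. */
class SimpleTarIndex {
    private static final int BLOCK = 512;

    private final SeekableByteChannel channel;
    // entry name -> {offset of the entry data, size of the entry data}
    private final Map<String, long[]> entries = new LinkedHashMap<>();

    /** Reads all headers once and remembers where each entry's data starts. */
    SimpleTarIndex(final SeekableByteChannel channel) throws IOException {
        this.channel = channel;
        final ByteBuffer header = ByteBuffer.allocate(BLOCK);
        long pos = 0;
        // A block starting with a zero byte is treated as the end-of-archive marker.
        while (readFully(pos, header) && header.get(0) != 0) {
            final byte[] h = header.array();
            final String name = field(h, 0, 100);
            final long size = Long.parseLong(field(h, 124, 12).trim(), 8);
            entries.put(name, new long[] {pos + BLOCK, size});
            // Entry data is padded to a multiple of the block size.
            pos += BLOCK + (size + BLOCK - 1) / BLOCK * BLOCK;
        }
    }

    /** Fills the buffer from the given position; returns false if the channel ends first. */
    private boolean readFully(final long pos, final ByteBuffer buf) throws IOException {
        buf.clear();
        channel.position(pos);
        while (buf.hasRemaining()) {
            if (channel.read(buf) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes a NUL-terminated ASCII header field. */
    private static String field(final byte[] h, final int off, final int len) {
        int end = off;
        while (end < off + len && h[end] != 0) {
            end++;
        }
        return new String(h, off, end - off, StandardCharsets.US_ASCII);
    }

    /** Returns a fresh stream for the entry, or null if it is unknown. Can be called repeatedly. */
    InputStream getInputStream(final String name) throws IOException {
        final long[] e = entries.get(name);
        if (e == null) {
            return null;
        }
        final ByteBuffer data = ByteBuffer.allocate((int) e[1]);
        if (!readFully(e[0], data)) {
            throw new IOException("Truncated entry " + name);
        }
        return new ByteArrayInputStream(data.array());
    }
}
```

A caller could obtain the channel with `Files.newByteChannel(path)` and then request entries in any order, which is the access pattern that sequential reading cannot offer without skipping over unwanted data.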