Github user ivakegg commented on a diff in the pull request:
https://github.com/apache/accumulo/pull/134#discussion_r72548605
--- Diff:
core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/BoundedRangeFileInputStream.java
---
@@ -62,12 +62,15 @@ public BoundedRangeFileInputStream(FSDataInputStream in, long offset, long lengt
@Override
public int available() throws IOException {
- int avail = in.available();
- if (pos + avail > end) {
- avail = (int) (end - pos);
- }
+ final FSDataInputStream inLocal = in;
+ synchronized (inLocal) {
+ int avail = inLocal.available();
--- End diff --
Sorry Marc, apparently pushing a rebased branch lost your comments. Here
they are reproduced:
Marc:
We don't follow the paradigm prescribed by the interface. If used in a
parallel environment, the block regions may all be different. The underlying
stream itself will likely be nowhere near your sought position, so its
availability means nothing to the instance being called; however, if we
used the length provided to the constructor, at least we would have some
semblance of what availability means for a bounded range input stream. Anyone
making a decision based on availability could get a completely incorrect value
because the seek always occurs within a read. Threading compounds this
further.
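A minimal sketch of the alternative Marc describes, assuming availability is
derived only from the offset and length given to the constructor. The class
name BoundedRangeSketch is hypothetical, and a plain java.io.InputStream with
reset()/skip() stands in for FSDataInputStream.seek(); this is an illustration
under those assumptions, not the code in the pull request.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not the Accumulo class: a bounded view over a shared stream.
class BoundedRangeSketch extends InputStream {
  private final InputStream in; // shared stream; stand-in for FSDataInputStream
  private final long end;       // exclusive end of this instance's range
  private long pos;             // this instance's own position

  BoundedRangeSketch(InputStream in, long offset, long length) {
    if (offset < 0 || length < 0) {
      throw new IndexOutOfBoundsException("Invalid offset/length: " + offset + "/" + length);
    }
    this.in = in;
    this.pos = offset;
    this.end = offset + length;
  }

  @Override
  public int available() {
    // Uses only this instance's bounds; the shared stream's position is ignored.
    return (int) Math.min(Integer.MAX_VALUE, end - pos);
  }

  @Override
  public int read() throws IOException {
    if (pos >= end) {
      return -1;
    }
    synchronized (in) {
      // Assumption: reset() followed by skip() emulates seek(pos). That holds for
      // an unmarked ByteArrayInputStream; a real FSDataInputStream would call seek.
      in.reset();
      if (in.skip(pos) != pos) {
        return -1;
      }
      int b = in.read();
      if (b >= 0) {
        pos++;
      }
      return b;
    }
  }
}
```

With two instances over the same underlying stream, a read through one leaves
the other's available() unchanged, which is the property the comment asks for.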
Ivan:
Absolutely correct, as I had alluded to in the ticket. This is why I
tracked where that method is being invoked. The only place I found was in the
Hadoop CompressionInputStream, where the returned value is used in the
CompressionInputStream.getPos() call, which is never actually used. That being
said, using any InputStream in multiple threads is inherently unsafe given its
API.
So I could completely rewrite this mechanism to not reuse the same
underlying stream. Alternatively I could create a map that keeps track of
threads and the position in the stream. Do you think that is worth the
overhead for a method whose value is not actually used?
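For reference, the second option (a map from thread to position) might look
roughly like the sketch below. The class PerThreadPositionSketch and its
methods are hypothetical names for illustration only; nothing like it exists
in the pull request, and entries are never removed here, which is part of the
overhead in question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of Accumulo: tracks a read position per thread.
class PerThreadPositionSketch {
  private final long start; // first position of the bounded range
  private final long end;   // exclusive end of the bounded range
  // Note: entries are never cleaned up in this sketch, so finished threads leak.
  private final ConcurrentMap<Thread, Long> positions = new ConcurrentHashMap<>();

  PerThreadPositionSketch(long offset, long length) {
    this.start = offset;
    this.end = offset + length;
  }

  // Position of the calling thread; a thread that has not read yet starts at the offset.
  long position() {
    return positions.getOrDefault(Thread.currentThread(), start);
  }

  // Record that the calling thread consumed n bytes, clamped to the range end.
  void advance(long n) {
    long next = Math.min(end, position() + n);
    positions.put(Thread.currentThread(), next);
  }

  // Bytes remaining in the range from the calling thread's point of view.
  int available() {
    return (int) Math.min(Integer.MAX_VALUE, end - position());
  }
}
```

Each thread only ever writes its own key, so no extra locking is needed for
the map itself; the underlying stream would still need synchronization around
each seek-and-read.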