GitHub user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/134#discussion_r72548605
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/BoundedRangeFileInputStream.java ---
    @@ -62,12 +62,15 @@ public BoundedRangeFileInputStream(FSDataInputStream in, long offset, long lengt
     
       @Override
       public int available() throws IOException {
    -    int avail = in.available();
    -    if (pos + avail > end) {
    -      avail = (int) (end - pos);
    -    }
    +    final FSDataInputStream inLocal = in;
    +    synchronized (inLocal) {
    +      int avail = inLocal.available();
    --- End diff --
    
    Sorry Marc, apparently pushing a rebased branch lost your comments.  Here they are, reproduced:
    
    Marc:
    We don't follow the paradigm prescribed by the interface. If used in a parallel environment, the block regions may all be different. The underlying stream itself will likely be nowhere near your sought position, so its availability means nothing to the instance being called. If we instead used the length provided to the constructor, we would at least have some semblance of what availability means for a bounded range input stream. Anyone making a decision based on availability could be acting on a completely incorrect value, because the seek always occurs within a read. Threading compounds this further.
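
A minimal sketch of the idea Marc describes, in which availability is derived only from the range passed to the constructor and never from the shared underlying stream. The class name `BoundedRangeSketch` and the `advance` helper are hypothetical and are not part of the Accumulo code. Note that the value returned here is an upper bound on what is left in the range, not a promise that a read will not block.

```java
/** Hypothetical illustration only; not the Accumulo BoundedRangeFileInputStream. */
final class BoundedRangeSketch {
  private final long end; // exclusive end of this instance's range
  private long pos;       // this instance's own position within the file

  BoundedRangeSketch(long offset, long length) {
    if (offset < 0 || length < 0) {
      throw new IllegalArgumentException("offset and length must be non-negative");
    }
    this.pos = offset;
    this.end = offset + length;
  }

  /** Bytes left in this instance's range, clamped to the int range. */
  synchronized int available() {
    long remaining = end - pos;
    return (int) Math.min((long) Integer.MAX_VALUE, Math.max(0L, remaining));
  }

  /** Stand-in for a read: moves the position forward, never past the end. */
  synchronized void advance(long n) {
    long step = Math.min(Math.max(0L, n), end - pos);
    pos += step;
  }
}
```

Because the result depends only on `pos` and `end`, it stays meaningful no matter where another instance has left the shared stream.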
    
    Ivan:
    Absolutely correct, as I had alluded to in the ticket.  This is why I tracked down where that method is invoked.  The only place I found was in the Hadoop CompressionInputStream, where the returned value is used in the CompressionInputStream.getPos() call, which is never actually used.  That being said, using any InputStream from multiple threads is inherently unsafe given its API.
    So I could completely rewrite this mechanism to not reuse the same underlying stream.  Alternatively, I could create a map that keeps track of threads and their positions in the stream.  Do you think that is worth the overhead for a method whose value is not actually used?
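
For reference, the map-based alternative mentioned above might look roughly like the following. This is only a sketch under assumptions: the class `PerThreadPositionSketch` and its methods are hypothetical, and it tracks positions only, without touching any real stream.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only: one logical position per calling thread. */
final class PerThreadPositionSketch {
  private final long start; // first byte of the bounded range
  private final long end;   // exclusive end of the bounded range
  private final Map<Thread, Long> positions = new ConcurrentHashMap<>();

  PerThreadPositionSketch(long offset, long length) {
    this.start = offset;
    this.end = offset + length;
  }

  /** Position of the calling thread; threads that have not read yet begin at start. */
  private long position() {
    return positions.getOrDefault(Thread.currentThread(), start);
  }

  /** Stand-in for a read by the calling thread, clamped to the end of the range. */
  void advance(long n) {
    long current = position();
    long step = Math.min(Math.max(0L, n), end - current);
    positions.put(Thread.currentThread(), current + step);
  }

  /** Bytes left in the range from the calling thread's point of view. */
  int available() {
    return (int) Math.min((long) Integer.MAX_VALUE, end - position());
  }

  /** Must be called when a thread is done, or the map keeps its entry alive. */
  void release() {
    positions.remove(Thread.currentThread());
  }
}
```

The overhead in question shows up as a map lookup on every call plus the need to remove entries when a thread finishes.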

