Enis Soztutar created HBASE-16766:
-------------------------------------

             Summary: Do not rely on InputStream.available() 
                 Key: HBASE-16766
                 URL: https://issues.apache.org/jira/browse/HBASE-16766
             Project: HBase
          Issue Type: Bug
          Components: wal
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar
             Fix For: 2.0.0, 1.4.0


ProtobufLogReader relies on InputStream.available() to figure out whether we 
have exhausted the file. However, the InputStream.available() javadoc states: 
{code}
     * <p> Note that while some implementations of {@code InputStream} will return
     * the total number of bytes in the stream, many will not.  It is
     * never correct to use the return value of this method to allocate
     * a buffer intended to hold all data in this stream.
{code}

HDFS and many other Hadoop file systems, as well as classes like 
ByteBufferInputStream, all return the remaining bytes, so the code works on 
top of HDFS. On other file systems, however, IS.available() may or may not 
return the remaining bytes. In one specific case, the ADLS wrapper FS used to 
implement the {{available()}} call with the correct semantics per the javadoc 
rather than the HDFS semantics, which ended up causing data loss during WAL 
recovery. We have since changed ADLS to implement the HDFS semantics, but we 
should fix HBase itself so that it does not rely on the available() call. 
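
To illustrate the failure mode, here is a minimal sketch. It is not the 
ProtobufLogReader code or the eventual patch. The class EofDetectionSketch, 
the ZeroAvailableStream wrapper, and the one-byte length-prefixed record 
layout are all assumptions made up for this example. It contrasts a loop 
driven by available() with one that detects end of file from read() 
returning -1.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class EofDetectionSketch {

    /** Hypothetical stream whose available() always returns 0, which the javadoc permits. */
    static class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) { super(in); }
        @Override public int available() { return 0; }
    }

    /** Assumed record layout for this sketch: one length byte, then that many payload bytes. */
    static byte[] sampleWal() {
        return new byte[] {3, 'a', 'b', 'c', 2, 'd', 'e'};
    }

    /** Fragile: treats available() == 0 as end of file and may silently skip records. */
    static List<byte[]> readWithAvailable(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> records = new ArrayList<>();
        while (data.available() > 0) {
            byte[] rec = new byte[data.readUnsignedByte()];
            data.readFully(rec);
            records.add(rec);
        }
        return records;
    }

    /** More robust: end of file is signalled only by read() returning -1. */
    static List<byte[]> readUntilEof(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int len = data.read();      // -1 means a clean end of stream
            if (len < 0) {
                break;
            }
            byte[] rec = new byte[len];
            data.readFully(rec);        // throws EOFException on a truncated record
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] wal = sampleWal();
        InputStream a = new ZeroAvailableStream(new ByteArrayInputStream(wal));
        InputStream b = new ZeroAvailableStream(new ByteArrayInputStream(wal));
        System.out.println("available()-driven loop read " + readWithAvailable(a).size() + " records");
        System.out.println("EOF-driven loop read " + readUntilEof(b).size() + " records");
    }
}
```

On a stream that reports remaining bytes from available(), both loops behave 
the same, which is why the problem stays hidden on HDFS. A real fix would 
also have to decide how to treat a truncated trailing record, which this 
sketch leaves as an EOFException.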



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
