[jira] [Updated] (HBASE-16766) Do not rely on InputStream.available()

Enis Soztutar (JIRA) Tue, 04 Oct 2016 13:33:32 -0700

     [ https://issues.apache.org/jira/browse/HBASE-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Enis Soztutar updated HBASE-16766:
----------------------------------
    Attachment: hbase-16766_v1.patch

Something like this. 

> Do not rely on InputStream.available() 
> ---------------------------------------
>
>                 Key: HBASE-16766
>                 URL: https://issues.apache.org/jira/browse/HBASE-16766
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: hbase-16766_v1.patch
>
>
> ProtobufLogReader relies on InputStream.available() to figure out whether we 
> have exhausted the file. However, the InputStream.available() javadoc states: 
> {code}
>      * <p> Note that while some implementations of {@code InputStream} will return
>      * the total number of bytes in the stream, many will not.  It is
>      * never correct to use the return value of this method to allocate
>      * a buffer intended to hold all data in this stream.
> {code}
> HDFS and many other Hadoop file systems, as well as classes such as 
> ByteBufferInputStream, all return the number of remaining bytes, so the code 
> works on top of HDFS. However, on other file systems, InputStream.available() 
> may or may not return the remaining bytes. In one specific case, the ADLS 
> wrapper FS used to implement the {{available()}} call with the correct 
> semantics (as the javadoc defines them), which ended up causing data loss in 
> WAL recovery. We have since changed ADLS to implement the HDFS semantics, but 
> we should fix HBase itself so that it does not rely on the available() call. 
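
To illustrate the failure mode described above, here is a minimal sketch in plain Java. It is not the attached patch and does not use ProtobufLogReader; the class and method names (AvailableSketch, ZeroAvailableStream, countWithAvailable, countUntilEof) are hypothetical. It assumes a stream whose available() always returns 0, which the InputStream contract allows, and compares a loop that treats available() == 0 as end of file with one that stops only when read() returns -1.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableSketch {

    /** Hypothetical stream: available() always reports 0, which the contract permits. */
    static final class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    /** Fragile: treats available() == 0 as "file exhausted". */
    static int countWithAvailable(InputStream in) throws IOException {
        int count = 0;
        while (in.available() > 0) {
            if (in.read() == -1) {
                break;
            }
            count++;
        }
        return count;
    }

    /** Robust: only read() returning -1 signals end of stream. */
    static int countUntilEof(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16];
        // Both calls read the same 16 bytes through the same kind of stream.
        System.out.println(countWithAvailable(new ZeroAvailableStream(new ByteArrayInputStream(data))));
        System.out.println(countUntilEof(new ZeroAvailableStream(new ByteArrayInputStream(data))));
    }
}
```

With this stream the available()-based loop stops before reading anything, while the EOF-based loop sees all 16 bytes. A reader that silently stops early in this way is the kind of behaviour that could drop edits during WAL recovery.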



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
