I propose that the 20-append patches (listed below) be included in 20.205, which will become the first official Apache release of Hadoop that supports append and HBase.
Background: There has not yet been an official Apache release that supports HBase. The HBase community has instead been using the 20-append branch. The patches on that branch were contributed by the HBase community, including Facebook. The Cloudera distribution also includes these patches. Andrew Purtell has ported these patches to the 20-security branch.

Risk Level: These patches have been used and tested on large HBase clusters by three groups:
- Facebook.
- Users who run the 20-append branch directly. These include various users, among them a 500-node HBase cluster at Yahoo.
- Users of the Cloudera distribution.

We have reviewed the patches and conducted further tests. Testing and validation are ongoing.

Patches:
HDFS-200. Support append and sync for the Hadoop 0.20 branch.
HDFS-142. Blocks that are being written by a client are stored in the blocksBeingWritten directory.
HDFS-1057. Concurrent readers hit ChecksumExceptions if following a writer to the very end of a file.
HDFS-724. Use a bidirectional heartbeat to detect a stuck pipeline.
HDFS-895. Allow hflush/sync to occur in parallel with new writes to the file.
HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease recovery.
HDFS-1555. Disallow pipeline recovery if a file is already being lease recovered.
HDFS-1554. New semantics for recoverLease.
HDFS-988. Fix bug where saveNamespace can corrupt the edits log.
HDFS-826. Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline.
HDFS-630. Client can exclude specific nodes in the write pipeline.
HDFS-1141. completeFile does not check lease ownership.
HDFS-1204. Lease expiration should recover single files, not the entire lease holder.
HDFS-1254. Support append/sync via the default configuration.
HDFS-1346. DFSClient receives out-of-order packet ack.
HDFS-1054. Remove sleep before retry for allocating a block.
