Here are the patches for 205, not all of which are committed yet. I am working with Todd, Jitendra and Sanjay on this. We plan to get it done by tomorrow:

HDFS-1207. FSNamesystem.stallReplicationWork should be volatile. (Todd Lipcon via dhruba)
Risk level: Low - simple change of a variable to volatile, for multithreaded correctness.

HDFS-2309. TestRenameWhileOpen fails. (jitendra)
Risk level: Low - simple change to introduce a first block report flag to fix a test failure.

HADOOP-6722 Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself
Risk level: Low - checks whether a socket has connected to itself.

HDFS-1252 TestDFSConcurrentFileOperations broken in 0.20-append (TODO suresh)
Risk level: Low - fixes the test for correctness.

HDFS-2300 TestFileAppend4 and TestMultiThreadedSync fail on 20.append
Risk level: Low - simple changes to fix the test failure.

HDFS-1779 After NameNode restart, Clients can not read partial files even after client invokes Sync.
Risk level: Low - fixes related to bbw block reports. Disabled by the append supported config flag. This has been tested as part of CDH.

HDFS-1186 0.20: DNs should interrupt writers at start of recovery
Risk level: Low - ensures data integrity by preventing further writes on lease recovery. This has been tested as part of CDH.

HDFS-1260 0.20: Block lost when multiple DNs trying to recover it to different genstamps
Risk level: Low - the code change looks straightforward. Tested as part of CDH.

HDFS-1122 Don't allow client verification to prematurely add
Risk level: Low - the code change looks straightforward. It handles the interaction of client verification with DataBlockScanner, which could incorrectly mark a block corrupt. Tested as part of CDH.

HDFS-1242 0.20 append: Add test for appendFile() race solved in HDFS-142
Risk level: Low - adds more tests for the already committed change from HDFS-142.

HDFS-1218 20 append: Blocks recovered on startup should be treated
Risk level: Medium. This is a must-fix to prevent data loss if a datanode goes down in the pipeline. This has been tested in CDH.

HDFS-1197 - Blocks are considered "complete" prematurely after
Risk level: Low. This fixes data loss. This has been tested in CDH. To reduce the risk, I am considering a shorter version of the patch, given that some of the issues were handled by HDFS-1779.
*The patches I am not planning to add to 205 and the reason:*

HDFS-611 Heartbeats times from Datanodes increase when there are plenty of blocks to delete
Could be HBase related.

HDFS-1056 Multi-node RPC deadlocks during block recovery
Setting the xceiver port using “dfs.datanode.port” works around this issue.

HDFS-1982 Null pointer exception is thrown when NN restarts with a block lesser in size than the block present in DN1 but generation stamps is greater in NN.
Low probability of this occurring. No patch is available yet.

HDFS-1951/HDFS-1970 Null pointer exception comes when Namenode recovery happens and there is no response from client to NN more than the hardlimit for NN recovery and the current block is more than the prev block size in NN
Not in CDH. Suitable for a subsequent release.

HDFS-1264 0.20: OOME in HDFS client made an unrecoverable HDFS block
No patch available yet.

HDFS-1262 Failed pipeline creation during append leaves lease hanging on NN
Not relevant to getting flush working, as append is no longer used by HBase.

HDFS-1266 Missing license headers in branch-20-append
Missing license headers have already been fixed. TODO check.

HDFS-1248 Misc cleanup/logging improvements for branch-20-append
Log-related cleanup. Not critical for 205.

HDFS-1247 Improvements to HDFS-1204 test
Risk level: Low - test improvements.
