[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928996#comment-15928996 ]
Andrew Wang commented on HDFS-10530:
------------------------------------

Thanks for digging in Manoj! A few follow-up Q's:

bq. DFSStripedOutputStream verifies if the allocated block locations length is at least equal to numDataBlocks, otherwise it throws IOException and the client halts. So, the relaxation is only for the parity blocks.

Ran the test myself, looking through the output. It looks like with 6 DNs, we don't allocate any locations for the parity blocks (only 6 replicas):

{noformat}
2017-03-16 13:57:38,902 [IPC Server handler 0 on 45189] INFO hdfs.StateChange (FSDirWriteFileOp.java:logAllocatedBlock(777)) - BLOCK* allocate blk_-9223372036854775792_1001, replicas=127.0.0.1:37655, 127.0.0.1:33575, 127.0.0.1:38319, 127.0.0.1:46751, 127.0.0.1:44029, 127.0.0.1:37065 for /ec/test1
{noformat}

Could you file a JIRA to dig into this? It looks like we can't write blocks from the same EC group to the same DN. Still, it's better to write the parities than not at all.

bq. WARN hdfs.DFSOutputStream (DFSStripedOutputStream.java:logCorruptBlocks(1117)) - Block group <1> has 3 corrupt blocks. It's at high risk of losing data.

Agree that this log message is not accurate; mind filing a JIRA to correct it? "Corrupt" means we have data loss. Here, we haven't lost data yet, but are suffering severely lowered durability. I'd also prefer we quantify the risk in the message, e.g. "loss of any block" or "loss of two blocks will result in data loss".

{noformat}
2017-03-16 13:57:40,898 [DataNode: [[[DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17, [DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data18]] heartbeating to localhost/127.0.0.1:45189] INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(738)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2017-03-16 13:57:40,943 [DataXceiver for client at /127.0.0.1:47340 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002 src: /127.0.0.1:47340 dest: /127.0.0.1:44841
2017-03-16 13:57:40,944 [DataXceiver for client at /127.0.0.1:54478 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002 src: /127.0.0.1:54478 dest: /127.0.0.1:38977
2017-03-16 13:57:40,945 [DataXceiver for client at /127.0.0.1:51622 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002 src: /127.0.0.1:51622 dest: /127.0.0.1:41895
{noformat}

Based on this, I think there's one DN doing reconstruction work to produce the three parity blocks, which get written to the three new nodes. The above logs are all from the receiving DNs.

It seems like we have a serious lack of logging in ECWorker / StripedBlockReconstructor / etc., though, since I determined the above via code inspection. I'd like to see logs for which blocks are read in, for decoding, and for writing the blocks out. Another JIRA?
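For illustration, here is a minimal standalone sketch (not the actual DFSStripedOutputStream code) of the allocation check quoted above, assuming RS-6-3 and treating DN locations as plain strings; the class and method names are hypothetical:

{code:java}
import java.io.IOException;

/**
 * Minimal standalone sketch (hypothetical, not the actual DFSStripedOutputStream code)
 * of the allocation check discussed above for an RS-6-3 block group: the write can only
 * proceed if at least numDataBlocks locations were allocated; missing parity locations
 * are tolerated, at the cost of reduced durability.
 */
public class StripedAllocationCheckSketch {
  static final int NUM_DATA_BLOCKS = 6;   // RS-6-3: 6 data blocks
  static final int NUM_PARITY_BLOCKS = 3; // RS-6-3: 3 parity blocks

  static void checkAllocatedLocations(String[] locations) throws IOException {
    if (locations.length < NUM_DATA_BLOCKS) {
      // Not even enough nodes for the data blocks: the client has to fail the write.
      throw new IOException("Allocated only " + locations.length
          + " locations, need at least " + NUM_DATA_BLOCKS + " of "
          + (NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS));
    }
    if (locations.length < NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS) {
      // The "relaxation": proceed, but some parity blocks will be missing until
      // reconstruction catches up, so the block group has lowered durability.
      System.out.println("WARN: block group short by "
          + (NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS - locations.length)
          + " parity location(s)");
    }
  }

  public static void main(String[] args) throws IOException {
    // Mirrors the 6-DN test run above, where only 6 locations were allocated.
    String[] sixLocations = {
        "127.0.0.1:37655", "127.0.0.1:33575", "127.0.0.1:38319",
        "127.0.0.1:46751", "127.0.0.1:44029", "127.0.0.1:37065"};
    checkAllocatedLocations(sixLocations); // warns, does not throw
  }
}
{code}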
> BlockManager reconstruction work scheduling should correctly adhere to EC
> block placement policy
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10530
>                 URL: https://issues.apache.org/jira/browse/HDFS-10530
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Rui Gao
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-10530.1.patch, HDFS-10530.2.patch, HDFS-10530.3.patch, HDFS-10530.4.patch, HDFS-10530.5.patch
>
>
> This issue was found by [~tfukudom].
> Under the RS-DEFAULT-6-3-64k EC policy:
> 1. Create an EC file; the file was written across all 5 racks (2 DNs each) of the cluster.
> 2. Reconstruction work is scheduled when a 6th rack is added.
> 3. Adding a 7th or further racks, however, does not trigger any reconstruction work.
> Based on the default EC block placement policy defined in "BlockPlacementPolicyRackFaultTolerant.java", the EC file should be scheduled to distribute across 9 racks if possible.
> In *BlockManager#isPlacementPolicySatisfied(BlockInfo storedBlock)*, *numReplicas* of striped blocks should probably be *getRealTotalBlockNum()* instead of *getRealDataBlockNum()*.
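For illustration, a minimal standalone model (not the actual BlockManager or BlockPlacementPolicyRackFaultTolerant code) of the counting issue described in the quoted report; the class name, method names, and the simplified satisfaction rule below are hypothetical:

{code:java}
/**
 * Hypothetical standalone model of the counting issue reported above. The
 * rack-fault-tolerant policy wants the internal blocks of a group spread over as
 * many racks as possible, so the replica count fed to the placement check should
 * be the total internal block count (data + parity), not just the data count.
 */
public class EcPlacementCheckSketch {
  /** Simplified rule: placement is satisfied once the block group spans
   *  min(requiredReplicas, totalRacks) distinct racks. */
  static boolean isPlacementSatisfied(int racksUsed, int requiredReplicas, int totalRacks) {
    return racksUsed >= Math.min(requiredReplicas, totalRacks);
  }

  public static void main(String[] args) {
    int dataBlocks = 6, parityBlocks = 3; // RS-6-3
    int racksUsed = 6;   // block group currently spread over 6 racks
    int totalRacks = 7;  // a 7th rack has just been added

    // Data-only count (the reported behavior): 6 racks >= min(6, 7), so the check
    // passes and no reconstruction work is scheduled for the new rack.
    System.out.println(isPlacementSatisfied(racksUsed, dataBlocks, totalRacks));                // true
    // Total count (the suggested fix): 6 racks < min(9, 7), so work is scheduled
    // until the group spreads over all 7 (and eventually up to 9) racks.
    System.out.println(isPlacementSatisfied(racksUsed, dataBlocks + parityBlocks, totalRacks)); // false
  }
}
{code}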