[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928996#comment-15928996 ]
Andrew Wang commented on HDFS-10530:
------------------------------------

Thanks for digging in Manoj! A few follow-up Q's:

bq. DFSStripedOutputStream verifies if the allocated block locations length is at least equal to numDataBlocks, otherwise it throws IOException and the client halts. So, the relaxation is only for the parity blocks.

Ran the test myself, looking through the output. It looks like with 6 DNs, we don't allocate any locations for the parity blocks (only 6 replicas):

{noformat}
2017-03-16 13:57:38,902 [IPC Server handler 0 on 45189] INFO hdfs.StateChange (FSDirWriteFileOp.java:logAllocatedBlock(777)) - BLOCK* allocate blk_-9223372036854775792_1001, replicas=127.0.0.1:37655, 127.0.0.1:33575, 127.0.0.1:38319, 127.0.0.1:46751, 127.0.0.1:44029, 127.0.0.1:37065 for /ec/test1
{noformat}

Could you file a JIRA to dig into this? It looks like we can't write blocks from the same EC group to the same DN. Still, it's better to write the parities than not at all.

bq. WARN hdfs.DFSOutputStream (DFSStripedOutputStream.java:logCorruptBlocks(1117)) - Block group <1> has 3 corrupt blocks. It's at high risk of losing data.

Agree that this log message is not accurate; mind filing a JIRA to correct it? "Corrupt" means we have data loss. Here, we haven't lost data yet, but are suffering severely lowered durability. I'd also prefer we quantify the risk in the message, e.g. "loss of any block" or "loss of two blocks will result in data loss".

{noformat}
2017-03-16 13:57:40,898 [DataNode: [[[DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17, [DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data18]] heartbeating to localhost/127.0.0.1:45189] INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(738)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2017-03-16 13:57:40,943 [DataXceiver for client at /127.0.0.1:47340 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002 src: /127.0.0.1:47340 dest: /127.0.0.1:44841
2017-03-16 13:57:40,944 [DataXceiver for client at /127.0.0.1:54478 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002 src: /127.0.0.1:54478 dest: /127.0.0.1:38977
2017-03-16 13:57:40,945 [DataXceiver for client at /127.0.0.1:51622 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002 src: /127.0.0.1:51622 dest: /127.0.0.1:41895
{noformat}

Based on this, I think there's one DN doing reconstruction work to produce the three parity blocks, which get written to the three new nodes. The above logs are all from the receiving DNs.

It seems like we have a serious lack of logging in ECWorker / StripedBlockReconstructor / etc., though, since I determined the above via code inspection. I'd like to see logs for which blocks are read in, for decoding, and for writing the blocks out. Another JIRA?
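For illustration, here is a minimal standalone sketch (not the actual DFSStripedOutputStream code) of the allocation check quoted above, assuming RS-6-3 and treating DN locations as plain strings; the class and method names are hypothetical:

{code:java}
import java.io.IOException;

/**
 * Minimal standalone sketch (hypothetical, not the actual DFSStripedOutputStream code)
 * of the allocation check discussed above for an RS-6-3 block group: the write can only
 * proceed if at least numDataBlocks locations were allocated; missing parity locations
 * are tolerated, at the cost of reduced durability.
 */
public class StripedAllocationCheckSketch {
  static final int NUM_DATA_BLOCKS = 6;   // RS-6-3: 6 data blocks
  static final int NUM_PARITY_BLOCKS = 3; // RS-6-3: 3 parity blocks

  static void checkAllocatedLocations(String[] locations) throws IOException {
    if (locations.length < NUM_DATA_BLOCKS) {
      // Not even enough nodes for the data blocks: the client has to fail the write.
      throw new IOException("Allocated only " + locations.length
          + " locations, need at least " + NUM_DATA_BLOCKS + " of "
          + (NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS));
    }
    if (locations.length < NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS) {
      // The "relaxation": proceed, but some parity blocks will be missing until
      // reconstruction catches up, so the block group has lowered durability.
      System.out.println("WARN: block group short by "
          + (NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS - locations.length)
          + " parity location(s)");
    }
  }

  public static void main(String[] args) throws IOException {
    // Mirrors the 6-DN test run above, where only 6 locations were allocated.
    String[] sixLocations = {
        "127.0.0.1:37655", "127.0.0.1:33575", "127.0.0.1:38319",
        "127.0.0.1:46751", "127.0.0.1:44029", "127.0.0.1:37065"};
    checkAllocatedLocations(sixLocations); // warns, does not throw
  }
}
{code}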
> BlockManager reconstruction work scheduling should correctly adhere to EC
> block placement policy
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10530
>                 URL: https://issues.apache.org/jira/browse/HDFS-10530
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Rui Gao
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-10530.1.patch, HDFS-10530.2.patch, HDFS-10530.3.patch, HDFS-10530.4.patch, HDFS-10530.5.patch
>
>
> This issue was found by [~tfukudom].
> Under the RS-DEFAULT-6-3-64k EC policy:
> 1. Create an EC file; the file was written across all 5 racks (2 DNs each) of the cluster.
> 2. Reconstruction work is scheduled when a 6th rack is added.
> 3. Adding a 7th or further racks, however, does not trigger any reconstruction work.
> Based on the default EC block placement policy defined in "BlockPlacementPolicyRackFaultTolerant.java", the EC file should be scheduled to distribute across 9 racks if possible.
> In *BlockManager#isPlacementPolicySatisfied(BlockInfo storedBlock)*, *numReplicas* of striped blocks should probably be *getRealTotalBlockNum()* instead of *getRealDataBlockNum()*.
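For illustration, a minimal standalone model (not the actual BlockManager or BlockPlacementPolicyRackFaultTolerant code) of the counting issue described in the quoted report; the class name, method names, and the simplified satisfaction rule below are hypothetical:

{code:java}
/**
 * Hypothetical standalone model of the counting issue reported above. The
 * rack-fault-tolerant policy wants the internal blocks of a group spread over as
 * many racks as possible, so the replica count fed to the placement check should
 * be the total internal block count (data + parity), not just the data count.
 */
public class EcPlacementCheckSketch {
  /** Simplified rule: placement is satisfied once the block group spans
   *  min(requiredReplicas, totalRacks) distinct racks. */
  static boolean isPlacementSatisfied(int racksUsed, int requiredReplicas, int totalRacks) {
    return racksUsed >= Math.min(requiredReplicas, totalRacks);
  }

  public static void main(String[] args) {
    int dataBlocks = 6, parityBlocks = 3; // RS-6-3
    int racksUsed = 6;   // block group currently spread over 6 racks
    int totalRacks = 7;  // a 7th rack has just been added

    // Data-only count (the reported behavior): 6 racks >= min(6, 7), so the check
    // passes and no reconstruction work is scheduled for the new rack.
    System.out.println(isPlacementSatisfied(racksUsed, dataBlocks, totalRacks));                // true
    // Total count (the suggested fix): 6 racks < min(9, 7), so work is scheduled
    // until the group spreads over all 7 (and eventually up to 9) racks.
    System.out.println(isPlacementSatisfied(racksUsed, dataBlocks + parityBlocks, totalRacks)); // false
  }
}
{code}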