[
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933376#comment-16933376
]
HuangTao edited comment on HDFS-14849 at 9/19/19 1:35 PM:
----------------------------------------------------------
I found a clue:
`chooseSourceDatanodes` gets
{quote}LIVE=2, READONLY=0, DECOMMISSIONING=7, DECOMMISSIONED=0,
MAINTENANCE_NOT_FOR_READ=0, MAINTENANCE_FOR_READ=0, CORRUPT=0, EXCESS=0,
STALESTORAGE=0, REDUNDANT=22{quote}
All block indices (0-8) exist. Three blocks (indices 3, 4 and 8) have no
redundant copy. The datanode storing block 8 is in DECOMMISSIONING, and the
adminState of the other two datanodes is null.
{quote}[0, 1, 2, 3, 4, 5, 6, 7, 8, 6, 7, 6, 6, 5, 0, 1, 5, 0, 2, 5, 2, 5, 1, 2,
1, 5, 2, 7, 5, 2, 0]{quote}
`countNodes(block)` gets
{quote}LIVE=8, READONLY=0, DECOMMISSIONING=7, DECOMMISSIONED=0,
MAINTENANCE_NOT_FOR_READ=0, MAINTENANCE_FOR_READ=0, CORRUPT=0, EXCESS=0,
STALESTORAGE=0, REDUNDANT=16{quote}
So we need to replicate block 8, but there are no racks left.
Now my question is: why are some blocks replicated more than once, rather
than block 8 being replicated?
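The claim about blocks 3/4/8 can be checked by tallying the index list quoted above. This is only a minimal standalone sketch, not HDFS code: the class and method names (`BlockIndexTally`, `countCopies`, `singleCopyIndices`) are hypothetical, and the array is copied by hand from the quoted list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; it is not part of Hadoop.
final class BlockIndexTally {

    /** Counts how many reported storages hold each internal block index. */
    static Map<Integer, Integer> countCopies(int[] blockIndices) {
        Map<Integer, Integer> copies = new TreeMap<>();
        for (int index : blockIndices) {
            copies.merge(index, 1, Integer::sum);
        }
        return copies;
    }

    /** Returns the indices that appear exactly once, i.e. have no redundant copy. */
    static List<Integer> singleCopyIndices(int[] blockIndices) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : countCopies(blockIndices).entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Index list copied from the comment above (31 entries).
        int[] reported = {
            0, 1, 2, 3, 4, 5, 6, 7, 8, 6, 7, 6, 6, 5, 0, 1, 5, 0, 2, 5, 2, 5, 1, 2,
            1, 5, 2, 7, 5, 2, 0
        };
        System.out.println(countCopies(reported));
        System.out.println(singleCopyIndices(reported));
    }
}
```

For the list above this reports 31 entries in total, with indices 3, 4 and 8 each appearing once, while index 5 appears 7 times and index 2 appears 6 times.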
> Erasure Coding: replicate block infinitely when datanode being decommissioning
> ------------------------------------------------------------------------------
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Labels: EC, HDFS, NameNode
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch,
> fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC blocks on
> that datanode are replicated infinitely.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes
> simultaneously.
> !scheduleReconstruction.png!
> !fsck-file.png!