[ https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968588#comment-14968588 ]
Walter Su commented on HDFS-9275:
---------------------------------
||DN0||DN1||DN2||DN3||DN4||DN5||DN6||DN7||DN8||DN9||DN10||DN11||
| |*|*|*|*|*|*|*|*|*| | | <-- BlockGroup_0
| | |*|*|*|*|*|*|*|*|*| | <-- BlockGroup_1
The test case only tests the last block group. Suppose DN8~DN10 are shut down.
ReplicationMonitor will schedule a recovery. First, it needs to call
BlockPlacementPolicy to choose targets. DN2~DN10 are excluded because they
already have internal blocks on them. To recover 3 blocks, it must choose DN0,
DN1, and DN11.
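The exclusion step above can be sketched as a simple set difference. This is only an illustration: the class and method names are hypothetical and are not HDFS code. It models nothing more than "every DN that already holds an internal block of the group is excluded".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not part of HDFS.
final class TargetCandidates {
    /**
     * Returns the DN indices in [0, totalDns) that hold no internal block
     * of a block group stored on DNs firstHolder..lastHolder (inclusive).
     */
    static List<Integer> candidates(int totalDns, int firstHolder, int lastHolder) {
        Set<Integer> excluded = new TreeSet<>();
        for (int dn = firstHolder; dn <= lastHolder; dn++) {
            excluded.add(dn);
        }
        List<Integer> result = new ArrayList<>();
        for (int dn = 0; dn < totalDns; dn++) {
            if (!excluded.contains(dn)) {
                result.add(dn);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // BlockGroup_1 lives on DN2~DN10 in a 12-DN cluster.
        System.out.println(candidates(12, 2, 10)); // prints [0, 1, 11]
    }
}
```

With exactly three candidates left and three internal blocks to recover, losing any one candidate leaves the recovery short by one block.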
But DN1 has a block belonging to BlockGroup_0. The last time DN1 sent a
heartbeat, it reported an {{xceiverCount}} of 3. {{xceiverCount}} equals the
number of active threads in DataNode.threadGroup, as shown below.
{noformat}
DatanodeRegistration(127.0.0.1:47705,
datanodeUuid=43e5be32-2066-4057-9b25-8544d2d542bc, infoPort=43445,
infoSecurePort=0, ipcPort=34036,
storageInfo=lv=-56;cid=testClusterID;nsid=23260287;c=1445489667626)
java.lang.ThreadGroup[name=dataXceiverServer,maxpri=10]
Thread[org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@6aa03871,5,dataXceiverServer]
Thread[DataXceiver for client DFSClient_NONMAPREDUCE_-1867405584_1 at
/127.0.0.1:56717 [Receiving block
BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001],5,dataXceiverServer]
Thread[PacketResponder:
BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001,
type=LAST_IN_PIPELINE, downstreams=0:[],5,dataXceiverServer]
{noformat}
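To illustrate how a thread-group count like the one in the dump above arises, here is a minimal stand-alone sketch using only {{java.lang.ThreadGroup}}. It is not DataNode code: the class name, the helper method and the worker threads are hypothetical, and the group name is borrowed from the dump purely for readability. Note that {{ThreadGroup.activeCount()}} is documented as an estimate; in this controlled setup, where all threads are started and blocked, it is expected to match the number of started threads.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-alone demo, not DataNode code.
final class XceiverCountDemo {
    /** Starts n blocked threads in a fresh group, samples activeCount(), then releases them. */
    static int countWhileRunning(int n) {
        ThreadGroup group = new ThreadGroup("dataXceiverServer");
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(group, () -> {
                started.countDown();
                try {
                    release.await(); // stay alive until the count has been sampled
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            threads[i].start();
        }
        try {
            started.await();                 // all workers are running
            int count = group.activeCount(); // live threads in the group
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("active threads: " + countWhileRunning(3));
    }
}
```

In the dump above, the three threads are the DataXceiverServer itself, one DataXceiver receiving a block, and its PacketResponder, so a single in-flight write is enough to report a count of 3.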
An {{xceiverCount}} of 3 is larger than the average, so DN1 is excluded
by {{chooseRandom()}}. Then BlockGroup_1 can only recover 2 blocks. As
discussed
[here|https://issues.apache.org/jira/browse/HDFS-8220?focusedCommentId=14518931&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14518931]
, for now PlacementPolicy doesn't support returning two identical
storages, i.e., no 2 replicas (internal blocks) on the same storage.
We could simply add more DNs to fix the test. Alternatively, we can set
{{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY}} to false in the test case.
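The effect of the load check, and of turning it off, can be sketched as follows. This is a simplified model under stated assumptions, not the actual BlockPlacementPolicy code: the class, the method {{isGoodTarget}} as written here, and the {{loadFactor}} parameter are illustrative, the threshold is modeled as a configurable multiple of the cluster-average xceiver count, and the other 11 DNs are assumed to be idle for the sake of the example.

```java
// Simplified, hypothetical model of a load-aware target check; not HDFS code.
final class LoadCheckSketch {
    /** Average xceiver count across all DNs. */
    static double average(int[] loads) {
        double sum = 0;
        for (int load : loads) {
            sum += load;
        }
        return loads.length == 0 ? 0 : sum / loads.length;
    }

    /**
     * A node is rejected only when load is considered and its xceiver count
     * exceeds loadFactor times the average (loadFactor is an assumption).
     */
    static boolean isGoodTarget(int nodeLoad, double avgLoad,
                                double loadFactor, boolean considerLoad) {
        if (!considerLoad) {
            return true;
        }
        return nodeLoad <= loadFactor * avgLoad;
    }

    public static void main(String[] args) {
        int[] loads = new int[12]; // 12 DNs, assumed idle ...
        loads[1] = 3;              // ... except DN1, which reported 3
        double avg = average(loads); // 3 / 12 = 0.25

        System.out.println(isGoodTarget(loads[1], avg, 2.0, true));  // prints false
        System.out.println(isGoodTarget(loads[0], avg, 2.0, true));  // prints true
        System.out.println(isGoodTarget(loads[1], avg, 2.0, false)); // prints true
    }
}
```

In a small test cluster the average is tiny, so a single in-flight write is enough to push one DN over the threshold. Adding more DNs gives the policy spare candidates; disabling the load check makes DN1 eligible again.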
The 02 patch includes some cleanup. Kindly review. Thanks.
> Fix TestRecoverStripedFile
> --------------------------
>
> Key: HDFS-9275
> URL: https://issues.apache.org/jira/browse/HDFS-9275
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch
>
>