[ 
https://issues.apache.org/jira/browse/HDFS-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696466#comment-15696466
 ] 

Rakesh R commented on HDFS-11164:
---------------------------------

Thanks [~umamaheswararao] for the comments. 

bq. One thought is, Failed blocks due to pinned can be stored separately and 
when retry happens and if blocks exist in failedDueToPinned list, then skip to 
add them into PendingMoves? Just a thought, we need to check the feasibility.
Yes, we can maintain a datastructure to hold the {{blockId vs 
pinnedBlkLocations}}, there could be multiple favored/pinned nodes for a 
"blockId". Let me try implementing this excluded logic. In worst case, I could 
see a case of millions of pinned block entries and this data structure will 
consume more memory throughout the lifetime of Mover tool. Do we need to 
introduce any configuration parameter to limit the size of this data structure.
{code}
int LIMIT = // configured value, default to 10K.
Map<Long, Set<DatanodeInfo>> excludedPinnedBlocks = new HashMap<>(LIMIT);
{code}

> Mover should avoid unnecessary retries if the block is pinned
> -------------------------------------------------------------
>
>                 Key: HDFS-11164
>                 URL: https://issues.apache.org/jira/browse/HDFS-11164
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-11164-00.patch, HDFS-11164-01.patch
>
>
> When mover is trying to move a pinned block to another datanode, it will 
> internally hits the following IOException and mark the block movement as 
> {{failure}}. Since the Mover has {{dfs.mover.retry.max.attempts}} configs, it 
> will continue moving this block until it reaches {{retryMaxAttempts}}. If the 
> block movement failure(s) are only due to block pinning, then retry is 
> unnecessary. The idea of this jira is to avoid retry attempts of pinned 
> blocks as they won't be able to move to a different node. 
> {code}
> 2016-11-22 10:56:10,537 WARN 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher: Failed to move 
> blk_1073741825_1001 with size=52 from 127.0.0.1:19501:DISK to 
> 127.0.0.1:19758:ARCHIVE through 127.0.0.1:19501
> java.io.IOException: Got error, status=ERROR, status message opReplaceBlock 
> BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 received 
> exception java.io.IOException: Got error, status=ERROR, status message Not 
> able to copy block 1073741825 to /127.0.0.1:19826 because it's pinned , copy 
> block BP-1772076264-10.252.146.200-1479792322960:blk_1073741825_1001 from 
> /127.0.0.1:19501, reportedBlock move is failed
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:118)
>       at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:417)
>       at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:358)
>       at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$5(Dispatcher.java:322)
>       at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:1075)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to