[ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796425#comment-15796425 ]

Uma Maheswara Rao G edited comment on HDFS-11284 at 1/3/17 10:46 PM:
---------------------------------------------------------------------

Hi [~yuanbo], the retry will not happen in the DN itself.
When a DN sends a movement result back as failed, the NN takes care of retrying (HDFS-11029). At that point, if the NN finds that all of the item's blocks are already satisfied, the item is ignored rather than sent for movement again. If it still needs satisfaction, it is sent again after finding new src/target pairs. The default retry interval is 30 mins. (The higher timeout was chosen because sometimes the DN itself takes longer to send results back, e.g. on slow nodes, and the NN would otherwise retry unnecessarily. This can be refined further with testing.)
Hope this helps you understand better.
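
For illustration only, here is a rough sketch of that retry flow. This is not the actual SPS code; the class and method names below are made up for this example, only the behavior follows what I described above (wait for the self-retry timeout, drop items whose blocks are already satisfied, re-schedule the rest with new src/targets):
{code:java}
// Hypothetical sketch of the NN-side retry flow described above; not real SPS code.
class SpsRetrySketch {
  // Assumed default, matching the 30 min self-retry timeout mentioned above.
  private static final long SELF_RETRY_TIMEOUT_MS = 30 * 60 * 1000L;

  static final class MovementItem {
    final long blockCollectionId;
    final long scheduledTimeMs;
    MovementItem(long blockCollectionId, long scheduledTimeMs) {
      this.blockCollectionId = blockCollectionId;
      this.scheduledTimeMs = scheduledTimeMs;
    }
  }

  // Stand-ins for the real NN internals, for illustration only.
  interface Namesystem {
    boolean allBlocksSatisfyPolicy(long blockCollectionId);
    void scheduleWithNewSrcAndTargets(long blockCollectionId);
  }

  void retryTimedOutItems(Iterable<MovementItem> pendingItems, Namesystem ns, long nowMs) {
    for (MovementItem item : pendingItems) {
      if (nowMs - item.scheduledTimeMs < SELF_RETRY_TIMEOUT_MS) {
        continue; // DN may still report the result; do not retry yet.
      }
      if (ns.allBlocksSatisfyPolicy(item.blockCollectionId)) {
        continue; // Already satisfied, so ignore instead of re-sending for movement.
      }
      // Still needs satisfaction: re-schedule with freshly chosen src/target pairs.
      ns.scheduleWithNewSrcAndTargets(item.blockCollectionId);
    }
  }
}
{code}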

{quote}
Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last issue seems to belong to the retry mechanism. I'm thinking about removing/changing this JIRA.
{quote}
Please keep this JIRA open until you agree on the reason.
Can you confirm one point from your logs: was the block deleted due to over-replication while the same node was used for the movement (since the movement was scheduled earlier)? If that's the case, the behavior should be fine. Also, can you confirm from the logs that the remaining block movements were successful?
Anyway, please go ahead with HDFS-11150. There were some test failures related to it; can you please check?

Thanks a lot for putting in the effort.


> [SPS]: Avoid running SPS under safemode and fix issues in target node 
> choosing.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-11284
>                 URL: https://issues.apache.org/jira/browse/HDFS-11284
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Yuanbo Liu
>            Assignee: Yuanbo Liu
>         Attachments: TestSatisfier.java
>
>
> Recently I've found that in some conditions, SPS is not stable:
> * SPS runs under safe mode.
> * There are overlapping nodes among the chosen target nodes.
> * The actual replication count of a block doesn't match the replication factor. 
> For example, the actual replication is 2 while the replication factor is 3.


