[
https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796425#comment-15796425
]
Uma Maheswara Rao G edited comment on HDFS-11284 at 1/3/17 10:46 PM:
---------------------------------------------------------------------
Hi [~yuanbo], the retry will not happen in the DN itself.
When the DN reports a movement result as a failure, the NN takes care of the retry (HDFS-11029).
At that time, if the NN finds that all the existing blocks are already satisfied, those items are
ignored and not sent for movement again. If satisfaction is still needed, the NN schedules the
movement again by finding new source/target pairs. The default retry interval is 30 minutes. (The
higher timeout was chosen because sometimes the DN itself takes longer to send results back on
slow nodes, and the NN would otherwise retry unnecessarily. This can be refined further with
testing.)
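To make that flow concrete, here is a minimal Java sketch of the NN-side retry decision as described above. All names in it (SpsRetrySketch, AttemptedItem, PolicyChecker, pickItemsToRetry) are hypothetical and only illustrate the idea; they are not the actual HDFS-11029 code.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the NN-side retry decision described above.
 * None of these names come from the real HDFS-11029 code.
 */
public class SpsRetrySketch {

  // Assumed default: give the DN 30 minutes to report results before retrying.
  private static final long RETRY_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(30);

  /** A movement item the NN is tracking (hypothetical shape). */
  static class AttemptedItem {
    final long id;
    final long lastAttemptedTimeMs;

    AttemptedItem(long id, long lastAttemptedTimeMs) {
      this.id = id;
      this.lastAttemptedTimeMs = lastAttemptedTimeMs;
    }
  }

  /** Stand-in for the NN check "are all blocks of this item already satisfied?". */
  interface PolicyChecker {
    boolean allBlocksSatisfied(long id);
  }

  /**
   * Decide which attempted items should be re-queued for movement:
   * - still inside the timeout window: wait, the DN may yet report;
   * - already satisfied: drop, nothing left to move;
   * - otherwise: retry, the NN will pick new src/target pairs.
   */
  static List<AttemptedItem> pickItemsToRetry(List<AttemptedItem> attempted,
      PolicyChecker checker, long nowMs) {
    List<AttemptedItem> retry = new ArrayList<>();
    for (AttemptedItem item : attempted) {
      if (nowMs - item.lastAttemptedTimeMs < RETRY_TIMEOUT_MS) {
        continue; // too early, let slow DNs finish reporting
      }
      if (checker.allBlocksSatisfied(item.id)) {
        continue; // satisfied in the meantime, ignore
      }
      retry.add(item);
    }
    return retry;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    List<AttemptedItem> attempted = Arrays.asList(
        new AttemptedItem(1, now - TimeUnit.MINUTES.toMillis(45)),  // timed out, satisfied
        new AttemptedItem(2, now - TimeUnit.MINUTES.toMillis(45)),  // timed out, not satisfied
        new AttemptedItem(3, now - TimeUnit.MINUTES.toMillis(5)));  // still waiting
    PolicyChecker checker = id -> id == 1; // pretend item 1 became satisfied
    for (AttemptedItem item : pickItemsToRetry(attempted, checker, now)) {
      System.out.println("Retry movement for item " + item.id); // prints only item 2
    }
  }
}
{code}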
Hope this helps you understand better.
{quote}
Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last
issue seems belong to retry mechanism. I'm thinking about removing/changing
this JIRA.
{quote}
Please keep this JIRA open until you agree on the reason.
Can you confirm one point from your logs: was the block deleted due to over-replication while the
same node was used for movement (as the movement was scheduled earlier)? If that's the case, the
behavior should be fine. Also, can you confirm from the logs that the remaining block movements
were successful?
Anyway, please go ahead with HDFS-11150. There were some test failures related to it; can you
please check?
Thanks a lot for your efforts.
> [SPS]: Avoid running SPS under safemode and fix issues in target node
> choosing.
> -------------------------------------------------------------------------------
>
> Key: HDFS-11284
> URL: https://issues.apache.org/jira/browse/HDFS-11284
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
> Reporter: Yuanbo Liu
> Assignee: Yuanbo Liu
> Attachments: TestSatisfier.java
>
>
> Recently I've found that under some conditions, SPS is not stable:
> * SPS runs under safe mode.
> * There are some overlapping nodes among the chosen target nodes.
> * The actual replication count of a block doesn't match the replication factor.
> For example, the actual replication is 2 while the replication factor is 3.