[
https://issues.apache.org/jira/browse/AMBARI-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Hurley updated AMBARI-11624:
-------------------------------------
Attachment: AMBARI-11624.patch
> Datanode Shutdown Retries During Upgrade Are Too Long
> -----------------------------------------------------
>
> Key: AMBARI-11624
> URL: https://issues.apache.org/jira/browse/AMBARI-11624
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Critical
> Fix For: 2.1.0
>
> Attachments: AMBARI-11624.patch
>
>
> See HDFS-8510.
> During the upgrade from HDP 2.2 to HDP 2.3, HBase still goes down during
> the core slaves portion even when there are 4 DataNodes in the cluster.
> This is because the DataNode upgrade takes too long: with the HDP stack's
> default value for {{ipc.client.connect.retry.interval}}, the client keeps
> retrying for longer than the 30-second period in which the DataNode would
> be marked as dead.
> Notice that after the shutdown command, it takes 52 seconds for {{dfsadmin}}
> to report that the DataNode is down:
> {noformat}
> 2015-05-29 13:13:27,222 - hadoop-hdfs-datanode is currently at version
> 2.2.7.0-2808
> 2015-05-29 13:13:27,306 - Execute['hdfs dfsadmin -shutdownDatanode
> 0.0.0.0:8010 upgrade'] {'tries': 1, 'user': 'hdfs'}
> 2015-05-29 13:13:29,003 - Execute['hdfs dfsadmin -getDatanodeInfo
> 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2015-05-29 13:13:30,648 - DataNode has not shutdown.
> 2015-05-29 13:13:40,655 - Execute['hdfs dfsadmin -getDatanodeInfo
> 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2015-05-29 13:14:32,280 - DataNode has successfully shutdown for upgrade.
> 2015-05-29 13:14:32,327 - Execute['hdp-select set hadoop-hdfs-datanode
> 2.3.0.0-2162'] {}
> ...
> 2015-05-29 13:14:32,835 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c
> 'ulimit -c unlimited ; /usr/hdp/2.3.0.0-2162/hadoop/sbin/hadoop-daemon.sh
> --config /usr/hdp/2.3.0.0-2162/hadoop/conf start datanode''] {'environment':
> {'HADOOP_LIBEXEC_DIR': '/usr/hdp/2.3.0.0-2162/hadoop/libexec'}, 'not_if': 'ls
> /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid >/dev/null 2>&1 && ps -p `cat
> /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid` >/dev/null 2>&1'}
> 2015-05-29 13:14:36,954 - Executing DataNode Rolling Upgrade post-restart
> 2015-05-29 13:14:36,957 - Checking that the DataNode has rejoined the cluster
> after upgrade...
> ...
> 2015-05-29 13:14:40,281 - DataNode
> jhurley-hdp22-ru-5.c.pramod-thangali.internal reports that it has rejoined
> the cluster.
> {noformat}
> As DataNodes are upgraded, we should temporarily override the default
> retry timeout values:
> {code}
> dfsadmin -D ipc.client.connect.max.retries=5 -D
> ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010
> {code}
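The reasoning behind those override values can be sketched with a bit of arithmetic. The helper below is purely illustrative (it is not part of Ambari or Hadoop) and approximates the worst-case client-side retry time as the retry count times the interval, ignoring per-attempt connect timeouts:

```python
# Rough worst-case time (in ms) a dfsadmin call can spend retrying a dead
# DataNode, approximated as retries * interval. Per-attempt connect
# timeouts are ignored, so the real figure is somewhat higher.
def worst_case_retry_ms(max_retries, retry_interval_ms):
    return max_retries * retry_interval_ms

# With the overrides proposed in this issue (5 retries, 1000 ms apart),
# the client gives up after roughly 5 seconds, comfortably inside the
# 30-second dead-node window.
print(worst_case_retry_ms(5, 1000))  # 5000
```

With stack defaults that push this product past 30 seconds, the `-getDatanodeInfo` probe can block long enough for the shutdown check to miss its window, which matches the 52-second gap seen in the log above.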
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)