-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34947/#review86254
-----------------------------------------------------------
Ship it!

Ship It!

- Alejandro Fernandez


On June 2, 2015, 3:36 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34947/
> -----------------------------------------------------------
> 
> (Updated June 2, 2015, 3:36 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez and Nate Cole.
> 
> 
> Bugs: AMBARI-11624
>     https://issues.apache.org/jira/browse/AMBARI-11624
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> See HDFS-8510.
> 
> During an upgrade from HDP 2.2 to HDP 2.3, even with 4 DataNodes in the
> cluster, HBase still goes down during the core slaves portion because the
> DataNode upgrade takes too long. The HDP stack's default value for
> {{ipc.client.connect.retry.interval}} is greater than the 30-second period
> in which the DataNode would be marked as dead.
> 
> Notice that after the shutdown command, it takes 52 seconds for {{dfsadmin}}
> to report that the DataNode is down:
> 
> {noformat}
> 2015-05-29 13:13:27,222 - hadoop-hdfs-datanode is currently at version 2.2.7.0-2808
> 2015-05-29 13:13:27,306 - Execute['hdfs dfsadmin -shutdownDatanode 0.0.0.0:8010 upgrade'] {'tries': 1, 'user': 'hdfs'}
> 2015-05-29 13:13:29,003 - Execute['hdfs dfsadmin -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2015-05-29 13:13:30,648 - DataNode has not shutdown.
> 2015-05-29 13:13:40,655 - Execute['hdfs dfsadmin -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2015-05-29 13:14:32,280 - DataNode has successfully shutdown for upgrade.
> 2015-05-29 13:14:32,327 - Execute['hdp-select set hadoop-hdfs-datanode 2.3.0.0-2162'] {}
> ...
> 2015-05-29 13:14:32,835 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/2.3.0.0-2162/hadoop/sbin/hadoop-daemon.sh --config /usr/hdp/2.3.0.0-2162/hadoop/conf start datanode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/2.3.0.0-2162/hadoop/libexec'}, 'not_if': 'ls /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid >/dev/null 2>&1 && ps -p `cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid` >/dev/null 2>&1'}
> 2015-05-29 13:14:36,954 - Executing DataNode Rolling Upgrade post-restart
> 2015-05-29 13:14:36,957 - Checking that the DataNode has rejoined the cluster after upgrade...
> ...
> 2015-05-29 13:14:40,281 - DataNode jhurley-hdp22-ru-5.c.pramod-thangali.internal reports that it has rejoined the cluster.
> {noformat}
> 
> As DataNodes are upgraded, we should temporarily override the default
> retry timeout values:
> 
> {code}
> dfsadmin -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010
> {code}
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py 529ca4438
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py a310bf4
> 
> Diff: https://reviews.apache.org/r/34947/diff/
> 
> 
> Testing
> -------
> 
> ----------------------------------------------------------------------
> Total run: 750
> Total errors: 0
> Total failures: 0
> OK
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
> 
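For readers following along: the fix quoted above passes temporary IPC retry overrides to {{dfsadmin}} so the polling side detects a shut-down DataNode in seconds instead of minutes. A minimal, hypothetical Python sketch of how the upgrade script might build that command (the function name and defaults are illustrative, not the actual datanode_upgrade.py code):

```python
def build_dfsadmin_check(address="0.0.0.0:8010",
                         max_retries=5, retry_interval_ms=1000):
    """Build the 'hdfs dfsadmin -getDatanodeInfo' command with temporary
    IPC retry overrides, so a stopped DataNode is reported quickly instead
    of waiting out the stack's long default retry interval."""
    return [
        "hdfs", "dfsadmin",
        "-D", "ipc.client.connect.max.retries=%d" % max_retries,
        "-D", "ipc.client.connect.retry.interval=%d" % retry_interval_ms,
        "-getDatanodeInfo", address,
    ]

if __name__ == "__main__":
    # With the defaults above this prints the command from the review:
    # hdfs dfsadmin -D ipc.client.connect.max.retries=5
    #   -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010
    print(" ".join(build_dfsadmin_check()))
```

The overrides are passed per-invocation via `-D` rather than edited into core-site.xml, so the shortened retry behavior applies only to this upgrade check and normal client retry settings are untouched.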
