-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26065/
-----------------------------------------------------------
(Updated Sept. 30, 2014, 11:57 p.m.)
Review request for Ambari, Florian Barca, Jonathan Hurley, Mahadev Konar, Sid
Wagle, and Tom Beerbower.
Changes
-------
Unit and system testing are complete.
Bugs: AMBARI-7506
https://issues.apache.org/jira/browse/AMBARI-7506
Repository: ambari
Description
-------
When a drive fails and is unmounted for service, stopping/starting the
DataNode process through Ambari re-creates the dfs.data.dir path that was
housed on that drive, but this time on the / partition, leading to
out-of-disk-space issues and data being written to the wrong volume.
In this case we want the Ambari Agent to create dfs.data.dir directories only
during installation, and not afterwards, since re-creating them makes drive
replacements difficult.
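The intended guard can be sketched as follows. This is a minimal illustration of the idea, not the actual code in dfs_datanode_helper.py; the function names and the shape of the history mapping are hypothetical:

```python
import os


def get_mount_point(path):
    """Walk up from path until a mount point is found.

    Works even if path does not exist yet, since the loop
    terminates at / at the latest.
    """
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def should_create_data_dir(data_dir, history):
    """Decide whether the agent may (re-)create data_dir.

    history maps data_dir -> the mount point recorded at install time
    (as persisted in the mount history file). Create the directory only
    on first install, or when its recorded mount is still in place; if
    the drive was unmounted, the current mount resolves to a different
    point (typically /), and we skip creation rather than silently
    re-create the directory on the root partition.
    """
    if data_dir not in history:
        return True  # first install: create and record the mount point
    return history[data_dir] == get_mount_point(data_dir)
```

With a recorded mount of /grid/0 and the drive unmounted, get_mount_point falls back to /, the comparison fails, and the directory is left alone for the operator to handle.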
Diffs (updated)
-----
ambari-agent/src/test/python/resource_management/TestFileSystem.py
PRE-CREATION
ambari-common/src/main/python/resource_management/core/logger.py e395bd7
ambari-common/src/main/python/resource_management/core/providers/mount.py
dc6d7d9
ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py
PRE-CREATION
ambari-common/src/main/python/resource_management/libraries/functions/file_system.py
PRE-CREATION
ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/configuration/hadoop-env.xml
5da6484
ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/package/scripts/hdfs_datanode.py
2482f97
ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/package/scripts/params.py
245ad92
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/configuration/hadoop-env.xml
b3935d7
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_datanode.py
e38d9af
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/scripts/params.py
27cef20
ambari-server/src/test/python/stacks/1.3.2/configs/default.json c80723c
ambari-server/src/test/python/stacks/1.3.2/configs/secured.json 99e88b8
ambari-server/src/test/python/stacks/2.0.6/configs/default.json 4e00086
ambari-server/src/test/python/stacks/2.0.6/configs/secured.json d03be7a
ambari-web/app/data/HDP2/site_properties.js 9886d56
ambari-web/app/data/site_properties.js 0e6aa8e
Diff: https://reviews.apache.org/r/26065/diff/
Testing
-------
Created unit tests and simple end-to-end test on a sandbox VM.
Ran end-to-end tests on Google Compute Engine with VMs that had an external
drive mounted.
1. Created a cluster with 2 VMs, and copied the changed Python files.
2. To avoid copying the changed web files, saved the new property directly by
running
/var/lib/ambari-server/resources/scripts/configs.sh set localhost dev
hadoop-env dfs.datanode.data.dir.mount.file
"/etc/hadoop/conf/dfs_data_dir_mount.hist"
and verified that the property appears in the API, e.g.,
http://162.216.150.229:8080/api/v1/clusters/dev/configurations?type=hadoop-env&tag=version1412115461978734672
3. Restarted HDFS on all agents
4. cat /etc/hadoop/conf/dfs_data_dir_mount.hist
correctly showed the HDFS data dir and its mount point:
# data_dir,mount_point
/grid/0/hadoop/hdfs/data,/grid/0
5. Then changed the HDFS data dir property from /grid/0/hadoop/hdfs/data to
/grid/1/hadoop/hdfs/data; the history file correctly showed the new
directory as mounted on root, and the /grid/1/hadoop/hdfs/data directory was
created.
6. Next, unmounted the drive by first stopping HDFS and ZooKeeper, then
running:
cd /root
fuser -c /grid/0
lsof /grid/0
umount /grid/0
7. Restarted the HDFS services, which resulted in an error as expected:
Fail: Execution of 'ulimit -c unlimited; su - hdfs -c 'export
HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec &&
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start
datanode'' returned 1. starting datanode, logging to
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-alejandro-1.out
8. Next, incremented the "DataNode volumes failure toleration" property from 0
to 1 and restarted all of the DataNodes; this time there was no error.
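The mount history file format exercised in step 4 (comment lines starting with #, then one data_dir,mount_point pair per line) is simple enough to sketch a parser for. This is an illustrative sketch only; parse_mount_history is a hypothetical name, not the function in the patch:

```python
def parse_mount_history(text):
    """Parse dfs_data_dir_mount.hist content into a dict.

    Lines look like '/grid/0/hadoop/hdfs/data,/grid/0'; blank lines
    and '#' comment lines (such as the '# data_dir,mount_point'
    header) are skipped.
    """
    history = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        data_dir, mount_point = line.split(",", 1)
        history[data_dir.strip()] = mount_point.strip()
    return history


sample = """# data_dir,mount_point
/grid/0/hadoop/hdfs/data,/grid/0
"""
print(parse_mount_history(sample))
```

A mapping like this is all the agent needs to compare a data dir's recorded mount point against its current one on restart.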
Thanks,
Alejandro Fernandez