> On Sept. 22, 2015, 10:31 p.m., Sumit Mohanty wrote:
> > ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/alerts/alert_datanode_unmounted_data_dir.py, line 77
> > <https://reviews.apache.org/r/38651/diff/1/?file=1081590#file1081590line77>
> >
> >     Will it result in an alert after Ambari upgrade? Not sure if requiring
> >     DN restart to get rid of an alert is a good idea?
No upgrade is needed to pick up added alert definitions in Ambari 2.1; ambari-server actually loads them from the JSON file on start. The alert checks whether the history file exists, whether the data dirs exist, and whether it's possible for the data dirs to have become unmounted. One way to fix a missing history file or missing data dir is to restart the DataNode, but that's not necessarily required.


- Alejandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38651/#review100084
-----------------------------------------------------------


On Sept. 22, 2015, 10:17 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38651/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2015, 10:17 p.m.)
> 
> 
> Review request for Ambari, Andrew Onischuk, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, Sumit Mohanty, Srimanth Gunturi, and Sid Wagle.
> 
> 
> Bugs: AMBARI-13194
>     https://issues.apache.org/jira/browse/AMBARI-13194
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Ambari uses the dfs.datanode.data.dir.mount.file property in HDFS, whose value is typically /etc/hadoop/conf/dfs_data_dir_mount.hist, to track the mount point of each data dir.
> 
> E.g.,
> {code}
> /hadoop01/data,/device1
> /hadoop02/data,/device2
> /hadoop03/data,/    # this one is on root, the others are all on mount points.
> {code}
> 
> Whenever a drive becomes unmounted, Ambari detects that it was previously on a mount and will not create that data dir; HDFS can still tolerate the failure if dfs.datanode.failed.volumes.tolerated is greater than 0.
> Now, if the /etc/hadoop/conf/dfs_data_dir_mount.hist file is deleted, then Ambari won't have this knowledge and will create the data dir (even if it's on the root partition).
> 
> To improve tracking, create an alert definition that checks the following:
> * WARNING status if the /etc/hadoop/conf/dfs_data_dir_mount.hist file is deleted
> * CRITICAL status if at least one of the data dirs is mounted on the root partition, and at least one data dir is on a mount
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/core/providers/system.py 213adc5 
>   ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py a05e162 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json 477fd95 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/alerts/alert_datanode_unmounted_data_dir.py PRE-CREATION 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_alert_datanode_unmounted_data_dir.py PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/38651/diff/
> 
> 
> Testing
> -------
> 
> * Python unit tests passed
> * Verified that the alert worked on several hosts for all 3 types of statuses (WARNING, CRITICAL, OK)
> * Also checked that it did not run on a host without DataNode, and it did run once I added DataNode to that host
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>
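For readers without the diff handy, the check described above boils down to something like the sketch below. This is not the code from alert_datanode_unmounted_data_dir.py: the hard-coded paths, the get_mount_point/check_data_dirs helpers, and the plain (state, messages) return value are illustrative assumptions, and the real alert resolves its paths from the DataNode configuration instead of hard-coding them.

{code}
import os

# Illustrative values only; the real alert resolves these from configuration
# (dfs.datanode.data.dir and dfs.datanode.data.dir.mount.file).
DATA_DIR_MOUNT_HISTORY_FILE = "/etc/hadoop/conf/dfs_data_dir_mount.hist"
DATA_DIRS = ["/hadoop01/data", "/hadoop02/data", "/hadoop03/data"]

RESULT_STATE_OK = "OK"
RESULT_STATE_WARNING = "WARNING"
RESULT_STATE_CRITICAL = "CRITICAL"


def get_mount_point(path):
  """Walk up from 'path' until a mount point is hit; '/' means the root partition."""
  path = os.path.abspath(path)
  while not os.path.ismount(path):
    path = os.path.dirname(path)
  return path


def check_data_dirs(data_dirs, history_file):
  # WARNING: the mount history file is gone, so there is no record of which
  # data dirs used to live on dedicated mounts.
  if not os.path.exists(history_file):
    return RESULT_STATE_WARNING, ["Mount history file %s is missing." % history_file]

  mounts = dict((d, get_mount_point(d)) for d in data_dirs if os.path.isdir(d))
  on_root = sorted(d for d, m in mounts.items() if m == "/")
  on_mount = sorted(d for d, m in mounts.items() if m != "/")

  # CRITICAL: a mix of root-partition and mounted data dirs usually means a
  # drive became unmounted and its dir was re-created on the root partition.
  if on_root and on_mount:
    return RESULT_STATE_CRITICAL, ["Data dir(s) on the root partition: %s" % ", ".join(on_root)]

  return RESULT_STATE_OK, ["Data dir mounts look consistent."]


if __name__ == "__main__":
  state, messages = check_data_dirs(DATA_DIRS, DATA_DIR_MOUNT_HISTORY_FILE)
  print("%s: %s" % (state, messages[0]))
{code}

The actual script presumably wires this decision into Ambari's alert script entry point and also reads the data_dir,mount_point pairs from the history file itself, so it can distinguish a dir that was never on a dedicated mount from one that lost its mount.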
