[
https://issues.apache.org/jira/browse/AMBARI-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Hill resolved AMBARI-9902.
-------------------------------
Resolution: Invalid
I needed to set the operation_level differently for this to work properly.
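For anyone hitting the same symptom, here is a rough sketch of resubmitting the DECOMMISSION request with a different operation_level. The ticket does not record the exact value that worked; the HOST_COMPONENT level, the server address, the cluster name, and the credentials below are all assumptions, not something confirmed here.
{noformat}
# Hedged sketch only: resubmit the DECOMMISSION request with a different
# operation_level. The exact level that fixed this is not recorded in the
# ticket; HOST_COMPONENT and all names/credentials below are assumptions.
import json
import requests

AMBARI = "http://ambari.example.com:8080"   # assumed server
CLUSTER = "c1"                              # assumed cluster name
AUTH = ("admin", "admin")                   # assumed credentials

body = {
    "RequestInfo": {
        "command": "DECOMMISSION",
        "context": "Decommission DataNode",
        "parameters": {
            "slave_type": "DATANODE",
            "excluded_hosts": "slave-3.local,slave-1.local",
        },
        # Changed from the original request: CLUSTER -> HOST_COMPONENT
        # (assumed; extra keys such as host_name/service_name may also be
        # needed depending on the Ambari version).
        "operation_level": {
            "level": "HOST_COMPONENT",
            "cluster_name": CLUSTER,
        },
    },
    "Requests/resource_filters": [
        {"service_name": "HDFS", "component_name": "NAMENODE"}
    ],
}

resp = requests.post(
    "%s/api/v1/clusters/%s/requests" % (AMBARI, CLUSTER),
    data=json.dumps(body),
    auth=AUTH,
    headers={"X-Requested-By": "ambari"},   # required by the Ambari REST API
)
resp.raise_for_status()
print(resp.json())
{noformat}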
> Decommission DATANODE silently fails if in maintenance mode
> -----------------------------------------------------------
>
> Key: AMBARI-9902
> URL: https://issues.apache.org/jira/browse/AMBARI-9902
> Project: Ambari
> Issue Type: Bug
> Components: ambari-agent
> Affects Versions: 1.7.0
> Reporter: Greg Hill
>
> If you set maintenance mode on multiple hosts and then attempt to decommission
> the DATANODE components on those hosts, the request reports success but no
> nodes are actually decommissioned in HDFS. This can lead to data loss, as the
> customer might assume it's safe to remove those hosts from the pool.
> The request looks like:
> {noformat}
> "RequestInfo": {
> "command": "DECOMMISSION",
> "context": "Decommission DataNode”,
> "parameters": {"slave_type": “DATANODE", "excluded_hosts":
> “slave-3.local,slave-1.local"},
> "operation_level": {
> “level”: “CLUSTER”,
> “cluster_name”: cluster_name
> },
> },
> "Requests/resource_filters": [{
> "service_name": “HDFS",
> "component_name": “NAMENODE",
> }],
> {noformat}
> The task output makes it look as though the command worked:
> {noformat}
> File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
> Execute[''] {'user': 'hdfs'}
> ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
> Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}
> {noformat}
> But it didn't actually write anything to the file. If it had, this line
> would have appeared in the output:
> {noformat}
> Writing File['/etc/hadoop/conf/dfs.exclude'] because contents don't match
> {noformat}
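> As an illustration of that write-only-when-changed behaviour, here is a minimal
> sketch (not Ambari's actual implementation): the exclude file path and the
> hadoop invocation mirror the log above, everything else is assumed.
> {noformat}
> # Minimal sketch, not Ambari code: update dfs.exclude only when its
> # contents actually change, then ask the NameNode to re-read it.
> import os
> import subprocess
>
> EXCLUDE_FILE = "/etc/hadoop/conf/dfs.exclude"   # path taken from the log above
>
> def update_exclude_file(excluded_hosts):
>     new_content = "\n".join(excluded_hosts) + "\n"
>     old_content = None
>     if os.path.exists(EXCLUDE_FILE):
>         with open(EXCLUDE_FILE) as f:
>             old_content = f.read()
>     if old_content == new_content:
>         # Nothing to do -- this is the case described above: no
>         # "Writing File[...]" line, so the exclude list never changed.
>         return False
>     with open(EXCLUDE_FILE, "w") as f:
>         f.write(new_content)
>     # Assumes the hadoop client is on PATH and the caller can act as hdfs.
>     subprocess.check_call(
>         ["hadoop", "--config", "/etc/hadoop/conf", "dfsadmin", "-refreshNodes"])
>     return True
>
> update_exclude_file(["slave-3.local", "slave-1.local"])
> {noformat}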
> The command json file for the task has the right hosts list as a parameter:
> {noformat}
> "commandParams": {
> "service_package_folder": "HDP/2.0.6/services/HDFS/package",
> "update_exclude_file_only": "false",
> "script": "scripts/namenode.py",
> "hooks_folder": "HDP/2.0.6/hooks",
> "excluded_hosts": "slave-3.local,slave-1.local",
> "command_timeout": "600",
> "slave_type": "DATANODE",
> "script_type": "PYTHON"
> },
> {noformat}
> So something external to the task is filtering the list.
> If maintenance mode is not set, everything works as expected. I don't
> believe there's a legitimate reason to disallow decommissioning nodes in
> maintenance mode, as that seems to be the expected course of action (set
> maintenance, decommission, remove) for dealing with a problematic host.
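> A rough sketch of that course of action against the Ambari REST API is below;
> the server address, credentials, and the HOST_COMPONENT operation_level are
> assumptions, and the hosts' components would need to be stopped and deleted
> before the final removal (those steps are not shown).
> {noformat}
> # Rough sketch of the maintenance -> decommission -> remove sequence.
> # Names, credentials, and the operation_level value are assumptions.
> import json
> import requests
>
> AMBARI = "http://ambari.example.com:8080"
> CLUSTER = "c1"
> AUTH = ("admin", "admin")
> HEADERS = {"X-Requested-By": "ambari"}
> HOSTS = ["slave-3.local", "slave-1.local"]
>
> def put(path, body):
>     r = requests.put(AMBARI + path, data=json.dumps(body), auth=AUTH, headers=HEADERS)
>     r.raise_for_status()
>
> def post(path, body):
>     r = requests.post(AMBARI + path, data=json.dumps(body), auth=AUTH, headers=HEADERS)
>     r.raise_for_status()
>
> # 1. Put the hosts into maintenance mode.
> for host in HOSTS:
>     put("/api/v1/clusters/%s/hosts/%s" % (CLUSTER, host),
>         {"Hosts": {"maintenance_state": "ON"}})
>
> # 2. Decommission the DataNodes (same request shape as shown earlier).
> post("/api/v1/clusters/%s/requests" % CLUSTER, {
>     "RequestInfo": {
>         "command": "DECOMMISSION",
>         "context": "Decommission DataNode",
>         "parameters": {"slave_type": "DATANODE",
>                        "excluded_hosts": ",".join(HOSTS)},
>         "operation_level": {"level": "HOST_COMPONENT", "cluster_name": CLUSTER},
>     },
>     "Requests/resource_filters": [
>         {"service_name": "HDFS", "component_name": "NAMENODE"}
>     ],
> })
>
> # 3. Remove the hosts once their components have been stopped and deleted.
> for host in HOSTS:
>     requests.delete("/api/v1/clusters/%s/hosts/%s" % (CLUSTER, host),
>                     auth=AUTH, headers=HEADERS).raise_for_status()
> {noformat}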