[ 
https://issues.apache.org/jira/browse/AMBARI-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hill resolved AMBARI-9902.
-------------------------------
    Resolution: Invalid

I needed to set the operation_level differently for this to work properly.
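A sketch of what a different operation_level might look like is below; the HOST_COMPONENT level and the host_name field are assumptions (they mirror the decommission request the Ambari web UI sends) and are not confirmed in this ticket. Only the operation_level block of the request quoted below changes; cluster_name and the host names are placeholders.

{noformat}
"operation_level": {
    "level": "HOST_COMPONENT",      <- assumption: a narrower level than CLUSTER
    "cluster_name": cluster_name,
    "host_name": "slave-3.local"    <- assumption: host-level field as sent by the Ambari UI
},
{noformat}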

> Decommission DATANODE silently fails if in maintenance mode
> -----------------------------------------------------------
>
>                 Key: AMBARI-9902
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9902
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 1.7.0
>            Reporter: Greg Hill
>
> If you set maintenance mode on multiple hosts and then attempt to decommission 
> the DATANODE on those hosts, Ambari reports that the command succeeded, but it 
> did not actually decommission any nodes in HDFS.  This can lead to data loss, 
> as the customer might assume that it's safe to remove those hosts from the pool.
> The request looks like:
> {noformat}
>          "RequestInfo": {
>                 "command": "DECOMMISSION",
>                 "context": "Decommission DataNode",
>                 "parameters": {"slave_type": "DATANODE", "excluded_hosts": "slave-3.local,slave-1.local"},
>                 "operation_level": {
>                     "level": "CLUSTER",
>                     "cluster_name": cluster_name
>                 },
>             },
>             "Requests/resource_filters": [{
>                 "service_name": "HDFS",
>                 "component_name": "NAMENODE",
>             }],
> {noformat}
> The task output makes it look like the command succeeded:
> {noformat}
> File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
> Execute[''] {'user': 'hdfs'}
> ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
> Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}
> {noformat}
> But it didn't actually write any contents to the file.  If it had, this line 
> would have appeared in the output:
> {noformat}
> Writing File['/etc/hadoop/conf/dfs.exclude'] because contents don't match
> {noformat}
> The command JSON file for the task has the correct host list as a parameter:
> {noformat}
> "commandParams": {
>         "service_package_folder": "HDP/2.0.6/services/HDFS/package",
>         "update_exclude_file_only": "false",
>         "script": "scripts/namenode.py",
>         "hooks_folder": "HDP/2.0.6/hooks",
>         "excluded_hosts": "slave-3.local,slave-1.local",
>         "command_timeout": "600",
>         "slave_type": "DATANODE",
>         "script_type": "PYTHON"
>     },
> {noformat}
> So something external to the command JSON is filtering the list.
> If maintenance mode is not set, everything works as expected.  I don't 
> believe there's a legitimate reason to disallow decommissioning nodes in 
> maintenance mode, as that seems to be the expected course of action (set 
> maintenance, decommission, remove) for dealing with a problematic host.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
