[ 
https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description: 
During blueprint deploy we don't rely on the topology cache since AMBARI-23660.
The correct topology is sent with the command; however, the topology from the
topology event can be wrong, as per AMBARI-23660.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.
Currently it just fails the whole command.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - 
Caught an exception while executing custom service command: <type 
'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", 
line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", 
line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, 
service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 
43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in 
newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 
112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in 
component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since AMBARI-23660.
The correct topology is sent with the command; however, the topology from the
topology event can be wrong, as per AMBARI-23660.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - 
Caught an exception while executing custom service command: <type 
'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", 
line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", 
line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, 
service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 
43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in 
newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 
112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in 
component_dict.hostIds]
KeyError: 10{code}




> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we don't rely on the topology cache since
> AMBARI-23660. The correct topology is sent with the command; however, the
> topology from the topology event can be wrong, as per AMBARI-23660.
> The problem occurs when the agent still tries to process the broken topology
> from the event. The agent needs to handle this failure with a warning.
> Currently it just fails the whole command.
> {code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - 
> Caught an exception while executing custom service command: <type 
> 'exceptions.KeyError'>: 10; 10
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", 
> line 324, in runCommand
>     command = self.generate_command(command_header)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", 
> line 507, in generate_command
>     command_dict = self.configuration_builder.get_configuration(cluster_id, 
> service_name, component_name, required_config_timestamp)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 
> 43, in get_configuration
>     'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
>   File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in 
> newFunction
>     return f(*args, **kw)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 
> 112, in get_cluster_host_info
>     hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id 
> in component_dict.hostIds]
> KeyError: 10{code}
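
For illustration only, the requested behaviour (warn on host ids that are missing from the cached topology instead of raising KeyError) could look roughly like the sketch below. This is not the actual patch: resolve_hostnames is a hypothetical helper, and it assumes hosts are stored as plain dicts keyed by host id, whereas the real ClusterTopologyCache uses objects with a hostName attribute.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_hostnames(hosts_by_id, host_ids):
    """Return hostnames for the host ids present in hosts_by_id.

    Hypothetical helper: ids absent from the mapping are skipped and
    reported in a single warning instead of raising KeyError.
    """
    hostnames = []
    missing = []
    for host_id in host_ids:
        host = hosts_by_id.get(host_id)  # None when the id is unknown
        if host is None:
            missing.append(host_id)
        else:
            hostnames.append(host["hostName"])
    if missing:
        logger.warning(
            "Topology cache has no entry for host ids %s; skipping them",
            missing,
        )
    return hostnames
```

With a lookup like this, a stale topology event would produce a shorter clusterHostInfo list and a warning in the agent log, while the command itself would still run using the topology sent with it.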



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
