[ https://issues.apache.org/jira/browse/AMBARI-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-10456:
-------------------------------------
    Description: 
When mapping hosts concurrently with reading information from a cluster, there
was a deadlock between building the cluster health report and mapping the
new hosts.

A few changes to note here:

- ClustersImpl uses concurrent maps; there's really no need to keep the 
internal lock. I removed it in several places where the cluster is guaranteed 
to be available (such as when using the ID to retrieve the cluster). The 
concurrent maps guard against concurrent modifications.

- The Ambari Event Publisher was actually synchronous. This not only caused
bottlenecks, but also contributed to a secondary deadlock detected while fixing
the original issue. It was changed into a single-threaded asynchronous bus.
Consumers of this bus should never rely on it having completed its actions
before performing their own logic, so changing the behavior seemed correct.
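To illustrate the concurrent-map change, here is a minimal sketch of lock-free cluster lookups. It is not the actual {{ClustersImpl}} code: the class {{ClusterRegistry}}, its nested {{Cluster}} type and its method names are hypothetical. The point it shows is that a read by ID goes straight to a {{ConcurrentHashMap}} and takes no lock, so it cannot take part in a lock-ordering deadlock with a thread that is mapping hosts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified registry showing lookups backed by concurrent maps.
// This is NOT Ambari's ClustersImpl; all names here are illustrative.
public class ClusterRegistry {

    // Minimal stand-in for a cluster object (hypothetical).
    public static final class Cluster {
        private final long id;
        private final String name;

        public Cluster(long id, String name) {
            this.id = id;
            this.name = name;
        }

        public long getId() { return id; }
        public String getName() { return name; }
    }

    // Both indexes are concurrent maps, so individual reads and writes are
    // thread-safe without an external lock.
    private final Map<Long, Cluster> clustersById = new ConcurrentHashMap<>();
    private final Map<String, Cluster> clustersByName = new ConcurrentHashMap<>();

    // putIfAbsent on the ID map rejects duplicate IDs without a lock.
    // The two maps are updated in two separate steps, not atomically.
    public void addCluster(Cluster cluster) {
        Cluster existing = clustersById.putIfAbsent(cluster.getId(), cluster);
        if (existing != null) {
            throw new IllegalStateException("Cluster already exists: " + cluster.getId());
        }
        clustersByName.put(cluster.getName(), cluster);
    }

    // Lookup by ID takes no lock: a ConcurrentHashMap read does not block,
    // so this path cannot join a lock-ordering cycle.
    public Cluster getClusterById(long id) {
        Cluster cluster = clustersById.get(id);
        if (cluster == null) {
            throw new IllegalArgumentException("Cluster not found: " + id);
        }
        return cluster;
    }

    // Lookup by name, also lock-free.
    public Cluster getCluster(String name) {
        Cluster cluster = clustersByName.get(name);
        if (cluster == null) {
            throw new IllegalArgumentException("Cluster not found: " + name);
        }
        return cluster;
    }

    public static void main(String[] args) {
        ClusterRegistry registry = new ClusterRegistry();
        registry.addCluster(new Cluster(1L, "c1"));
        System.out.println(registry.getClusterById(1L).getName());
        System.out.println(registry.getCluster("c1").getId());
    }
}
```

Because the two indexes are updated in separate steps, a reader may briefly find a new cluster by ID before it is visible by name; code that needs both views to change together would still need some coordination.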
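The event bus change can be sketched as a publisher that hands every event to a single worker thread, so that publish() returns without running any subscriber code on the caller's thread. This is a hypothetical, JDK-only sketch: {{AsyncEventPublisher}}, {{register()}} and {{publish()}} are illustrative names, not Ambari's {{AmbariEventPublisher}} API. Guava's {{AsyncEventBus}}, which takes an {{Executor}}, gives the same shape off the shelf; whether this fix uses it is not stated in the issue.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of a single-threaded asynchronous publisher.
// Not Ambari's AmbariEventPublisher; names and API are illustrative.
public class AsyncEventPublisher implements AutoCloseable {

    // Copy-on-write list: subscribers may register while events are delivered.
    private final List<Consumer<Object>> subscribers = new CopyOnWriteArrayList<>();

    // A single daemon worker thread performs all deliveries.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-publisher");
        t.setDaemon(true);
        return t;
    });

    public void register(Consumer<Object> subscriber) {
        subscribers.add(subscriber);
    }

    // Returns immediately after queueing the event. Subscribers run later on
    // the worker thread, so a caller holding its own locks never executes
    // subscriber code (and whatever locks that code takes) inline.
    public void publish(Object event) {
        dispatcher.execute(() -> {
            for (Consumer<Object> subscriber : subscribers) {
                try {
                    subscriber.accept(event);
                } catch (RuntimeException e) {
                    // Isolate failures so one bad subscriber does not stop delivery.
                    System.err.println("Subscriber failed: " + e);
                }
            }
        });
    }

    // Stops accepting new events and waits up to 5 seconds for queued ones.
    @Override
    public void close() throws InterruptedException {
        dispatcher.shutdown();
        dispatcher.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        List<Object> received = new CopyOnWriteArrayList<>();
        try (AsyncEventPublisher publisher = new AsyncEventPublisher()) {
            publisher.register(received::add);
            publisher.publish("host-mapped");
            publisher.publish("cluster-health-updated");
        }
        System.out.println(received);
    }
}
```

With one worker thread, this sketch still delivers events one at a time in submission order; what changes is that the publishing thread no longer blocks inside subscriber code. Subscribers still must not assume delivery has finished when publish() returns.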

  was:When mapping hosts concurrently while getting clusters, there's a 
deadlock that can occur between {{ClustersImpl}} and {{ClusterImpl}}.


> Ambari Server Deadlock When Mapping Hosts
> -----------------------------------------
>
>                 Key: AMBARI-10456
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10456
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.1.0
>
>         Attachments: dump.txt
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
