[jira] [Created] (AMBARI-13974) Retrieving Failed Service Checks Takes Too Long On Large Clusters

Jonathan Hurley (JIRA) Thu, 19 Nov 2015 09:52:59 -0800

Jonathan Hurley created AMBARI-13974:
----------------------------------------


             Summary: Retrieving Failed Service Checks Takes Too Long On Large Clusters
                 Key: AMBARI-13974
                 URL: https://issues.apache.org/jira/browse/AMBARI-13974
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Critical
             Fix For: 2.1.3


*STR:*
* Launch a Rolling Upgrade on a large cluster (500+ nodes)
* Proceed to the Finalize step

*Actual Result:*
The following call is issued:

{code}
/api/v1/clusters/c500/upgrades/69/upgrade_groups?upgrade_items/UpgradeItem/status=COMPLETED&upgrade_items/tasks/Tasks/status.in(FAILED,ABORTED,TIMEDOUT)&upgrade_items/tasks/Tasks/command=SERVICE_CHECK&fields=upgrade_items/tasks/Tasks/command_detail,upgrade_items/tasks/Tasks/status&minimal_response=true
{code}

This call times out, so no failed service checks are shown to the user.

The root cause is how the REST API handles subqueries. For every upgrade group 
that matches, it retrieves every stage and every task, and only then narrows 
the result set through in-memory comparison.
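To illustrate the cost, here is a minimal Java sketch of the pattern described above. The `Task` record and `InMemoryFilterSketch` class are hypothetical simplifications, not Ambari's actual classes. Every task for every matching group is materialized before the command and status predicates are applied, so the work grows with the total number of tasks in the upgrade rather than with the number of failed service checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model of a task row; NOT Ambari's real entity.
record Task(long stageId, String command, String status) {}

class InMemoryFilterSketch {
    // Statuses the REST call asks for: status.in(FAILED,ABORTED,TIMEDOUT)
    static final Set<String> FAILED_STATES = Set.of("FAILED", "ABORTED", "TIMEDOUT");

    // Mimics the behaviour described in the issue: all tasks of all matching
    // groups are loaded first, and only afterwards filtered in memory.
    static List<Task> failedServiceChecks(List<List<Task>> tasksPerGroup) {
        List<Task> all = new ArrayList<>();
        for (List<Task> groupTasks : tasksPerGroup) {
            // Every task is materialized, however few of them will match.
            all.addAll(groupTasks);
        }
        return all.stream()
                .filter(t -> "SERVICE_CHECK".equals(t.command()))
                .filter(t -> FAILED_STATES.contains(t.status()))
                .collect(Collectors.toList());
    }
}
```

On a 500+ node cluster the `all` list holds every task of the upgrade, which is consistent with the timeout reported above.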

This filtering should be pushed down to the JPA layer instead, since it consists 
only of simple comparisons on database fields.
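A sketch of the pushed-down alternative follows, assuming a JPA entity named `HostRoleCommandEntity` with `requestId`, `roleCommand` and `status` fields; these names are chosen for illustration and have not been verified against Ambari's schema. The predicates become part of a parameterized JPQL query, so the database returns only the matching rows instead of every task.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PushdownQuerySketch {
    // Hypothetical entity and field names. The IN clause uses a
    // collection-valued parameter, which JPA 2.0+ supports.
    static final String JPQL =
        "SELECT t FROM HostRoleCommandEntity t "
      + "WHERE t.requestId = :requestId "
      + "AND t.roleCommand = :command "
      + "AND t.status IN :statuses";

    // Parameter values for the query above. A real entity would most likely
    // map command and status to enums; plain strings keep the sketch standalone.
    static Map<String, Object> parameters(long requestId) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("requestId", requestId);
        p.put("command", "SERVICE_CHECK");
        p.put("statuses", List.of("FAILED", "ABORTED", "TIMEDOUT"));
        return p;
    }
}
```

With a real `EntityManager`, the query would be run roughly as `em.createQuery(JPQL, HostRoleCommandEntity.class)` followed by one `setParameter(name, value)` call per map entry, letting the database apply the filters on indexed columns.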



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
