Jonathan Hurley created AMBARI-13974:
----------------------------------------
Summary: Retrieving Failed Service Checks Takes Too Long On Large Clusters
Key: AMBARI-13974
URL: https://issues.apache.org/jira/browse/AMBARI-13974
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.0.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Priority: Critical
Fix For: 2.1.3
*STR:*
* Launch a Rolling Upgrade on a large cluster (500+ nodes)
* Proceed to the Finalize step
*Actual Result:*
Call:
{code}
/api/v1/clusters/c500/upgrades/69/upgrade_groups?upgrade_items/UpgradeItem/status=COMPLETED&upgrade_items/tasks/Tasks/status.in(FAILED,ABORTED,TIMEDOUT)&upgrade_items/tasks/Tasks/command=SERVICE_CHECK&fields=upgrade_items/tasks/Tasks/command_detail,upgrade_items/tasks/Tasks/status&minimal_response=true
{code}
This call fails with a timeout, so no failed Service Checks are shown to the user.
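To make the long query string above easier to read, here is a minimal sketch that splits it into its filtering predicates and its remaining parameters. The helper name `split_query` is hypothetical, and treating `fields` and `minimal_response` as response-shaping parameters rather than filters is an assumption of this sketch.

```python
# Query string copied from the call above.
QUERY = (
    "upgrade_items/UpgradeItem/status=COMPLETED"
    "&upgrade_items/tasks/Tasks/status.in(FAILED,ABORTED,TIMEDOUT)"
    "&upgrade_items/tasks/Tasks/command=SERVICE_CHECK"
    "&fields=upgrade_items/tasks/Tasks/command_detail,"
    "upgrade_items/tasks/Tasks/status"
    "&minimal_response=true"
)


def split_query(query):
    """Separate filtering predicates from response-shaping parameters."""
    # Assumption: these two keys only shape the response, they do not filter.
    shaping = {"fields", "minimal_response"}
    predicates, options = [], {}
    for part in query.split("&"):
        key, _, value = part.partition("=")
        if key in shaping:
            options[key] = value
        else:
            predicates.append(part)
    return predicates, options


predicates, options = split_query(QUERY)
for predicate in predicates:
    print(predicate)
```

All three predicates are plain comparisons: one on the upgrade item status and two on task fields.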
The root cause is how the REST API handles subqueries. For every group that
matches, it retrieves every stage and every task, then filters them with
in-memory comparisons to produce the slice of results.
Because the predicates are simple comparisons on DB fields, this query should
go through the JPA layer instead.
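The difference between the two approaches can be illustrated outside Ambari. The sketch below is not Ambari code and does not use JPA. It uses Python's `sqlite3` module as a stand-in, with a hypothetical, simplified `task` table and made-up data, to contrast loading every row and comparing in memory with letting the database evaluate the same comparisons.

```python
import sqlite3

FAILED_STATES = ("FAILED", "ABORTED", "TIMEDOUT")


def build_db():
    """Create an in-memory database with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, group_id INTEGER, "
        "command TEXT, status TEXT, command_detail TEXT)"
    )
    rows = []
    for i in range(10000):
        # Made-up data: a few service checks, a few of which failed.
        command = "SERVICE_CHECK" if i % 100 == 0 else "EXECUTE"
        status = "FAILED" if i % 500 == 0 else "COMPLETED"
        rows.append((i % 10, command, status, "task %d" % i))
    conn.executemany(
        "INSERT INTO task (group_id, command, status, command_detail) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def failed_checks_in_memory(conn):
    """Load every task, then compare in application code (the slow pattern)."""
    all_rows = conn.execute(
        "SELECT command, status, command_detail FROM task"
    ).fetchall()
    return [
        (detail, status)
        for command, status, detail in all_rows
        if command == "SERVICE_CHECK" and status in FAILED_STATES
    ]


def failed_checks_in_database(conn):
    """Push the same comparisons into the query so only matches are returned."""
    placeholders = ",".join("?" for _ in FAILED_STATES)
    sql = (
        "SELECT command_detail, status FROM task "
        "WHERE command = ? AND status IN (%s)" % placeholders
    )
    return conn.execute(sql, ("SERVICE_CHECK",) + FAILED_STATES).fetchall()


conn = build_db()
slow = failed_checks_in_memory(conn)
fast = failed_checks_in_database(conn)
print(len(slow), len(fast))
```

Both functions return the same rows, but the second transfers only the matching rows out of the database, which is the behavior this issue asks for from the JPA layer.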