Jonathan Hurley created AMBARI-12657:
----------------------------------------
Summary: Cluster creates fail on larger deployments with SQL Azure DB
Key: AMBARI-12657
URL: https://issues.apache.org/jira/browse/AMBARI-12657
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.0.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Priority: Critical
Fix For: 2.1.1
We started doing larger cluster creates (48 worker nodes) with SQL Azure DB as
the Ambari DB, and we are seeing HTTP GET requests time out on the client
side (even after retries), resulting in cluster create failures (15%). This is
a tracking Jira to resolve the CRUD failures.
In some of my experiments with 48-node clusters, DB CPU usage goes above 50%.
This might explain why SQL is running slowly.
end_time                 avg_cpu_percent  avg_data_io_percent  avg_log_write_percent  avg_memory_usage_percent
2015-08-05 18:51:24.153  40.89            0.00                 0.62                   0.67
2015-08-05 18:51:09.107  41.86            0.00                 1.49                   0.67
2015-08-05 18:50:54.090  24.36            0.00                 0.08                   0.67
2015-08-05 18:50:38.763  43.16            0.00                 0.57                   0.67
2015-08-05 18:50:23.700  65.03            0.00                 0.51                   0.67
2015-08-05 18:50:07.840  28.57            0.00                 0.45                   0.67
2015-08-05 18:49:49.480  39.78            0.00                 0.42                   0.67
2015-08-05 18:49:34.383  28.14            0.00                 0.43                   0.67
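As a quick sanity check on the samples above, the avg_cpu_percent column can be summarized with a few lines of Python. This is only a sketch: the values are copied from the table, the variable names are mine, and eight samples spanning roughly two minutes are not necessarily representative of a whole cluster create.

```python
from statistics import mean

# avg_cpu_percent samples copied from the table above, newest first.
cpu_samples = [40.89, 41.86, 24.36, 43.16, 65.03, 28.57, 39.78, 28.14]

# Threshold taken from the "above 50%" observation in the description.
THRESHOLD = 50.0

peak_cpu = max(cpu_samples)
mean_cpu = round(mean(cpu_samples), 2)
samples_over_threshold = [s for s in cpu_samples if s > THRESHOLD]

print(f"peak={peak_cpu} mean={mean_cpu} over_{THRESHOLD:.0f}%={len(samples_over_threshold)}")
```

In this window only one sample (65.03) is above 50%, and the mean is about 39%, so the CPU pressure looks bursty rather than sustained.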
The most expensive query in terms of CPU time is below. This one query
consumes most of the CPU. The query plan is also attached.
{code}
SELECT DISTINCT t0.request_id
FROM host_role_command t0
WHERE NOT EXISTS (
    SELECT @P0
    FROM host_role_command t1
    WHERE (t1.status IN (@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9))
)
ORDER BY t0.request_id ASC
{code}
{code}
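One thing worth checking in the statement as captured: the subquery never references t0, so the NOT EXISTS is uncorrelated. If that is really what reaches the database (the captured text may be abbreviated), the result is all-or-nothing: every request_id when no row in the table has one of the listed statuses, and none otherwise. The sketch below reproduces that behavior on an in-memory SQLite table. The status values, the task_id column, and the function name are assumptions for illustration, not taken from the Ambari schema, and SQLite is used only as a stand-in for SQL Azure.

```python
import sqlite3

# Hypothetical status values; the issue shows nine bound parameters
# (@P1..@P9) but does not say what they are.
IN_PROGRESS = ["PENDING", "QUEUED", "IN_PROGRESS"]

# Same shape as the captured query, with "?" placeholders instead of @Pn
# and a literal 1 in place of the bound @P0.
QUERY = """
SELECT DISTINCT t0.request_id
FROM host_role_command t0
WHERE NOT EXISTS (
    SELECT 1
    FROM host_role_command t1
    WHERE t1.status IN ({placeholders})
)
ORDER BY t0.request_id ASC
""".format(placeholders=",".join("?" * len(IN_PROGRESS)))


def completed_requests(rows):
    """Run the captured query shape over (request_id, status) rows."""
    conn = sqlite3.connect(":memory:")
    # Minimal stand-in table; task_id is an assumed surrogate key.
    conn.execute(
        "CREATE TABLE host_role_command ("
        "task_id INTEGER PRIMARY KEY, request_id INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO host_role_command (request_id, status) VALUES (?, ?)",
        rows,
    )
    result = [row[0] for row in conn.execute(QUERY, IN_PROGRESS)]
    conn.close()
    return result


# No in-progress rows anywhere: every request_id is returned.
all_done = completed_requests([(1, "COMPLETED"), (2, "COMPLETED")])
# A single in-progress row for request 2 also hides request 1.
one_running = completed_requests([(1, "COMPLETED"), (2, "QUEUED")])
```

With these toy rows, all_done is [1, 2] and one_running is an empty list, even though request 1 has no in-progress tasks. Whether the production query behaves this way depends on the full generated SQL and the attached query plan.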