[
https://issues.apache.org/jira/browse/IMPALA-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464500#comment-16464500
]
Tim Armstrong commented on IMPALA-6227:
---------------------------------------
It looks a bit like the "admitted" and "dequeued" metrics were temporarily out
of sync when we took the snapshot of the metrics:
{noformat}
MainThread: wait_for_metric_changes, current=admitted=10, queued=20,
dequeued=0, rejected=20, released=0, timed-out=0
...
MainThread: Main loop, curr_metrics: admitted=18, queued=20, dequeued=8,
rejected=20, released=10, timed-out=0
...
MainThread: wait_for_metric_changes, current=admitted=24, queued=20,
dequeued=14, rejected=20, released=18, timed-out=0
...
MainThread: wait_for_metric_changes, initial=admitted=26, queued=20,
dequeued=14, rejected=20, released=18, timed-out=0
...
MainThread: DeltaSum=4 Deltas={'dequeued': 6, 'admitted': 4, 'released': 6,
'rejected': 0, 'queued': 0, 'timed-out': 0} (Expected=5 for
metrics=['admitted', 'timed-out'])
{noformat}
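As a rough check on the numbers in that log, the sketch below recomputes "admitted" minus "dequeued" for each of the four logged snapshots. It assumes, which the log itself does not state, that every dequeued query is also counted as admitted, so the difference should stay constant while no query is admitted without queueing. The helper name is made up for illustration.

```python
# Counter values copied from the four log lines above, in log order.
snapshots = [
    {"admitted": 10, "queued": 20, "dequeued": 0, "rejected": 20, "released": 0, "timed-out": 0},
    {"admitted": 18, "queued": 20, "dequeued": 8, "rejected": 20, "released": 10, "timed-out": 0},
    {"admitted": 24, "queued": 20, "dequeued": 14, "rejected": 20, "released": 18, "timed-out": 0},
    {"admitted": 26, "queued": 20, "dequeued": 14, "rejected": 20, "released": 18, "timed-out": 0},
]


def admitted_without_queueing(snapshot):
    """Hypothetical helper: admissions not explained by a dequeue.

    Assumption: each dequeued query is also counted once in 'admitted'.
    """
    return snapshot["admitted"] - snapshot["dequeued"]


gaps = [admitted_without_queueing(s) for s in snapshots]
print(gaps)  # [10, 10, 10, 12]
```

Under that assumption the last snapshot is the odd one out: "admitted" is 2 ahead of what "dequeued" implies, which is the same size as the gap between the 'dequeued': 6 and 'admitted': 4 deltas reported afterwards.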
I looked at MetricsGroup and it looks like we iterate over the metrics map in
sorted order of key, reading one metric at a time, so we could have taken the
snapshot of the "admitted" value before the "dequeued" value if we got unlucky.
{code}
/// Contains all Metric objects, indexed by key
typedef std::map<std::string, Metric*> MetricMap;
MetricMap metric_map_;
...
for (const MetricMap::value_type& m: metric_map_) {
  Value metric_value;
  m.second->ToJson(document, &metric_value);
  metric_list.PushBack(metric_value, document->GetAllocator());
}
{code}
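To make the suspected mechanism concrete, here is a minimal, deterministic Python sketch of a snapshot that copies counters one key at a time in sorted key order while an update lands in the middle. It is not Impala code: the function names, the three-key dict, and the injected "writer" callback are all assumptions for illustration, and the sketch does not try to reproduce the exact values in the log.

```python
def admit_one_from_queue(metrics):
    """Hypothetical writer: one queued query is dequeued and admitted.

    The two counters are bumped as two separate updates.
    """
    metrics["dequeued"] += 1
    metrics["admitted"] += 1


def snapshot_in_key_order(metrics, writer=None, run_writer_after=None):
    """Copy counters one key at a time in sorted key order.

    No lock is held across the whole copy. If given, `writer` runs once,
    right after `run_writer_after` has been copied, standing in for a
    concurrent update that happens mid-snapshot.
    """
    snapshot = {}
    for key in sorted(metrics):
        snapshot[key] = metrics[key]
        if writer is not None and key == run_writer_after:
            writer(metrics)
    return snapshot


live = {"admitted": 24, "dequeued": 14, "queued": 20}
torn = snapshot_in_key_order(
    live, writer=admit_one_from_queue, run_writer_after="admitted")

print(torn)  # {'admitted': 24, 'dequeued': 15, 'queued': 20}
print(live)  # {'admitted': 25, 'dequeued': 15, 'queued': 20}
```

The snapshot pairs a stale "admitted" with a fresh "dequeued", so the two no longer agree even though the live counters do. Which counter ends up ahead depends on where the update falls relative to the reads, so a test that compares such snapshots would need to tolerate, or retry past, a transient mismatch.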
> TestAdmissionControllerStress can be flaky
> ------------------------------------------
>
> Key: IMPALA-6227
> URL: https://issues.apache.org/jira/browse/IMPALA-6227
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.11.0
> Reporter: Csaba Ringhofer
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: flaky
> Attachments: TEST-impala-custom-cluster.xml
>
>
> jenkins build https://jenkins.impala.io/job/gerrit-verify-dryrun/1503/console
> failed at the following test:
> {noformat}
> 01:30:11 ] =================================== FAILURES
> ===================================
> 01:30:11 ] TestAdmissionControllerStress.test_mem_limit[num_queries: 30 |
> submission_delay_ms: 0 | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> text/none | round_robin_submission: True]
> 01:30:11 ] custom_cluster/test_admission_controller.py:877: in test_mem_limit
> 01:30:11 ] {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> 01:30:11 ] custom_cluster/test_admission_controller.py:760: in
> run_admission_test
> 01:30:11 ] assert metric_deltas['rejected'] ==\
> 01:30:11 ] E assert 5 == ((30 - 15) - 15)
> {noformat}
> This is probably related to the following recent commit:
> https://github.com/apache/incubator-impala/commit/7487c5de04c2c5d97b8a8d5c935d10568f1ed686