[
https://issues.apache.org/jira/browse/MESOS-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762303#comment-15762303
]
Neil Conway commented on MESOS-6231:
------------------------------------
The same issue happens with the agent.
> Scheduler driver metrics can hang Metrics() in tests
> ----------------------------------------------------
>
> Key: MESOS-6231
> URL: https://issues.apache.org/jira/browse/MESOS-6231
> Project: Mesos
> Issue Type: Bug
> Components: tests
> Reporter: Neil Conway
> Labels: mesosphere
>
> * {{SchedulerProcess}} has a field, {{metrics}}, whose constructor registers
> two metrics, {{event_queue_messages}} and {{event_queue_dispatches}}.
> * These metrics are implemented by {{defer}}'ing a message to
> {{SchedulerProcess}}.
> * If {{MesosSchedulerDriver}} is started and then stopped (but not
> destructed), {{SchedulerProcess}} is terminated but not destroyed.
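> Hence, if a scheduler driver is started and then stopped, its metrics remain
> registered, but the deferred call that computes each one is sent to a
> terminated process and never runs, so fetching these metrics hangs. This means
> a test case that calls {{Metrics()}} after stopping a scheduler driver will hang.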
> For example, the following patch will hang
> {{SlaveTest.MetricsSlaveLaunchErrors}}.
> {noformat}
> diff --git a/src/tests/slave_tests.cpp b/src/tests/slave_tests.cpp
> index 3471314..f323bb9 100644
> --- a/src/tests/slave_tests.cpp
> +++ b/src/tests/slave_tests.cpp
> @@ -1408,12 +1408,12 @@ TEST_F(SlaveTest, MetricsSlaveLaunchErrors)
>    AWAIT_READY(failureUpdate);
>    ASSERT_EQ(TASK_FAILED, failureUpdate.get().state());
> +  driver.stop();
> +  driver.join();
> +
>    // After failure injection, metrics should report a single failure.
>    snapshot = Metrics();
>    EXPECT_EQ(1, snapshot.values["slave/container_launch_errors"]);
> -
> -  driver.stop();
> -  driver.join();
>  }
> {noformat}
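
The mechanism described in the issue can be illustrated without Mesos by a
minimal standard-library sketch. It does not use libprocess: {{ToyProcess}},
{{gauge}} and {{gaugeReady}} are hypothetical names made up for this
illustration, and the toy event loop only approximates a process that is
terminated but not destroyed. The assumption is that a metric is read by
enqueueing a callback on the owning process and waiting on a future for the
result; once the event loop has stopped, the callback is still accepted but
never run, so the future never becomes ready.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for an actor with an event queue. NOT libprocess.
class ToyProcess {
public:
  ToyProcess() : worker_([this] { run(); }) {}

  ~ToyProcess() { terminate(); }

  // Stops the event loop but leaves the object (and its queue) alive,
  // like a driver that has been stopped but not destructed.
  void terminate() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      terminated_ = true;
    }
    cv_.notify_one();
    if (worker_.joinable()) {
      worker_.join();
    }
  }

  // Enqueues `f`. Anything enqueued after terminate() is never executed.
  void dispatch(std::function<void()> f) {
    assert(f != nullptr);
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(f));
    }
    cv_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return terminated_ || !queue_.empty(); });
      if (terminated_) {
        return;  // Pending events stay in the queue, unprocessed.
      }
      std::function<void()> f = std::move(queue_.front());
      queue_.pop();
      lock.unlock();
      f();
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool terminated_ = false;
  std::thread worker_;  // Declared last so the members above exist first.
};

// A toy gauge: the value is produced on the process's event loop and
// delivered through a future (the value 42 is arbitrary).
std::future<int> gauge(ToyProcess& process) {
  auto promise = std::make_shared<std::promise<int>>();
  std::future<int> future = promise->get_future();
  process.dispatch([promise] { promise->set_value(42); });
  return future;
}

// Returns true if the gauge produced a value within `timeout`.
bool gaugeReady(ToyProcess& process, std::chrono::milliseconds timeout) {
  return gauge(process).wait_for(timeout) == std::future_status::ready;
}
```

With a live {{ToyProcess}}, {{gaugeReady(process, timeout)}} returns true.
After {{process.terminate()}} it returns false for any timeout, because the
queued callback keeps the promise alive but is never executed. A caller that
waits without a timeout, as a metrics snapshot would in this model, blocks
indefinitely.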