[ 
https://issues.apache.org/jira/browse/MESOS-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671820#comment-16671820
 ] 

Till Toenshoff commented on MESOS-9366:
---------------------------------------

Can we consider backporting the fix, given that it appears to be rather 
straightforward and risk-free?

> Test `HealthCheckTest.HealthyTaskNonShell` can hang.
> ----------------------------------------------------
>
>                 Key: MESOS-9366
>                 URL: https://issues.apache.org/jira/browse/MESOS-9366
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.5.0, 1.6.0, 1.7.0
>            Reporter: Chun-Hung Hsiao
>            Assignee: Chun-Hung Hsiao
>            Priority: Major
>              Labels: flaky-test
>
> In {{HealthCheckTest.HealthyTaskNonShell}} the {{statusRunning}} future is 
> incorrectly checked before being waited on:
> [https://github.com/apache/mesos/blob/d8062f231b9f27889b7cae7a42eef49e4eed79ec/src/tests/health_check_tests.cpp#L673]
> As a result, if for some arbitrary reason there is only one task status 
> update sent (e.g., {{TASK_FAILED}}), {{statusRunning->state()}} will make the 
> test hang forever:
> {noformat}
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at 
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x00007fc1d9a9991c in 
> std::condition_variable::wait(std::unique_lock<std::mutex>&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #2 0x00005652770d1950 in synchronized_wait<std::condition_variable, 
> std::mutex> () at ../../3rdparty/stout/include/stout/synchronized.hpp:201
> #3 0x00007fc1e76ba909 in process::Gate::wait () at 
> ../../../3rdparty/libprocess/src/gate.hpp:50
> #4 0x00007fc1e768c01d in process::ProcessManager::wait () at 
> ../../../3rdparty/libprocess/src/process.cpp:3232
> #5 0x00007fc1e76917fd in process::wait () at 
> ../../../3rdparty/libprocess/src/process.cpp:3973
> #6 0x00007fc1e75ebf11 in process::Latch::await () at 
> ../../../3rdparty/libprocess/src/latch.cpp:63
> #7 0x0000565275431ff6 in process::Future<mesos::TaskStatus>::await () at 
> ../../3rdparty/libprocess/include/process/future.hpp:1289
> #8 0x0000565275441825 in process::Future<mesos::TaskStatus>::get () at 
> ../../3rdparty/libprocess/include/process/future.hpp:1301
> #9 0x0000565275432198 in process::Future<mesos::TaskStatus>::operator-> () at 
> ../../3rdparty/libprocess/include/process/future.hpp:1319
> #10 0x0000565275db5ef1 in 
> mesos::internal::tests::HealthCheckTest_HealthyTaskNonShell_Test::TestBody () 
> at ../../src/tests/health_check_tests.cpp:682
> #11 0x000056527717296b in 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, 
> void> () at googletest-release-1.8.0/googletest/src/gtest.cc:2402
> #12 0x000056527716ca6b in 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> 
> () at googletest-release-1.8.0/googletest/src/gtest.cc:2438
> #13 0x0000565277149b82 in testing::Test::Run () at 
> googletest-release-1.8.0/googletest/src/gtest.cc:2475
> #14 0x000056527714a4a8 in testing::TestInfo::Run () at 
> googletest-release-1.8.0/googletest/src/gtest.cc:2656
> #15 0x000056527714ab45 in testing::TestCase::Run () at 
> googletest-release-1.8.0/googletest/src/gtest.cc:2774
> #16 0x0000565277151d3e in testing::internal::UnitTestImpl::RunAllTests () at 
> googletest-release-1.8.0/googletest/src/gtest.cc:4649
> #17 0x0000565277173703 in 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
>  bool> () at googletest-release-1.8.0/googletest/src/gtest.cc:2402
> #18 0x000056527716d69d in 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
>  bool> () at googletest-release-1.8.0/googletest/src/gtest.cc:2438
> #19 0x00005652771508da in testing::UnitTest::Run () at 
> googletest-release-1.8.0/googletest/src/gtest.cc:4257
> #20 0x0000565276034020 in RUN_ALL_TESTS () at 
> ../3rdparty/googletest-release-1.8.0/googletest/include/gtest/gtest.h:2233
> #21 0x0000565276033ab7 in main () at ../../src/tests/main.cpp:168{noformat}
> (The line numbers above are not correct because of additional logging I added 
> to triage this error.)
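
The ordering problem described above can be illustrated with a minimal standalone sketch. It uses std::future as a stand-in for process::Future so that it compiles without libprocess; the TaskStatus struct and the stateWithin helper are hypothetical names introduced only for this illustration and are not part of Mesos. The point is simply that the future is waited on with a timeout before its value is read, so a status update that never arrives produces a failure instead of a hang.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>

// Hypothetical stand-in for the status update; not the Mesos type.
struct TaskStatus {
  std::string state;
};

// Waits up to `timeout` for the future to become ready. On success, stores
// the state in `*state` and returns true. On timeout, returns false and
// leaves `*state` untouched. Calling future.get() without waiting first
// would block indefinitely if the promise were never fulfilled.
bool stateWithin(
    std::future<TaskStatus>& future,
    std::chrono::milliseconds timeout,
    std::string* state)
{
  assert(future.valid());

  if (future.wait_for(timeout) != std::future_status::ready) {
    return false;
  }

  *state = future.get().state;
  return true;
}
```

In the Mesos test itself, the analogous ordering would presumably be to wait on {{statusRunning}} first (for example with libprocess's AWAIT_READY macro) and only then read {{statusRunning->state()}}; the exact form of the committed fix is not shown in this message.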



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
