[ https://issues.apache.org/jira/browse/MESOS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502871#comment-14502871 ]
Till Toenshoff commented on MESOS-1303:
---------------------------------------
I have been looking into {{ExamplesTest.TestFramework}} over the weekend, and it
is an interesting issue indeed. The problem is a failure to rename the temporary
file to which the slave checkpointed its boot ID:
{noformat}
F0420 14:23:56.446990 99655680 slave.cpp:3816]
CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to rename
'/var/folders/jk/791tgt39495cz8kqz5kml7p00000gn/T/mesos-XXXXXX.iqYIUYHp/1/meta/1OVUvX'
to
'/var/folders/jk/791tgt39495cz8kqz5kml7p00000gn/T/mesos-XXXXXX.iqYIUYHp/0/meta/boot_id':
No such file or directory
{noformat}
Enabling {{verbose}} mode on the test propagates the slave's log output all the
way to the test-framework output, which is how the error above surfaced.
The source file ({{1OVUvX}}) does in fact exist. Right now I suspect a file
descriptor leak is causing this, but I have no hard evidence so far. The failure
rate on my OS X machines is very high (> 50%).
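For readers unfamiliar with the mechanism behind the error message, here is a minimal C++ sketch of the write-to-a-temporary-file-then-rename pattern that a checkpoint like this implies. The helper {{checkpointAtomically}} and the {{.tmp}} suffix are hypothetical; this is not the Mesos {{state::checkpoint}} implementation, which (judging by the {{1OVUvX}} name in the log) uses a randomly named temporary file.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper, NOT the Mesos state::checkpoint API.
// Writes `data` to a temporary file next to `path`, then renames it into
// place so readers never observe a partially written checkpoint.
// Returns an empty string on success, otherwise an error message.
std::string checkpointAtomically(const std::string& path,
                                 const std::string& data)
{
  // Assumed naming scheme: same directory as the destination, so the
  // rename never crosses a directory or file-system boundary.
  const std::string temp = path + ".tmp";

  std::ofstream out(temp, std::ios::binary | std::ios::trunc);
  if (!out) {
    return "Failed to open '" + temp + "'";
  }
  out << data;
  out.close();  // close() sets failbit if flushing or closing fails.
  if (!out) {
    std::remove(temp.c_str());
    return "Failed to write '" + temp + "'";
  }

  // On POSIX systems rename(2) replaces the destination atomically and
  // fails with ENOENT if the source is missing or a directory component
  // of either path does not exist.
  if (std::rename(temp.c_str(), path.c_str()) != 0) {
    const std::string error = std::strerror(errno);
    std::remove(temp.c_str());
    return "Failed to rename '" + temp + "' to '" + path + "': " + error;
  }
  return "";
}
```

Note that in the failing log the two paths differ in their directory component ({{1/meta}} for the temporary file versus {{0/meta}} for {{boot_id}}), whereas this sketch always keeps both in the same directory.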
> ExamplesTest.{TestFramework, NoExecutorFramework} flaky
> -------------------------------------------------------
>
> Key: MESOS-1303
> URL: https://issues.apache.org/jira/browse/MESOS-1303
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Ian Downes
> Labels: flaky
>
> I'm having trouble reproducing this, but I did observe it once on my OS X
> system:
> {noformat}
> [==========] Running 2 tests from 1 test case.
> [----------] Global test environment set-up.
> [----------] 2 tests from ExamplesTest
> [ RUN ] ExamplesTest.TestFramework
> ../../src/tests/script.cpp:81: Failure
> Failed
> test_framework_test.sh terminated with signal 'Abort trap: 6'
> [ FAILED ] ExamplesTest.TestFramework (953 ms)
> [ RUN ] ExamplesTest.NoExecutorFramework
> [ OK ] ExamplesTest.NoExecutorFramework (10162 ms)
> [----------] 2 tests from ExamplesTest (11115 ms total)
> [----------] Global test environment tear-down
> [==========] 2 tests from 1 test case ran. (11121 ms total)
> [ PASSED ] 1 test.
> [ FAILED ] 1 test, listed below:
> [ FAILED ] ExamplesTest.TestFramework
> {noformat}
> I also observed it when investigating a failed {{make check}} for https://reviews.apache.org/r/20971/
> {noformat}
> [----------] 6 tests from ExamplesTest
> [ RUN ] ExamplesTest.TestFramework
> [ OK ] ExamplesTest.TestFramework (8643 ms)
> [ RUN ] ExamplesTest.NoExecutorFramework
> tests/script.cpp:81: Failure
> Failed
> no_executor_framework_test.sh terminated with signal 'Aborted'
> [ FAILED ] ExamplesTest.NoExecutorFramework (7220 ms)
> [ RUN ] ExamplesTest.JavaFramework
> [ OK ] ExamplesTest.JavaFramework (11181 ms)
> [ RUN ] ExamplesTest.JavaException
> [ OK ] ExamplesTest.JavaException (5624 ms)
> [ RUN ] ExamplesTest.JavaLog
> [ OK ] ExamplesTest.JavaLog (6472 ms)
> [ RUN ] ExamplesTest.PythonFramework
> [ OK ] ExamplesTest.PythonFramework (14467 ms)
> [----------] 6 tests from ExamplesTest (53607 ms total)
> {noformat}