Greg Mann commented on MESOS-6180:

Another common error seen when this issue manifests is:
Recovery failed: Failed to recover registrar: Failed to perform fetch within 
See the file {{RoleTest.ImplicitRoleRegister.txt}} for the full test log.

[~haosd...@gmail.com], there is a review 
[here|https://reviews.apache.org/r/41665/] proposing the {{in_memory}} registry 
for tests. I'm currently trying to figure out whether this is a legitimate bug 
or simply the result of excessive load on the machine.

> Several tests are flaky, with futures timing out early
> ------------------------------------------------------
>                 Key: MESOS-6180
>                 URL: https://issues.apache.org/jira/browse/MESOS-6180
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>            Reporter: Greg Mann
>            Assignee: haosdent
>              Labels: mesosphere, tests
>         Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, 
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log, 
> RoleTest.ImplicitRoleRegister.txt, 
> flaky-containerizer-pid-namespace-backward.txt, 
> flaky-containerizer-pid-namespace-forward.txt
>
> After a large patch chain was merged, several tests became flaky on our 
> internal CI, with a similar pattern in the failures: each test fails early 
> when a future times out. Often, the future that times out is one of the 
> offer futures awaited while a test cluster is being spun up. 
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> The JIRAs noted in parentheses above are individual tickets tracking three 
> of these tests.

