[ 
https://issues.apache.org/jira/browse/MESOS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479319#comment-16479319
 ] 

Andrei Budnik commented on MESOS-8828:
--------------------------------------

Another possible solution is to introduce a `FUTURE_DELAY(M)` primitive that 
returns a future which becomes ready when `delay(duration, pid, M)` is 
called. This primitive would be similar to `FUTURE_DISPATCH()`.
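
To illustrate the idea, here is a minimal single-threaded sketch. It is a toy 
model, not libprocess code: `FakeClock`, `futureDelay()` and `eventFires()` 
are hypothetical names made up for this example, and a shared boolean flag 
stands in for a real future. It only shows why advancing the clock before 
`delay()` is called loses the event, and how waiting for a "delay was called" 
signal avoids that.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy, single-threaded stand-in for a paused clock. NOT the libprocess API.
class FakeClock {
public:
  // Hypothetical analogue of FUTURE_DELAY: the returned flag flips to true
  // the next time delay() is called.
  std::shared_ptr<bool> futureDelay() {
    auto ready = std::make_shared<bool>(false);
    watchers_.push_back(ready);
    return ready;
  }

  // Schedules `fn` to run `duration` ticks after the current time.
  void delay(long duration, std::function<void()> fn) {
    timers_.emplace(now_ + duration, std::move(fn));
    for (auto& ready : watchers_) *ready = true;
    watchers_.clear();
  }

  // Moves time forward and runs every timer that has become due.
  void advance(long duration) {
    now_ += duration;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      auto fn = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      fn();
    }
  }

private:
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
  std::vector<std::shared_ptr<bool>> watchers_;
};

// Returns true if the delayed event fired. `advanceFirst` models the racy
// interleaving in which the test advances the clock before the agent has
// called delay().
bool eventFires(bool advanceFirst) {
  FakeClock clock;
  bool fired = false;
  auto delayed = clock.futureDelay();
  auto agent = [&] { clock.delay(10, [&] { fired = true; }); };

  if (advanceFirst) {
    clock.advance(10);  // Too early: no timer is registered yet.
    agent();            // Timer is due at tick 20, which is never reached.
  } else {
    agent();            // The agent registers its timer, due at tick 10...
    assert(*delayed);   // ...the test observes that delay() was called...
    clock.advance(10);  // ...and only then advances the clock.
  }
  return fired;
}
```

In this model `eventFires(true)` returns false, which corresponds to the 
hanging test, while `eventFires(false)` returns true. In a real test the 
flag check would presumably be a blocking wait on the future returned by the 
proposed primitive.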

> Clock::advance can race with process::delay in tests.
> -----------------------------------------------------
>
>                 Key: MESOS-8828
>                 URL: https://issues.apache.org/jira/browse/MESOS-8828
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Andrei Budnik
>            Priority: Major
>              Labels: flaky
>         Attachments: failed_tests.txt
>
>
> There are lots of tests that use the following pattern:
>  1) [Pause 
> clocks|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1108]
>  2) [Start an 
> agent|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1122]
>  3) [Advance clocks to trigger an 
> event|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1125]
>  4) [Wait for the 
> event|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1127]
> If an event is scheduled via `process::delay()` only after the clocks have 
> been advanced, the test hangs waiting for an event that is never triggered, 
> because libprocess clocks are paused. For example, 
> `DiskResource/PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy/0` 
> test hangs at step 4, because the clocks had already been advanced at step 3 
> before the agent scheduled a call to the 
> [Slave::authenticate()|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1301]
>  method. After successful authentication with the master, the agent sends an 
> [UpdateSlaveMessage|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1546-L1550].
>  But the authentication process never finishes because 
> `[Slave::authenticate()|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1301]`
>  is never called.
> A list of tests that might be affected by this issue is attached to this ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
