Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2) #2933

Benjamin Bannier Thu, 17 Nov 2016 00:54:06 -0800

Hi,

>> What do folks think about removing future timeouts in tests altogether?
>> Instead, we can time the whole suite differently on different CIs?


> Has there been any response from the ASF Infra folks on addressing the
> VM/hardware issues? Seems like it will be difficult to get good signal
> from the ASF CI in the absence of some improvements on the
> infrastructure side.

Alex brings up a valid way to largely decouple us from VM lag problems, which
seem to be mostly a problem because we expect actions in tests to finish
faster than they actually happen. The real, tested code would be much less
aggressive in interpreting small response lags as fatal errors.

If we set the default timeout for, say, `AWAIT_READY` in our test code to
e.g., infinity, slow VMs would be much less of an issue. To avoid blocking
machines indefinitely on broken tests, we should then probably either limit
the duration of our Jenkins jobs (if ASF doesn’t already have that safeguard),
or maybe even add that limit to our test execution setup itself (e.g., simply
with `timeout(1)` or an equivalent from the outside, or directly inside the
harness).

The downside of this is of course that a hanging test (e.g., due to some true 
race) could block execution of all other tests.
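
As a rough illustration of the "from the outside" variant, here is a minimal
sketch of a wrapper that enforces one wall-clock limit on a whole test run.
It is written in Python purely for illustration; the name `run_with_deadline`
and the choice of exit code 124 (the value GNU `timeout(1)` reports when its
limit expires) are assumptions made for this example, not something taken
from our build scripts.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_secs):
    """Run `cmd` and return its exit code, or 124 if the deadline expires.

    `run_with_deadline` is a hypothetical helper for illustration only.
    """
    try:
        # subprocess.run waits for the child and raises TimeoutExpired if it
        # is still running after `deadline_secs`; the child is killed first.
        return subprocess.run(cmd, timeout=deadline_secs).returncode
    except subprocess.TimeoutExpired:
        # Assumption: mirror the exit status of GNU timeout(1) on expiry.
        print("test run exceeded %ss, giving up" % deadline_secs,
              file=sys.stderr)
        return 124
```

A CI script could pass the test command and a generous limit, then use the
returned value as its own exit status. Note that only the direct child is
killed here, so any processes a test run spawns itself would need separate
cleanup.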

Being more patient can be helpful in other environments as well (e.g., 
`valgrind`).


Cheers,

Benjamin
