> On Jan. 2, 2015, 10:23 p.m., Ben Mahler wrote: > > What happened previously if a launch takes forever? > > > > What happens now if a launch takes forever? > > Timothy Chen wrote: > Ben that's a good point, previously the registration timeout gates the > launch not to take forever, however it is too coarse of a timeout as it > included the time to launch and also the time it take for the executor to > register after launch. > > Now it will only time out on the latter case, I think we do need a > seperate timeout that checks for the launch itself. Ben what do you think? > > Nishant Suneja wrote: > I concur with Timothy. I think having a separate timeout to track the > container launch itself makes sense. > > Timothy Chen wrote: > Hi Nishant, can you also add another timer for gating the launch? Once we > have both PRs then I think we are in a better state to merge both. > > Tom Arnfeld wrote: > Just caught wind of this change and it's good to see! Going to be very > useful over here. Can I request that there's also a command line flag for the > slave to change the launch timeout? :-) > > Nishant Suneja wrote: > @Timothy: For launch timeout, if the launch doesnt happen within the > specified timeout duration, what action would we want to take ? We could > possibly use the same callback "registerExecutorTimeout", which effectively > moves the executor to TERMINATING state (for gc later on) and calls the > containerizer destroy for that container. > > Also, what default value would we want for this timeout ? > > > @Tom: yeah..I think its a good idea. I can add that. > > Timothy Chen wrote: > I think we want to perform the same action (registerExecutorTimeout) > actually. If it's going to be shared, I think we should rename so it's not > executor registration specific. > The other part is to make sure the timeout is cancelled when launch is > finished before issuing the next delay call.
Yeah.. I have implemented that. So, I was writing a testcase to verify the container launch timeout logic, which has 2 conditions to be tested:: a) container launch fails, and timeout action triggers. b) container launch happens, and timeout action is cancelled. I wanted to test both of them in the same testcase. However, once the "launchExecutor()" call has been made for the default executor testing the first case, 2nd case doesnt call the "launchExecutor()" because the framework already has an in-memory pointer for it, and skips the launch. What would be the easiest way to trigger launchExecutor() the 2nd time ? I wanted to get a handle of the Framework structure, and call destoryExecutor(), before launching the tasks again, but couldnt find an easy way to get the handle. - Nishant ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29437/#review66557 ----------------------------------------------------------- On Dec. 31, 2014, 11:57 p.m., Nishant Suneja wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/29437/ > ----------------------------------------------------------- > > (Updated Dec. 31, 2014, 11:57 p.m.) > > > Review request for mesos and Timothy Chen. > > > Bugs: MESOS-999 > https://issues.apache.org/jira/browse/MESOS-999 > > > Repository: mesos-git > > > Description > ------- > > As part of this bug fix, I have trigerred the executor registration timeout > timer after the container's future object is set, instead of starting the > timer when the container launch is still pending > > > Diffs > ----- > > src/slave/slave.cpp 50b57819b55bdcdb9f49f20648199badc4d3f37b > src/tests/composing_containerizer_tests.cpp > 5ab5a36cadb7f8622bad0c5814e9a5fb338753ad > src/tests/containerizer.hpp 24b014f44d9eec56840e18cf39fbf9100f2c0711 > src/tests/slave_tests.cpp f2896a1fc4521452e29fd261a6f117372345dcfc > > Diff: https://reviews.apache.org/r/29437/diff/ > > > Testing > ------- > > Added the unit test : SlaveTest::ExecutorRegistrationTimeoutTrigger > make check succeeds. > > > Thanks, > > Nishant Suneja > >