Hi, having gone through my logs more closely and looked at the interactions, I now believe the actual problem was not Mesos but my (Vagrant-based) setup. It seems that Mesos sometimes used a different internal IP address for sending requests, which resulted in messages like this:
mesos-master.WARNING:W0724 02:51:59.583686 26949 master.cpp:1769] Ignoring launch tasks message for offer [ 20140724-004359-3078776074-5050-26927-854 ] of framework 20140724-004359-3078776074-5050-26927-0000 from '[email protected]:56713' because it is not from the registered framework '[email protected]:56713'

(Note that the IP changes.) After locking down the IP addresses on the Mesos master and slave command lines, I could no longer reproduce the launch problems.

-h

On Wed, Jul 23, 2014 at 1:49 PM, Vinod Kone <[email protected]> wrote:
> On Wed, Jul 23, 2014 at 12:17 PM, Henning Schmiedehausen <
> [email protected]> wrote:
>
>> I0722 02:09:36.131597 6767 slave.cpp:1783] Flushing queued task
>> 83ea269c-8988-49bd-9d23-034c33858352 for executor
>> 'candyland_e3030c10-d154-4a34-a72f-aba07e1a84d4' of framework
>> 20140711-183251-3078776074-5050-5240-0083
>>
>
> The above line tells me that the slave did send() (or tried to send) a
> RunTaskMessage to the executor. Now, depending on how the underlying
> libprocess library send() works w.r.t to native and non-native endpoints,
> there might be a bug in libprocess code (on the slave side) or in the
> executor code (jesos). I'm not very familiar with the native client
> interaction, so I would let someone else chime in (maybe benh?).
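For anyone hitting the same warning: the master appears to compare the sender of a launch request against the full libprocess PID (name@ip:port) recorded when the framework registered, so the same scheduler reaching the master from a different interface looks like a different process, even on the same port. Below is a minimal Python model of that check, not Mesos's actual code. The `UPID` class, the `should_accept_launch` function and both IP addresses are hypothetical, chosen only to mimic a Vagrant box with a NAT interface and a private interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UPID:
    """Hypothetical model of a libprocess process id: name@ip:port."""
    name: str
    ip: str
    port: int

    @classmethod
    def parse(cls, text: str) -> "UPID":
        # Split "name@ip:port" into its three parts.
        name, _, address = text.partition("@")
        ip, _, port = address.rpartition(":")
        return cls(name, ip, int(port))


def should_accept_launch(sender: UPID, registered: UPID) -> bool:
    # Hypothetical check: the whole PID must match (name, IP and port),
    # so a changed IP alone is enough to reject the message.
    return sender == registered


registered = UPID.parse("scheduler(1)@192.168.33.10:56713")
same = UPID.parse("scheduler(1)@192.168.33.10:56713")
other_ip = UPID.parse("scheduler(1)@10.0.2.15:56713")

print(should_accept_launch(same, registered))      # True
print(should_accept_launch(other_ip, registered))  # False
```

In this model, pinning each process to one address (as done here on the master and slave command lines) is what keeps `sender` and `registered` identical.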
