Re: how to debug task lost in custom scheduler?

Olivier Sallou Sun, 19 Oct 2014 23:13:26 -0700

On 10/18/2014 12:55 PM, Alex Rukletsov wrote:
> Hi Olivier,
>
> you can get a TASK_LOST if import directives in your executor fail. Do you
> have mesos python eggs installed or available through PYTHONPATH? Could you
> please also paste the output of stderr and stdout of the lost task (you can
> access them via mesos webUI → sandbox)?
I do not see the task in the webUI at all. The Python eggs are available
via PYTHONPATH; they are in MESOS_BUILD_DIR.
If I run my executor directly, I get no Python error, only a
MISSING SLAVE ID error (which is expected, since Mesos sets this
environment variable at runtime).
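
One way to double-check the import question is to ask Python which modules it can locate, using the same environment the slave process runs with rather than an interactive shell. The following is only a minimal sketch: the module names in REQUIRED_MODULES are assumptions and depend on the Mesos version and on what the executor actually imports, and missing_modules is a hypothetical helper, not part of Mesos.

```python
import importlib.util

# Assumed module names; adjust them to whatever your executor imports.
REQUIRED_MODULES = ["mesos.interface", "mesos.native"]


def missing_modules(names):
    """Return the module names that cannot be located on the current sys.path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "mesos") is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing modules:", missing_modules(REQUIRED_MODULES))
```

Locating a module does not prove that importing it succeeds (a native extension can still fail to load), so an empty result only rules out the simplest PYTHONPATH problem.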


I know the task is lost because, in the statusUpdate method of my
scheduler, I print the task status (value = 5, i.e. TASK_LOST). The
message is empty.
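
To make such updates easier to read, the numeric state can be turned into its name before printing. This is a rough sketch only: the numeric values in TASK_STATE_NAMES are assumed to match the TaskState enum of the Mesos version in use, FakeStatus is a hypothetical stand-in for the TaskStatus message (the real one nests the id as task_id.value), and describe_status is an illustrative helper.

```python
from collections import namedtuple

# Numeric values assumed to match the TaskState enum in mesos.proto for the
# Mesos version in use; verify against your own mesos_pb2 before relying on them.
TASK_STATE_NAMES = {
    0: "TASK_STARTING",
    1: "TASK_RUNNING",
    2: "TASK_FINISHED",
    3: "TASK_FAILED",
    4: "TASK_KILLED",
    5: "TASK_LOST",
    6: "TASK_STAGING",
}

# Hypothetical stand-in for the TaskStatus message passed to statusUpdate.
FakeStatus = namedtuple("FakeStatus", ["task_id", "state", "message"])


def describe_status(update):
    """Format a status update as one log line with a readable state name."""
    name = TASK_STATE_NAMES.get(update.state, "UNKNOWN(%d)" % update.state)
    message = update.message or "<empty>"
    return "task %s is in state %s, message: %s" % (update.task_id, name, message)


if __name__ == "__main__":
    print(describe_status(FakeStatus("task-1", 5, "")))
```

With the generated bindings, something like mesos_pb2.TaskState.Name(update.state) should give the same name without a hand-written table, provided the installed protobuf version offers it.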

There is nothing in the webUI and nothing in the console logs. Since my
executor is never run, Mesos itself (master or slave) must be generating
this status, but I get no additional information about the reason.
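
The log search suggested earlier in the thread (grepping for TASK_LOST in the master and slave logs) can be sketched in Python as below. The log file locations are not given in this thread, so they are passed on the command line; matching_lines and scan_log are hypothetical helper names, and also searching for the framework ID is an added assumption about what might be useful.

```python
def matching_lines(lines, needles):
    """Yield (line_number, line) for lines containing any of the given substrings."""
    for number, line in enumerate(lines, start=1):
        if any(needle in line for needle in needles):
            yield number, line.rstrip("\n")


def scan_log(path, needles):
    """Scan one log file; errors="replace" keeps odd bytes from aborting the scan."""
    with open(path, "r", errors="replace") as handle:
        return list(matching_lines(handle, needles))


if __name__ == "__main__":
    import sys

    # Usage: python scan_logs.py <log file> [<log file> ...]
    # The framework ID below is the one shown in the master log in this thread.
    needles = ["TASK_LOST", "20141017-141022-16777343-5050-25774-0047"]
    for log_path in sys.argv[1:]:
        for number, line in scan_log(log_path, needles):
            print("%s:%d: %s" % (log_path, number, line))
```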

I have used and adapted the examples shipped with the sources
(src/examples/python).

Olivier
>
> On Fri, Oct 17, 2014 at 7:31 PM, Vinod Kone <[email protected]> wrote:
>
>> Can you grep for TASK_LOST in master and slave logs and paste the output
>> here?
>>
>> On Fri, Oct 17, 2014 at 8:24 AM, Olivier Sallou <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I have installed Mesos in a single-host master/slave configuration (for
>>> development/testing).
>>>
>>> Mesos works fine with the frameworks I have tested (Aurora...).
>>>
>>> I am trying to write my own scheduler/executor in Python, based on the
>>> example shipped with the sources, but I cannot get my task executed.
>>>
>>> The executor is never run (I added debug logging to a file to check,
>>> and no file is created), but I see no error in the master logs (console)
>>> or the slave logs.
>>>
>>> In master I can see:
>>>
>>> I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
>>> framework 20141017-141022-16777343-5050-25774-0047
>>> I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
>>> offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
>>> 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
>>> (localhost) for framework 20141017-141022-16777343-5050-25774-0047
>>> I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
>>> Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
>>> (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
>>> ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
>>> from framework 20141017-141022-16777343-5050-25774-0047
>>>
>>> My reply to the offer is received, but my scheduler then receives a
>>> TASK_LOST status update.
>>>
>>> I do not see how to debug this: nothing tells me why the task is lost
>>> (there is enough cpu/mem; I ask for 2 cpus and 2024 mem), and it seems
>>> to be rejected at the master level.
>>>
>>> Any hints on how to analyse this?
>>>
>>> Thanks
>>>
>>> --
>>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>>
>>>
>>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
