On 10/18/2014 12:55 PM, Alex Rukletsov wrote:
> Hi Oliver,
>
> you can get a TASK_LOST if import directives in your executor fail. Do you
> have mesos python eggs installed or available through PYTHONPATH? Could you
> please also paste the output of stderr and stdout of the lost task (you can
> access them via mesos webUI → sandbox)?

I do not see the task at all in the webUI.
The Python eggs are available from PYTHONPATH; they are in MESOS_BUILD_DIR. If I run my executor directly, I get no Python error, only a MISSING SLAVE ID (which is expected, as Mesos sets this environment variable at runtime).
I can see that the task is lost because my scheduler prints the task status in its statusUpdate method (value = 5). The message is empty. There is nothing in the webUI and nothing in the console logs. Since my executor is never run, this error status must come from Mesos itself (master or slave), but I have no additional information about the reason.
I have used and adapted the examples shipped with the sources (src/examples/python).

Olivier

>
> On Fri, Oct 17, 2014 at 7:31 PM, Vinod Kone <[email protected]> wrote:
>
>> Can you grep for TASK_LOST in master and slave logs and paste the output
>> here?
>>
>> On Fri, Oct 17, 2014 at 8:24 AM, Olivier Sallou <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I have installed mesos on a single host master/slave config (for
>>> devpt/test).
>>>
>>> Mesos works fine with frameworks I tested (aurora...).
>>>
>>> I try to create my own scheduler/executor in python, based on example
>>> given with sources, but I cannot get my task executed.
>>>
>>> Executor is not executed (I have added debug logs in a file to check,
>>> and no file is created), but I see no error in master logs (console) nor
>>> slave logs.
>>>
>>> In master I can see:
>>>
>>> I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
>>> framework 20141017-141022-16777343-5050-25774-0047
>>> I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
>>> offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
>>> 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
>>> (localhost) for framework 20141017-141022-16777343-5050-25774-0047
>>> I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
>>> Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
>>> (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
>>> ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
>>> from framework 20141017-141022-16777343-5050-25774-0047
>>>
>>> My reply to the offer is received, but in my scheduler I receive an
>>> update status of TASK_LOST.
>>>
>>> I do not see how to debug this, I see no information why my task is lost
>>> (there is enough cpu/mem, I ask 2 cpu, and 2024 mem), and it seems that
>>> it is rejected at master level.
>>>
>>> Any hint on how to analyse this?
>>>
>>> Thanks
>>>
>>> --
>>> gpg key id: 4096R/326D8438 (keyring.debian.org)
>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
>>>
>>>
>>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
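
For reference, the numeric status printed in statusUpdate (value = 5) can be turned into a readable line that also makes an empty message explicit. This is only a minimal sketch: the number-to-name table is an assumption based on the TaskState enum in mesos.proto from around this Mesos release and should be checked against the installed version, and describe_status is a hypothetical helper, not part of the Mesos Python API.

```python
# Assumed numeric values of the Mesos TaskState enum (mesos.proto, circa 2014).
# Only 5 == TASK_LOST is confirmed by this thread; verify the rest locally.
TASK_STATE_NAMES = {
    0: "TASK_STARTING",
    1: "TASK_RUNNING",
    2: "TASK_FINISHED",
    3: "TASK_FAILED",
    4: "TASK_KILLED",
    5: "TASK_LOST",
    6: "TASK_STAGING",
}


def describe_status(state, message=""):
    """Hypothetical helper: format a status update as 'NAME: message'."""
    name = TASK_STATE_NAMES.get(state, "UNKNOWN(%d)" % state)
    # An empty message is reported explicitly so it is not mistaken for a
    # logging problem on the scheduler side.
    detail = message if message else "<empty message>"
    return "%s: %s" % (name, detail)


# The case described above: state 5 with an empty message.
print(describe_status(5))
```

Inside a scheduler's statusUpdate(driver, update) callback, the same helper could presumably be called as describe_status(update.state, update.message).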
