As long as the MesosScheduler thinks there are pending jobs that need TaskTrackers launched, we will normally allow the PREP->FAILED transition to run, so this will self-correct as expected once we are able to launch TaskTrackers.

Guodong, I can see this from your original email, but can you confirm again that the MesosScheduler thinks there are pending tasks after you issue the kill?
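For example (only a sketch; the log path below is an assumption, and if the JobTracker runs in the foreground the same lines simply appear on its console):

  # Does the MesosScheduler still report pending/needed slots for the killed job?
  # These are the periodic "JobTracker Status" lines it writes to the JobTracker log.
  grep -A 8 "JobTracker Status" logs/hadoop-*-jobtracker-*.log | tail -n 20

  # And what run state does the JobTracker itself report for the job?
  ./bin/hadoop job -list all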
On Wed, Apr 17, 2013 at 8:43 AM, Vinod Kone <[email protected]> wrote:

I see. So it looks like Hadoop doesn't differentiate between not launching setup tasks and launching setup tasks but waiting to get to the RUNNING state? That's kind of unfortunate.

Anyway, as long as this is the expected behavior for Hadoop users I'm fine with it for now.

@vinodkone
Sent from my mobile

On Apr 17, 2013, at 8:04 AM, 王国栋 <[email protected]> wrote:

Hi Vinod,

I think it is reasonable that Hadoop runs the job cleanup task, since every job status change is triggered by the completion of some task; for example, the job status changes from PREP to RUNNING once the job setup task completes (if the job has not been killed yet).

For any job, once it is killed, no matter whether its previous status was RUNNING or PREP, the job cleanup task will be scheduled to a TaskTracker. When the job cleanup task finishes, the job status is set to KILLED or FAILED. So the logic for changing the job status is the same regardless of the previous job status.

Anyway, it makes sense to me, so I don't think this is a bug :)

Guodong

On Wed, Apr 17, 2013 at 10:52 PM, Vinod Kone <[email protected]> wrote:

Thanks for digging through the code. Your analysis makes sense, though it is strange that Hadoop wants to run a cleanup task for a job that was never launched. Maybe you could file a bug for Hadoop?

@vinodkone
Sent from my mobile

On Apr 17, 2013, at 2:30 AM, 王国栋 <[email protected]> wrote:

Hi Ben,

I have read some of the JobTracker and TaskTracker code over the last two days.

I don't think this scenario is a bug, neither in the Mesos scheduler nor in the JobTracker. The following is my opinion.

We kill the job while its status is PREP (which means the setup task has not been launched yet). When we kill the job, the JobTracker just marks two boolean values in JobInProgress as true, meaning the job is killed and failed. But killing a job does not trigger any job status change until the job cleanup task is scheduled to a TaskTracker.

The job status stays in PREP only because Mesos cannot launch a TaskTracker due to insufficient resources, which is a little embarrassing. When there is enough resource for Mesos to launch a TaskTracker (say, some other mesos-slave offers enough to start one), Mesos will launch a TaskTracker and the job cleanup task will be scheduled to it. Finally, after the cleanup task completes, the job status will be updated to *KILLED*, and the pending map and reduce slots will be released as well.

If anything is wrong, please correct me. :)

Guodong

On Tue, Apr 16, 2013 at 1:02 AM, Benjamin Mahler <[email protected]> wrote:

Thanks! It would be helpful if you could also provide the exact commands you're running for steps 8, 9 and 10; it would save me some time, as my knowledge of Hadoop commands is limited.
On Sun, Apr 14, 2013 at 9:53 PM, 王国栋 <[email protected]> wrote:

Hi Ben,

I've updated to the latest code in trunk, and the problem is still there.

Please follow these steps to reproduce it:
1. check out the trunk code
2. bootstrap
3. mkdir build (for an out-of-source build)
4. cd build && ../configure
5. make
6. cd hadoop && make hadoop-0.20.2-cdh3u3
7. modify conf/mapred-site.xml, set mapred.mesos.slot.mem=10240 (make sure the resource is not enough)
8. start mesos and the jobtracker
9. submit a wordcount job to the jobtracker (at this point the job is pending due to insufficient resources)
10. kill the job (the job cannot be killed)

Guodong
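For steps 7-10, run from the Mesos build directory, the commands would look roughly like this. This is only a sketch: the wrapper-script names, flags, example jar name, and input/output paths are assumptions on my part rather than commands taken from the thread (only the final kill appears verbatim in the logs below).

  # step 7: in hadoop/hadoop-0.20.2-cdh3u3/conf/mapred-site.xml, inside <configuration>,
  # set the per-slot memory so high that no offer can satisfy it:
  #   <property><name>mapred.mesos.slot.mem</name><value>10240</value></property>

  # step 8: start a master and a slave from the build directory, then the JobTracker
  ./bin/mesos-master.sh --ip=127.0.0.1 &
  ./bin/mesos-slave.sh --master=127.0.0.1:5050 &
  cd hadoop/hadoop-0.20.2-cdh3u3
  ./bin/hadoop jobtracker &

  # step 9: submit a wordcount job; it stays pending because no TaskTracker can be launched
  mkdir -p /tmp/wc-input && cp conf/*.xml /tmp/wc-input
  ./bin/hadoop jar hadoop-examples-0.20.2-cdh3u3.jar wordcount /tmp/wc-input /tmp/wc-output

  # step 10: from another terminal, kill the job using the id printed by the client
  ./bin/hadoop job -kill job_201304121621_0001

With the slot memory set far above what the local slave offers, the scheduler keeps declining offers, which is exactly the "Declining offer with insufficient resources for a TaskTracker" pattern in the JobTracker log quoted further down.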
On Mon, Apr 15, 2013 at 10:52 AM, 王国栋 <[email protected]> wrote:

I am building the latest code in the trunk. I will keep you updated; if the problem is still there, I will give you the exact reproduction steps and make sure you can debug it.

I also think this may be a bug in the jobtracker. :)

Thanks.

Guodong

On Mon, Apr 15, 2013 at 10:20 AM, Benjamin Mahler <[email protected]> wrote:

On April 9th we submitted a deadlock fix; please update to make sure you have it: https://reviews.apache.org/r/10352/

Can you provide the commands to reproduce from a mesos build directory? I'd like to be able to reproduce this locally on my laptop, so the exact commands I need to run from within my mesos build directory would be useful :)

The fact that the job gets stuck in the JobTracker as PREP makes me think there's a bug in the JobTracker; I imagine people don't often run a JobTracker without any TaskTrackers, which is not the case here.

On Sun, Apr 14, 2013 at 7:04 PM, 王国栋 <[email protected]> wrote:

Hi Ben,

I put my ideas inline. Please check.

On Mon, Apr 15, 2013 at 8:13 AM, Benjamin Mahler <[email protected]> wrote:

> So I'm still a little confused here. From what you showed, it looks like the 'job -kill' command you posted succeeded on the client side?
>
>   [trunk ?]$ ./bin/hadoop job -kill job_201304121621_0001
>   13/04/12 16:27:16 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>   *Killed job job_201304121621_0001*

Yes, from the client side, killing the job is successful.

> I see that the JobTracker still thinks the job is pending. What happens when you re-issue that kill command?

The JobTracker still thinks the status of the job is *PREP*; when I reissue the kill cmd, it seems the same as the first time I issued it.

> I'm confused as to why it's still pending when the JobTracker has removed the job:
>
>   13/04/12 16:27:16 INFO mapred.JobTracker: Killing job job_201304121621_0001
>   13/04/12 16:27:16 INFO mapred.JobInProgress: Killing job 'job_201304121621_0001'
>
> Looking at the JobTracker code, it seems like this indeed kills the job:
>
> The code that prints the log line in JobTracker:
>
>   private synchronized void killJob(JobInProgress job) {
>     LOG.info("Killing job " + job.getJobID());
>     JobStatus prevStatus = (JobStatus)job.getStatus().clone();
>     job.kill();
>
>     // Inform the listeners if the job is killed
>     // Note :
>     //   If the job is killed in the PREP state then the listeners will be invoked
>     //   If the job is killed in the RUNNING state then cleanup tasks will be launched
>     //   and the updateTaskStatuses() will take care of it
>     JobStatus newStatus = (JobStatus)job.getStatus().clone();
>     if (prevStatus.getRunState() != newStatus.getRunState()
>         && newStatus.getRunState() == JobStatus.KILLED) {
>       JobStatusChangeEvent event =
>         new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED, prevStatus, newStatus);
>       updateJobInProgressListeners(event);
>     }
>   }
>
> Then JobInProgress.kill():
>
>   /**
>    * Kill the job and all its component tasks. This method should be called from
>    * jobtracker and should return fast as it locks the jobtracker.
>    */
>   public void kill() {
>     boolean killNow = false;
>     synchronized(jobInitKillStatus) {
>       jobInitKillStatus.killed = true;
>       // if not in middle of init, terminate it now
>       if(!jobInitKillStatus.initStarted || jobInitKillStatus.initDone) {
>         // avoiding nested locking by setting flag
>         killNow = true;
>       }
>     }
>     if(killNow) {
>       terminate(JobStatus.KILLED);
>     }
>   }
>
> I don't really see the issue at this point, so any further information / debugging on your end that reveals this bug would be very valuable to us.

I am trying to debug this on my laptop. I found that when I issue the kill cmd to the jobtracker, the job status never changes; it is always *PREP*. Can you reproduce this on your machine by following the steps I mentioned in the previous mail?

> Lastly, what version of the code are you running? If you're running off trunk, when did you last update it?

I am running with the code in the trunk; it was last updated last week.
> Ben

On Fri, Apr 12, 2013 at 1:32 AM, 王国栋 <[email protected]> wrote:

Hi Vinod,

When I submit the job, the jobtracker log is as follows:
----------------------------------
13/04/12 16:22:37 INFO mapred.MesosScheduler: Added job job_201304121621_0001
13/04/12 16:22:37 INFO mapred.JobTracker: Job job_201304121621_0001 added successfully for user 'guodong' to queue 'default'
13/04/12 16:22:37 INFO mapred.JobTracker: Initializing job_201304121621_0001
13/04/12 16:22:37 INFO mapred.JobInProgress: Initializing job_201304121621_0001
13/04/12 16:22:37 INFO mapred.AuditLogger: USER=guodong IP=127.0.0.1 OPERATION=SUBMIT_JOB TARGET=job_201304121621_0001 RESULT=SUCCESS
13/04/12 16:22:37 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /tmp/hadoop-guodong/mapred/system/job_201304121621_0001/jobToken
13/04/12 16:22:37 INFO mapred.JobInProgress: Input size for job job_201304121621_0001 = 89502988. Number of splits = 3
13/04/12 16:22:37 INFO net.NetworkTopology: Adding a new node: /default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress: tip:task_201304121621_0001_m_000000 has split on node:/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress: tip:task_201304121621_0001_m_000001 has split on node:/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress: tip:task_201304121621_0001_m_000002 has split on node:/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress: job_201304121621_0001 LOCALITY_WAIT_FACTOR=1.0
13/04/12 16:22:37 INFO mapred.JobInProgress: Job job_201304121621_0001 initialized successfully with 3 map tasks and 1 reduce tasks.
13/04/12 16:22:39 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 3
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 3
    Needed Reduce Slots: 1
13/04/12 16:22:39 INFO mapred.MesosScheduler: Declining offer with insufficient resources for a TaskTracker:
  cpus: offered 4.0 needed 1.800000011920929
  mem : offered 2731.0 needed 6400.0
  disk: offered 75120.0 needed 4096.0
  ports:  at least 2 (sufficient)
[name: "cpus"
type: SCALAR
scalar {
  value: 4.0
}
, name: "mem"
type: SCALAR
scalar {
  value: 2731.0
}
, name: "ports"
type: RANGES
ranges {
  range {
    begin: 31000
    end: 32000
  }
}
, name: "disk"
type: SCALAR
scalar {
  value: 75120.0
}
]
13/04/12 16:22:39 INFO mapred.MesosScheduler: Unable to fully satisfy needed map/reduce slots: 3 map slots 1 reduce slots remaining
13/04/12 16:22:45 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 3
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 3
    Needed Reduce Slots: 1
13/04/12 16:22:45 INFO mapred.MesosScheduler: Declining offer with insufficient resources for a TaskTracker:
  cpus: offered 4.0 needed 1.800000011920929
  mem : offered 2731.0 needed 6400.0
  disk: offered 75120.0 needed 4096.0
  ports:  at least 2 (sufficient)
[name: "cpus"
type: SCALAR
scalar {
  value: 4.0
}
, name: "mem"
type: SCALAR
scalar {
  value: 2731.0
}
, name: "ports"
type: RANGES
ranges {
  range {
    begin: 31000
    end: 32000
  }
}
, name: "disk"
type: SCALAR
scalar {
  value: 75120.0
}
]
13/04/12 16:22:45 INFO mapred.MesosScheduler: Unable to fully satisfy needed map/reduce slots: 3 map slots 1 reduce slots remaining
----------------------------------

The hadoop client log is:
--------------------------------------------
13/04/12 16:22:36 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
13/04/12 16:22:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/12 16:22:37 INFO input.FileInputFormat: Total input paths to process : 1
13/04/12 16:22:37 WARN snappy.LoadSnappy: Snappy native library is available
13/04/12 16:22:37 INFO snappy.LoadSnappy: Snappy native library loaded
13/04/12 16:22:37 INFO mapred.JobClient: Running job: job_201304121621_0001
13/04/12 16:22:38 INFO mapred.JobClient:  map 0% reduce 0%
--------------------------------------------

Since the client hangs, I use ctrl-c to stop it, then use job -status to check the job status:

guodong@guodong-Vostro-3400:~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
[trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
13/04/12 16:26:22 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.

Job: job_201304121621_0001
file: file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
map() completion: 0.0
reduce() completion: 0.0
Counters: 0

Then I try to kill the job:

guodong@guodong-Vostro-3400:~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
[trunk ?]$ ./bin/hadoop job -kill job_201304121621_0001
13/04/12 16:27:16 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
Killed job job_201304121621_0001

When I kill the job, I can see this in the jobtracker log:

*13/04/12 16:27:13 INFO mapred.MesosScheduler: Unable to fully satisfy needed map/reduce slots: 3 map slots 1 reduce slots remaining*
*13/04/12 16:27:16 INFO mapred.JobTracker: Killing job job_201304121621_0001*
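Based on Guodong's analysis above, the stuck job should self-correct as soon as some slave makes an offer large enough for a TaskTracker, because that finally lets the queued job cleanup task run. A rough way to force that locally is sketched below; the --resources flag values are illustrative assumptions, not taken from the thread:

  # Restart the mesos-slave so it advertises enough memory for one TaskTracker
  # (the value just needs to exceed whatever the scheduler logs as "needed").
  ./bin/mesos-slave.sh --master=127.0.0.1:5050 \
    --resources='cpus:4;mem:16384;disk:20480;ports:[31000-32000]'

  # Once a TaskTracker registers and the cleanup task completes, the JobTracker
  # should move the job from PREP to KILLED and release its pending slots.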
