Re: Question about running hadoop over mesos

王国栋 Wed, 17 Apr 2013 02:31:41 -0700

Hi Ben,

I have read some of the code about job tracker and task tracker these 2
days.


I don't think this scenario is a bug. Neither a bug in mesos scheduler nor
a bug in jobtracker. The following is my opinion.

We kill the job when the job status is PREP( which means setup task is not
launched yet). When we kill the job, jobtracker just mark 2 boolean values
in JobInProgress as true, which means this job is killed and failed. But
killing a job will not trigger any job status change until the job cleanup
task is scheduled to a tasktracker.

 The job status stays in PREP just because mesos can not launch a
tasktracker due to not enough resource. So it is a little bit embarrassing.
When there is enough resource for mesos to launch a tasktracker(say, there
is some other mesos-slave who can offer more enough to start a task
tracker), mesos will launch a task tracker and the job cleanup task will be
scheduled to this task tracker. And at last, after the clean up task is
completed, the job status will be updated as *KILLED* and the pending
mapper slot and pending reduce slot will be released too.

If anything is wrong, please correct me. :)


Guodong


On Tue, Apr 16, 2013 at 1:02 AM, Benjamin Mahler
<[email protected]>wrote:

> Thanks! It would be helpful to me if you could provide the exact commands
> you're running for steps 8, 9 and 10 as well. It would save me some time as
> my knowledge of Hadoop commands is limited.
>
>
> On Sun, Apr 14, 2013 at 9:53 PM, 王国栋 <[email protected]> wrote:
>
> > Hi ben,
> >
> > I've updated to the latest code in trunk. And the problem is still here.
> >
> > Please follow these steps to reproduce it.
> > 1. check out the trunk code
> > 2. bootstrap
> > 3. mkdir build for out source build
> > 4. cd build && ../configure
> > 5. make
> > 6. cd hadoop && make hadoop-0.20.2-cdh3u3
> > 7. modify conf/mapred-site.xml, set mapred.mesos.slot.mem=10240(make sure
> > the resource is not enough)
> > 8. start mesos and the jobtracker
> > 9. submit a wordcount job to jobtracker.(at this time, the job is pending
> > due to not enough resource)
> > 10. kill the job (can not kill the job)
> >
> >
> > Guodong
> >
> >
> > On Mon, Apr 15, 2013 at 10:52 AM, 王国栋 <[email protected]> wrote:
> >
> > > I am building the latest code in the trunk. I will keep you updated, if
> > > the problem is still here, I will give you the exact reproduce process
> > and
> > > make sure you can debug it.
> > >
> > > I also think this may be a bug in jobtracker. :)
> > >
> > > Thanks.
> > >
> > > Guodong
> > >
> > >
> > > On Mon, Apr 15, 2013 at 10:20 AM, Benjamin Mahler <
> > > [email protected]> wrote:
> > >
> > >> On April 9th we submitted a deadlock fix, please update to make sure
> you
> > >> have the fix:
> > >> https://reviews.apache.org/r/10352/
> > >>
> > >> Can you provide the commands to reproduce from a mesos build
> directory?
> > >> I'd
> > >> like to be able to reproduce this locally on my laptop, so the exact
> > >> commands I need to run from within my mesos build directory would be
> > >> useful
> > >> :)
> > >>
> > >> The fact that the job gets stuck in the JobTracker as PREP makes me
> > think
> > >> there's a bug in the JobTracker, I imagine people don't often run
> > >> JobTrackers without any TaskTrackers, which is not the case here.
> > >>
> > >>
> > >> On Sun, Apr 14, 2013 at 7:04 PM, 王国栋 <[email protected]> wrote:
> > >>
> > >> > Hi Ben,
> > >> >
> > >> > I put my ideas inline. Please check.
> > >> >
> > >> > On Mon, Apr 15, 2013 at 8:13 AM, Benjamin Mahler
> > >> > <[email protected]>wrote:
> > >> >
> > >> > > So I'm still a little confused here. From what you showed, it
> looks
> > >> like
> > >> > > the 'job -kill' command you posted succeeded on the client side?
> > >> > >
> > >> > > [trunk ?]$ ./bin/hadoop job -kill job_201304121621_000113/04/12
> > >> 16:27:16
> > >> > > INFO security.UserGroupInformation: JAAS Configuration already set
> > up
> > >> for
> > >> > > Hadoop, not re-installing.
> > >> > > *Killed job job_201304121621_0001**
> > >> > > *
> > >> > >
> > >> > Yes, from the client side, killing the job is successful.
> > >> >
> > >> >
> > >> > >
> > >> > > I see that the JobTracker still thinks the job is pending. What
> > >> happens
> > >> > > when you re-issue that kill command?
> > >> > >
> > >> > The Jobtracker still think the status of job is *PREP*, when I
> reissue
> > >> kill
> > >> > cmd, it seems the same as the first time I issued the cmd.
> > >> >
> > >> >
> > >> > >
> > >> > > I'm confused as to why it's still in pending when the JobTracker
> has
> > >> > > removed the job:
> > >> > >
> > >> > > 13/04/12 16:27:16 INFO mapred.JobTracker: Killing job
> > >> > job_201304121621_0001
> > >> > > 13/04/12 16:27:16 INFO mapred.JobInProgress: Killing job
> > >> > > 'job_201304121621_0001'*
> > >> > >
> > >> > > Looking at the JobTracker code, it seems like this indeed kills
> the
> > >> job:
> > >> > >
> > >> > > The code that prints the log line in JobTracker:
> > >> > >   private synchronized void killJob(JobInProgress job) {
> > >> > >     LOG.info("Killing job " + job.getJobID());
> > >> > >     JobStatus prevStatus = (JobStatus)job.getStatus().clone();
> > >> > >     job.kill();
> > >> > >
> > >> > >     // Inform the listeners if the job is killed
> > >> > >     // Note :
> > >> > >     //   If the job is killed in the PREP state then the listeners
> > >> will
> > >> > be
> > >> > >     //   invoked
> > >> > >     //   If the job is killed in the RUNNING state then cleanup
> > tasks
> > >> > will
> > >> > > be
> > >> > >     //   launched and the updateTaskStatuses() will take care of
> it
> > >> > >     JobStatus newStatus = (JobStatus)job.getStatus().clone();
> > >> > >     if (prevStatus.getRunState() != newStatus.getRunState()
> > >> > >         && newStatus.getRunState() == JobStatus.KILLED) {
> > >> > >       JobStatusChangeEvent event =
> > >> > >         new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED,
> > >> > > prevStatus,
> > >> > >             newStatus);
> > >> > >       updateJobInProgressListeners(event);
> > >> > >     }
> > >> > >   }
> > >> > >
> > >> > > Then JobInProgress.kill():
> > >> > >   /**
> > >> > >    * Kill the job and all its component tasks. This method should
> be
> > >> > called
> > >> > > from
> > >> > >    * jobtracker and should return fast as it locks the jobtracker.
> > >> > >    */
> > >> > >   public void kill() {
> > >> > >     boolean killNow = false;
> > >> > >     synchronized(jobInitKillStatus) {
> > >> > >       jobInitKillStatus.killed = true;
> > >> > >       //if not in middle of init, terminate it now
> > >> > >       if(!jobInitKillStatus.initStarted ||
> > >> jobInitKillStatus.initDone) {
> > >> > >         //avoiding nested locking by setting flag
> > >> > >         killNow = true;
> > >> > >       }
> > >> > >     }
> > >> > >     if(killNow) {
> > >> > >       terminate(JobStatus.KILLED);
> > >> > >     }
> > >> > >   }
> > >> > >
> > >> > > I don't really see the issue at this point, so any further
> > >> information /
> > >> > > debugging on your end that reveals this bug would be very valuable
> > to
> > >> us.
> > >> > >
> > >> > I am trying to debug this on my laptop. I found when I issue kill
> cmd
> > to
> > >> > jobtracker, the job status is never changed. It is always *PREP*.
> > >> > Can you reproduce this on your machine when you follow the steps I
> > >> > mentioned in the previous mail?
> > >> >
> > >> >
> > >> > >
> > >> > > Lastly, what version of the code are you running? If you're
> running
> > >> off
> > >> > > trunk, when did you last update it?
> > >> > >
> > >> >
> > >> > I am running with the code in the trunk, it is updated last week.
> > >> >
> > >> > >
> > >> > > Ben
> > >> > >
> > >> > >
> > >> > > On Fri, Apr 12, 2013 at 1:32 AM, 王国栋 <[email protected]> wrote:
> > >> > >
> > >> > > > Hi Vinod,
> > >> > > >
> > >> > > > When I submit the job, the log of jobtracker is as follow
> > >> > > > ----------------------------------
> > >> > > > 13/04/12 16:22:37 INFO mapred.MesosScheduler: Added job
> > >> > > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobTracker: Job
> > job_201304121621_0001
> > >> > added
> > >> > > > successfully for user 'guodong' to queue 'default'
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobTracker: Initializing
> > >> > > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: Initializing
> > >> > > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:37 INFO mapred.AuditLogger: USER=guodong
> > IP=127.0.0.1
> > >> > > > OPERATION=SUBMIT_JOB TARGET=job_201304121621_0001 RESULT=SUCCESS
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: jobToken generated
> > and
> > >> > > stored
> > >> > > > with users keys in
> > >> > > > /tmp/hadoop-guodong/mapred/system/job_201304121621_0001/jobToken
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: Input size for job
> > >> > > > job_201304121621_0001 = 89502988. Number of splits = 3
> > >> > > > 13/04/12 16:22:37 INFO net.NetworkTopology: Adding a new node:
> > >> > > > /default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> > >> > > > tip:task_201304121621_0001_m_000000 has split on
> > >> > > > node:/default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> > >> > > > tip:task_201304121621_0001_m_000001 has split on
> > >> > > > node:/default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> > >> > > > tip:task_201304121621_0001_m_000002 has split on
> > >> > > > node:/default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> job_201304121621_0001
> > >> > > > LOCALITY_WAIT_FACTOR=1.0
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: Job
> > >> job_201304121621_0001
> > >> > > > initialized successfully with 3 map tasks and 1 reduce tasks.
> > >> > > > 13/04/12 16:22:39 INFO mapred.MesosScheduler: JobTracker Status
> > >> > > >       Pending Map Tasks: 3
> > >> > > >    Pending Reduce Tasks: 1
> > >> > > >          Idle Map Slots: 0
> > >> > > >       Idle Reduce Slots: 0
> > >> > > >      Inactive Map Slots: 0 (launched but no hearbeat yet)
> > >> > > >   Inactive Reduce Slots: 0 (launched but no hearbeat yet)
> > >> > > >        Needed Map Slots: 3
> > >> > > >     Needed Reduce Slots: 1
> > >> > > > 13/04/12 16:22:39 INFO mapred.MesosScheduler: Declining offer
> with
> > >> > > > insufficient resources for a TaskTracker:
> > >> > > >   cpus: offered 4.0 needed 1.800000011920929
> > >> > > >   mem : offered 2731.0 needed 6400.0
> > >> > > >   disk: offered 75120.0 needed 4096.0
> > >> > > >   ports:  at least 2 (sufficient)
> > >> > > > [name: "cpus"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 4.0
> > >> > > > }
> > >> > > > , name: "mem"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 2731.0
> > >> > > > }
> > >> > > > , name: "ports"
> > >> > > > type: RANGES
> > >> > > > ranges {
> > >> > > >   range {
> > >> > > >     begin: 31000
> > >> > > >     end: 32000
> > >> > > >   }
> > >> > > > }
> > >> > > > , name: "disk"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 75120.0
> > >> > > > }
> > >> > > > ]
> > >> > > > 13/04/12 16:22:39 INFO mapred.MesosScheduler: Unable to fully
> > >> satisfy
> > >> > > > needed map/reduce slots: 3 map slots 1 reduce slots remaining
> > >> > > > 13/04/12 16:22:45 INFO mapred.MesosScheduler: JobTracker Status
> > >> > > >       Pending Map Tasks: 3
> > >> > > >    Pending Reduce Tasks: 1
> > >> > > >          Idle Map Slots: 0
> > >> > > >       Idle Reduce Slots: 0
> > >> > > >      Inactive Map Slots: 0 (launched but no hearbeat yet)
> > >> > > >   Inactive Reduce Slots: 0 (launched but no hearbeat yet)
> > >> > > >        Needed Map Slots: 3
> > >> > > >     Needed Reduce Slots: 1
> > >> > > > 13/04/12 16:22:45 INFO mapred.MesosScheduler: Declining offer
> with
> > >> > > > insufficient resources for a TaskTracker:
> > >> > > >   cpus: offered 4.0 needed 1.800000011920929
> > >> > > >   mem : offered 2731.0 needed 6400.0
> > >> > > >   disk: offered 75120.0 needed 4096.0
> > >> > > >   ports:  at least 2 (sufficient)
> > >> > > > [name: "cpus"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 4.0
> > >> > > > }
> > >> > > > , name: "mem"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 2731.0
> > >> > > > }
> > >> > > > , name: "ports"
> > >> > > > type: RANGES
> > >> > > > ranges {
> > >> > > >   range {
> > >> > > >     begin: 31000
> > >> > > >     end: 32000
> > >> > > >   }
> > >> > > > }
> > >> > > > , name: "disk"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 75120.0
> > >> > > > }
> > >> > > > ]
> > >> > > > 13/04/12 16:22:45 INFO mapred.MesosScheduler: Unable to fully
> > >> satisfy
> > >> > > > needed map/reduce slots: 3 map slots 1 reduce slots remaining
> > >> > > >
> > >> > > > ----------------------------------
> > >> > > >
> > >> > > > the hadoop client log is
> > >> > > > --------------------------------------------
> > >> > > > 13/04/12 16:22:36 INFO security.UserGroupInformation: JAAS
> > >> > Configuration
> > >> > > > already set up for Hadoop, not re-installing.
> > >> > > > 13/04/12 16:22:36 INFO util.NativeCodeLoader: Loaded the
> > >> native-hadoop
> > >> > > > library
> > >> > > > 13/04/12 16:22:37 INFO input.FileInputFormat: Total input paths
> to
> > >> > > process
> > >> > > > : 1
> > >> > > > 13/04/12 16:22:37 WARN snappy.LoadSnappy: Snappy native library
> is
> > >> > > > available
> > >> > > > 13/04/12 16:22:37 INFO snappy.LoadSnappy: Snappy native library
> > >> loaded
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobClient: Running job:
> > >> > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:38 INFO mapred.JobClient:  map 0% reduce 0%
> > >> > > >
> > >> > > > --------------------------------------------
> > >> > > >
> > >> > > > Since the client is hung up,  I use ctrl-c to stop the client.
> > Then
> > >> use
> > >> > > job
> > >> > > > -status to check the job status.
> > >> > > >
> > >> > > > guodong@guodong-Vostro-3400
> > >> > > > :~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
> > >> > > > [trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
> > >> > > > 13/04/12 16:26:22 INFO security.UserGroupInformation: JAAS
> > >> > Configuration
> > >> > > > already set up for Hadoop, not re-installing.
> > >> > > >
> > >> > > > Job: job_201304121621_0001
> > >> > > > file:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
> > >> > > > tracking URL:
> > >> > > >
> http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
> > >> > > > map() completion: 0.0
> > >> > > > reduce() completion: 0.0
> > >> > > > Counters: 0
> > >> > > >
> > >> > > > Then try to kill the job
> > >> > > > guodong@guodong-Vostro-3400
> > >> > > > :~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
> > >> > > > [trunk ?]$ ./bin/hadoop job -kill job_201304121621_000113/04/12
> > >> > 16:27:16
> > >> > > > INFO security.UserGroupInformation: JAAS Configuration already
> set
> > >> up
> > >> > for
> > >> > > > Hadoop, not re-installing.
> > >> > > > Killed job job_201304121621_0001
> > >> > > >
> > >> > > > when I kill the job, I can see the log on jobtracker
> > >> > > > *13/04/12 16:27:13 INFO mapred.MesosScheduler: Unable to fully
> > >> satisfy
> > >> > > > needed map/reduce slots: 3 map slots 1 reduce slots remaining*
> > >> > > > *13/04/12 16:27:16 INFO mapred.JobTracker: Killing job
> > >> > > > job_201304121621_0001
> > >> > > > *
> > >> > > > *13/04/12 16:27:16 INFO mapred.JobInProgress: Killing job
> > >> > > > 'job_201304121621_0001'*
> > >> > > >
> > >> > > > After I kill the job, I can use job-status to check the job
> > status,
> > >> and
> > >> > > it
> > >> > > > is still pending.
> > >> > > >
> > >> > > > guodong@guodong-Vostro-3400
> > >> > > > :~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
> > >> > > > [trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
> > >> > > > 13/04/12 16:31:09 INFO security.UserGroupInformation: JAAS
> > >> > Configuration
> > >> > > > already set up for Hadoop, not re-installing.
> > >> > > >
> > >> > > > Job: job_201304121621_0001
> > >> > > > file:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
> > >> > > > tracking URL:
> > >> > > >
> http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
> > >> > > > map() completion: 0.0
> > >> > > > reduce() completion: 0.0
> > >> > > > Counters: 0
> > >> > > >
> > >> > > >
> > >> > > > I hope this is helpful !
> > >> > > >
> > >> > > > Looking forward to your new progress. And I will also try to go
> > >> through
> > >> > > > hadoop code to check it again.
> > >> > > >
> > >> > > > Thanks.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > Guodong
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Apr 12, 2013 at 3:42 PM, Vinod Kone <
> [email protected]>
> > >> > wrote:
> > >> > > >
> > >> > > > > Hi Guodong,
> > >> > > > >
> > >> > > > > This is likely a bug on our side.
> > >> > > > >
> > >> > > > > Could you paste the output of the client when you issue the
> > kill?
> > >> > Also,
> > >> > > > the
> > >> > > > > output of the jobtracker (during the submission and kill of
> the
> > >> job)
> > >> > > > would
> > >> > > > > also be helpful.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > -- Vinod
> > >> > > > >
> > >> > > > >
> > >> > > > > On Thu, Apr 11, 2013 at 10:43 PM, 王国栋 <[email protected]>
> > wrote:
> > >> > > > >
> > >> > > > > > Hi Ben,
> > >> > > > > >
> > >> > > > > > I am sorry for my mistake about point 2. The jobtracker
> jetty
> > >> > server
> > >> > > > > works
> > >> > > > > > fine. Yesterday, the execution time for my  test job is too
> > >> short,
> > >> > so
> > >> > > > it
> > >> > > > > is
> > >> > > > > > finished before the jetty server can show the job status  in
> > >> > running
> > >> > > > > list.
> > >> > > > > > Today, I try some big job, and the status is perfectly
> right.
> > >> > > > > >
> > >> > > > > > More information about point 1. I can reproduce this by the
> > >> > following
> > >> > > > > step.
> > >> > > > > > 1. set the needed resource for each slot(memory, cpu) in
> > >> > > > mapred-site.xml.
> > >> > > > > > Make sure each slot need a lot of resource.
> > >> > > > > > 2. Given only one mesos slave whose resource is not enough
> for
> > >> one
> > >> > > > mapper
> > >> > > > > > slot and one reduce slot.
> > >> > > > > > 3. run the jobtracker, submit a job which need 1 mapper and
> 1
> > >> > > reducer.
> > >> > > > > >
> > >> > > > > > Then , I can find out that the job is pending due to not
> > enough
> > >> > > > > resource. I
> > >> > > > > > can use ctrl-c to stop the hadoop client. But the job is
> still
> > >> > > pending.
> > >> > > > > >
> > >> > > > > > I try to kill the job with "hadoop job -kill ", but I can
> not
> > >> kill
> > >> > > the
> > >> > > > > > pending job.
> > >> > > > > > Before kill the job, I can check the job status is
> > >> > > > > > *Job Setup:* <
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> http://localhost:50030/jobtasks.jsp?jobid=job_201304121059_0001&type=setup&pagenum=1&state=killed
> > >> > > > > > >
> > >> > > > > > Pending
> > >> > > > > > *Job Cleanup:* Pending*
> > >> > > > > > *
> > >> > > > > > *
> > >> > > > > > *
> > >> > > > > > After I try to kill the job, the job status is
> > >> > > > > > *Job Setup: *failed
> > >> > > > > > *Job Cleanup:* Pending
> > >> > > > > >
> > >> > > > > > Then the job is hang up. And I can never stop it.
> > >> > > > > >
> > >> > > > > > Is it possible that mesos-scheduler miss some killed job
> > event ?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Guodong
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Fri, Apr 12, 2013 at 2:59 AM, Benjamin Mahler
> > >> > > > > > <[email protected]>wrote:
> > >> > > > > >
> > >> > > > > > > 'Pending Map Tasks': The nubmer of pending map tasks in
> the
> > >> > Hadoop
> > >> > > > > > > JobTracker.
> > >> > > > > > > 'Pending Reduce Tasks': The nubmer of pending reduce tasks
> > in
> > >> the
> > >> > > > > Hadoop
> > >> > > > > > > JobTracker.
> > >> > > > > > >
> > >> > > > > > > Did you successfully kill the job? If so, did you allow
> some
> > >> time
> > >> > > for
> > >> > > > > the
> > >> > > > > > > JobTracker to detect that the job was killed.
> > >> > > > > > >
> > >> > > > > > > Our scheduler (MesosScheduler.java) simply introspects on
> > the
> > >> > > > > JobTracker
> > >> > > > > > > state to determine the number of pending map/reduce tasks
> in
> > >> the
> > >> > > > > system.
> > >> > > > > > I
> > >> > > > > > > would expect this to go to 0 sometime after you kill your
> > >> job, if
> > >> > > > > that's
> > >> > > > > > > not the case, I'll need more information to figure out
> > what's
> > >> > going
> > >> > > > on.
> > >> > > > > > >
> > >> > > > > > > Can you elaborate more on your point 2? It almost sounds
> > like
> > >> > > you're
> > >> > > > > > > talking to a different JobTracker? Note that the
> > >> MesosScheduler
> > >> > > runs
> > >> > > > > > > *inside* the JobTracker.
> > >> > > > > > >
> > >> > > > > > > Also, note that we recently committed a deadlock fix for
> the
> > >> > Hadoop
> > >> > > > > > patch:
> > >> > > > > > > https://reviews.apache.org/r/10352/
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Thu, Apr 11, 2013 at 2:31 AM, 王国栋 <[email protected]>
> > >> wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi,
> > >> > > > > > > >
> > >> > > > > > > > I am trying to run hadoop over mesos. And I am using the
> > >> code
> > >> > in
> > >> > > > the
> > >> > > > > > > trunk.
> > >> > > > > > > > But I ran into some problems here. My hadoop version is
> > >> cdh3u3.
> > >> > > > > > > >
> > >> > > > > > > > *1. when a job is pending because of no enough
> resources,
> > I
> > >> use
> > >> > > > > ctrl-c
> > >> > > > > > to
> > >> > > > > > > > stop the job client. But I can see the pending mapper
> and
> > >> > pending
> > >> > > > > > reducer
> > >> > > > > > > > are still in the job tracker. Then I try to use "hadoop
> > job
> > >> > -kill
> > >> > > > > > jobid"
> > >> > > > > > > to
> > >> > > > > > > > kill this job, but nothing happens in jobtracker, mapper
> > and
> > >> > > > reducer
> > >> > > > > > are
> > >> > > > > > > > still pending. The log in jobtracker is as follow.*
> > >> > > > > > > >
> > >> > > > > > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: JobTracker
> > >> Status
> > >> > > > > > > >       Pending Map Tasks: 1
> > >> > > > > > > >    Pending Reduce Tasks: 1
> > >> > > > > > > >          Idle Map Slots: 0
> > >> > > > > > > >       Idle Reduce Slots: 0
> > >> > > > > > > >      Inactive Map Slots: 0 (launched but no hearbeat
> yet)
> > >> > > > > > > >   Inactive Reduce Slots: 0 (launched but no hearbeat
> yet)
> > >> > > > > > > >        Needed Map Slots: 1
> > >> > > > > > > >     Needed Reduce Slots: 1
> > >> > > > > > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: Declining
> > >> offer
> > >> > > with
> > >> > > > > > > > insufficient resources for a TaskTracker:
> > >> > > > > > > >   cpus: offered 4.0 needed 1.800000011920929
> > >> > > > > > > >   mem : offered 2731.0 needed 6432.0
> > >> > > > > > > >   disk: offered 70651.0 needed 4096.0
> > >> > > > > > > >   ports:  at least 2 (sufficient)
> > >> > > > > > > > [name: "cpus"
> > >> > > > > > > > type: SCALAR
> > >> > > > > > > > scalar {
> > >> > > > > > > >   value: 4.0
> > >> > > > > > > > }
> > >> > > > > > > > , name: "mem"
> > >> > > > > > > > type: SCALAR
> > >> > > > > > > > scalar {
> > >> > > > > > > >   value: 2731.0
> > >> > > > > > > > }
> > >> > > > > > > > , name: "ports"
> > >> > > > > > > > type: RANGES
> > >> > > > > > > > ranges {
> > >> > > > > > > >   range {
> > >> > > > > > > >     begin: 31000
> > >> > > > > > > >     end: 32000
> > >> > > > > > > >   }
> > >> > > > > > > > }
> > >> > > > > > > > , name: "disk"
> > >> > > > > > > > type: SCALAR
> > >> > > > > > > > scalar {
> > >> > > > > > > >   value: 70651.0
> > >> > > > > > > > }
> > >> > > > > > > > ]
> > >> > > > > > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: Unable to
> > >> fully
> > >> > > > satisfy
> > >> > > > > > > > needed map/reduce slots: 1 map slots 1 reduce slots
> > >> remaining
> > >> > > > > > > >
> > >> > > > > > > > *2. when we submit the job to the jobtracker, I can not
> > find
> > >> > any
> > >> > > > > > running
> > >> > > > > > > > job on jobtracker web interface(
> > >> > > > > http://localhost:50030/jobtracker.jsp
> > >> > > > > > ).
> > >> > > > > > > > But
> > >> > > > > > > > when the job is finished, I can see the job info in
> > retired
> > >> job
> > >> > > in
> > >> > > > > > > > jobtracker.*
> > >> > > > > > > >
> > >> > > > > > > > Any ideas about this ? Thanks a lot.
> > >> > > > > > > >
> > >> > > > > > > > Guodong
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Question about running hadoop over mesos

Reply via email to