step 8: cd hadoop-0.20.2-cdh3u3/ && ./bin/hadoop jobtracker
step 9: ./bin/hadoop jar hadoop-examples-0.20.2-cdh3u3.jar wordcount /tmp/data.txt out
step 10: ./bin/hadoop job -kill <your job id>

Then you can use "./bin/hadoop job -list all" to check the status of all
jobs, and you will see that the job you just killed is still listed as alive.
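
In case it helps, here is a minimal, untested sketch of checking the same
thing programmatically with the old mapred API that ships with
hadoop-0.20.2-cdh3u3. The class name and the 5 second sleep are just
illustrative, and the job id is the one from my earlier logs:

import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class KillAndCheck {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Picks up conf/mapred-site.xml from the classpath, so this talks to the
    // same jobtracker started in step 8.
    JobConf conf = new JobConf();
    JobClient client = new JobClient(conf);

    RunningJob job = client.getJob(JobID.forName("job_201304121621_0001"));
    if (job == null) {
      System.err.println("jobtracker does not know this job id");
      return;
    }

    job.killJob();       // the same RPC that "./bin/hadoop job -kill" issues
    Thread.sleep(5000);  // give the jobtracker some time to process the kill

    // Expected KILLED after the call above, but in my case it stays PREP.
    int state = job.getJobState();
    System.out.println("run state after kill: " + state
        + " (PREP=" + JobStatus.PREP + ", KILLED=" + JobStatus.KILLED + ")");
  }
}

You can compile it against the cdh3u3 jars and launch it with "./bin/hadoop
KillAndCheck" so the jobtracker address is read from conf/ on the classpath.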

Guodong


On Tue, Apr 16, 2013 at 1:02 AM, Benjamin Mahler
<[email protected]> wrote:

> Thanks! It would be helpful to me if you could provide the exact commands
> you're running for steps 8, 9 and 10 as well. It would save me some time as
> my knowledge of Hadoop commands is limited.
>
>
> On Sun, Apr 14, 2013 at 9:53 PM, 王国栋 <[email protected]> wrote:
>
> > Hi ben,
> >
> > I've updated to the latest code in trunk. And the problem is still here.
> >
> > Please follow these steps to reproduce it.
> > 1. check out the trunk code
> > 2. bootstrap
> > 3. mkdir build for out source build
> > 4. cd build && ../configure
> > 5. make
> > 6. cd hadoop && make hadoop-0.20.2-cdh3u3
> > 7. modify conf/mapred-site.xml, set mapred.mesos.slot.mem=10240(make sure
> > the resource is not enough)
> > 8. start mesos and the jobtracker
> > 9. submit a wordcount job to jobtracker.(at this time, the job is pending
> > due to not enough resource)
> > 10. kill the job (can not kill the job)
> >
> >
> > Guodong
> >
> >
> > On Mon, Apr 15, 2013 at 10:52 AM, 王国栋 <[email protected]> wrote:
> >
> > > I am building the latest code in the trunk. I will keep you updated, if
> > > the problem is still here, I will give you the exact reproduce process
> > and
> > > make sure you can debug it.
> > >
> > > I also think this may be a bug in jobtracker. :)
> > >
> > > Thanks.
> > >
> > > Guodong
> > >
> > >
> > > On Mon, Apr 15, 2013 at 10:20 AM, Benjamin Mahler <
> > > [email protected]> wrote:
> > >
> > >> On April 9th we submitted a deadlock fix, please update to make sure
> you
> > >> have the fix:
> > >> https://reviews.apache.org/r/10352/
> > >>
> > >> Can you provide the commands to reproduce from a mesos build
> directory?
> > >> I'd
> > >> like to be able to reproduce this locally on my laptop, so the exact
> > >> commands I need to run from within my mesos build directory would be
> > >> useful
> > >> :)
> > >>
> > >> The fact that the job gets stuck in the JobTracker as PREP makes me
> > think
> > >> there's a bug in the JobTracker, I imagine people don't often run
> > >> JobTrackers without any TaskTrackers, which is not the case here.
> > >>
> > >>
> > >> On Sun, Apr 14, 2013 at 7:04 PM, 王国栋 <[email protected]> wrote:
> > >>
> > >> > Hi Ben,
> > >> >
> > >> > I put my ideas inline. Please check.
> > >> >
> > >> > On Mon, Apr 15, 2013 at 8:13 AM, Benjamin Mahler
> > >> > <[email protected]>wrote:
> > >> >
> > >> > > So I'm still a little confused here. From what you showed, it
> looks
> > >> like
> > >> > > the 'job -kill' command you posted succeeded on the client side?
> > >> > >
> > >> > > [trunk ?]$ ./bin/hadoop job -kill job_201304121621_000113/04/12
> > >> 16:27:16
> > >> > > INFO security.UserGroupInformation: JAAS Configuration already set
> > up
> > >> for
> > >> > > Hadoop, not re-installing.
> > >> > > *Killed job job_201304121621_0001**
> > >> > > *
> > >> > >
> > >> > Yes, from the client side, killing the job is successful.
> > >> >
> > >> >
> > >> > >
> > >> > > I see that the JobTracker still thinks the job is pending. What
> > >> happens
> > >> > > when you re-issue that kill command?
> > >> > >
> > >> > The Jobtracker still think the status of job is *PREP*, when I
> reissue
> > >> kill
> > >> > cmd, it seems the same as the first time I issued the cmd.
> > >> >
> > >> >
> > >> > >
> > >> > > I'm confused as to why it's still in pending when the JobTracker
> has
> > >> > > removed the job:
> > >> > >
> > >> > > 13/04/12 16:27:16 INFO mapred.JobTracker: Killing job
> > >> > job_201304121621_0001
> > >> > > 13/04/12 16:27:16 INFO mapred.JobInProgress: Killing job
> > >> > > 'job_201304121621_0001'*
> > >> > >
> > >> > > Looking at the JobTracker code, it seems like this indeed kills
> the
> > >> job:
> > >> > >
> > >> > > The code that prints the log line in JobTracker:
> > >> > >   private synchronized void killJob(JobInProgress job) {
> > >> > >     LOG.info("Killing job " + job.getJobID());
> > >> > >     JobStatus prevStatus = (JobStatus)job.getStatus().clone();
> > >> > >     job.kill();
> > >> > >
> > >> > >     // Inform the listeners if the job is killed
> > >> > >     // Note :
> > >> > >     //   If the job is killed in the PREP state then the listeners
> > >> will
> > >> > be
> > >> > >     //   invoked
> > >> > >     //   If the job is killed in the RUNNING state then cleanup
> > tasks
> > >> > will
> > >> > > be
> > >> > >     //   launched and the updateTaskStatuses() will take care of
> it
> > >> > >     JobStatus newStatus = (JobStatus)job.getStatus().clone();
> > >> > >     if (prevStatus.getRunState() != newStatus.getRunState()
> > >> > >         && newStatus.getRunState() == JobStatus.KILLED) {
> > >> > >       JobStatusChangeEvent event =
> > >> > >         new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED,
> > >> > > prevStatus,
> > >> > >             newStatus);
> > >> > >       updateJobInProgressListeners(event);
> > >> > >     }
> > >> > >   }
> > >> > >
> > >> > > Then JobInProgress.kill():
> > >> > >   /**
> > >> > >    * Kill the job and all its component tasks. This method should
> be
> > >> > called
> > >> > > from
> > >> > >    * jobtracker and should return fast as it locks the jobtracker.
> > >> > >    */
> > >> > >   public void kill() {
> > >> > >     boolean killNow = false;
> > >> > >     synchronized(jobInitKillStatus) {
> > >> > >       jobInitKillStatus.killed = true;
> > >> > >       //if not in middle of init, terminate it now
> > >> > >       if(!jobInitKillStatus.initStarted ||
> > >> jobInitKillStatus.initDone) {
> > >> > >         //avoiding nested locking by setting flag
> > >> > >         killNow = true;
> > >> > >       }
> > >> > >     }
> > >> > >     if(killNow) {
> > >> > >       terminate(JobStatus.KILLED);
> > >> > >     }
> > >> > >   }
> > >> > >
> > >> > > I don't really see the issue at this point, so any further
> > >> information /
> > >> > > debugging on your end that reveals this bug would be very valuable
> > to
> > >> us.
> > >> > >
> > >> > I am trying to debug this on my laptop. I found when I issue kill
> cmd
> > to
> > >> > jobtracker, the job status is never changed. It is always *PREP*.
> > >> > Can you reproduce this on your machine when you follow the steps I
> > >> > mentioned in the previous mail?
> > >> >
> > >> >
> > >> > >
> > >> > > Lastly, what version of the code are you running? If you're
> running
> > >> off
> > >> > > trunk, when did you last update it?
> > >> > >
> > >> >
> > >> > I am running with the code in the trunk, it is updated last week.
> > >> >
> > >> > >
> > >> > > Ben
> > >> > >
> > >> > >
> > >> > > On Fri, Apr 12, 2013 at 1:32 AM, 王国栋 <[email protected]> wrote:
> > >> > >
> > >> > > > Hi Vinod,
> > >> > > >
> > >> > > > When I submit the job, the log of jobtracker is as follow
> > >> > > > ----------------------------------
> > >> > > > 13/04/12 16:22:37 INFO mapred.MesosScheduler: Added job
> > >> > > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobTracker: Job
> > job_201304121621_0001
> > >> > added
> > >> > > > successfully for user 'guodong' to queue 'default'
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobTracker: Initializing
> > >> > > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: Initializing
> > >> > > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:37 INFO mapred.AuditLogger: USER=guodong
> > IP=127.0.0.1
> > >> > > > OPERATION=SUBMIT_JOB TARGET=job_201304121621_0001 RESULT=SUCCESS
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: jobToken generated
> > and
> > >> > > stored
> > >> > > > with users keys in
> > >> > > > /tmp/hadoop-guodong/mapred/system/job_201304121621_0001/jobToken
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: Input size for job
> > >> > > > job_201304121621_0001 = 89502988. Number of splits = 3
> > >> > > > 13/04/12 16:22:37 INFO net.NetworkTopology: Adding a new node:
> > >> > > > /default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> > >> > > > tip:task_201304121621_0001_m_000000 has split on
> > >> > > > node:/default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> > >> > > > tip:task_201304121621_0001_m_000001 has split on
> > >> > > > node:/default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> > >> > > > tip:task_201304121621_0001_m_000002 has split on
> > >> > > > node:/default-rack/localhost
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress:
> job_201304121621_0001
> > >> > > > LOCALITY_WAIT_FACTOR=1.0
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobInProgress: Job
> > >> job_201304121621_0001
> > >> > > > initialized successfully with 3 map tasks and 1 reduce tasks.
> > >> > > > 13/04/12 16:22:39 INFO mapred.MesosScheduler: JobTracker Status
> > >> > > >       Pending Map Tasks: 3
> > >> > > >    Pending Reduce Tasks: 1
> > >> > > >          Idle Map Slots: 0
> > >> > > >       Idle Reduce Slots: 0
> > >> > > >      Inactive Map Slots: 0 (launched but no hearbeat yet)
> > >> > > >   Inactive Reduce Slots: 0 (launched but no hearbeat yet)
> > >> > > >        Needed Map Slots: 3
> > >> > > >     Needed Reduce Slots: 1
> > >> > > > 13/04/12 16:22:39 INFO mapred.MesosScheduler: Declining offer
> with
> > >> > > > insufficient resources for a TaskTracker:
> > >> > > >   cpus: offered 4.0 needed 1.800000011920929
> > >> > > >   mem : offered 2731.0 needed 6400.0
> > >> > > >   disk: offered 75120.0 needed 4096.0
> > >> > > >   ports:  at least 2 (sufficient)
> > >> > > > [name: "cpus"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 4.0
> > >> > > > }
> > >> > > > , name: "mem"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 2731.0
> > >> > > > }
> > >> > > > , name: "ports"
> > >> > > > type: RANGES
> > >> > > > ranges {
> > >> > > >   range {
> > >> > > >     begin: 31000
> > >> > > >     end: 32000
> > >> > > >   }
> > >> > > > }
> > >> > > > , name: "disk"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 75120.0
> > >> > > > }
> > >> > > > ]
> > >> > > > 13/04/12 16:22:39 INFO mapred.MesosScheduler: Unable to fully
> > >> satisfy
> > >> > > > needed map/reduce slots: 3 map slots 1 reduce slots remaining
> > >> > > > 13/04/12 16:22:45 INFO mapred.MesosScheduler: JobTracker Status
> > >> > > >       Pending Map Tasks: 3
> > >> > > >    Pending Reduce Tasks: 1
> > >> > > >          Idle Map Slots: 0
> > >> > > >       Idle Reduce Slots: 0
> > >> > > >      Inactive Map Slots: 0 (launched but no hearbeat yet)
> > >> > > >   Inactive Reduce Slots: 0 (launched but no hearbeat yet)
> > >> > > >        Needed Map Slots: 3
> > >> > > >     Needed Reduce Slots: 1
> > >> > > > 13/04/12 16:22:45 INFO mapred.MesosScheduler: Declining offer
> with
> > >> > > > insufficient resources for a TaskTracker:
> > >> > > >   cpus: offered 4.0 needed 1.800000011920929
> > >> > > >   mem : offered 2731.0 needed 6400.0
> > >> > > >   disk: offered 75120.0 needed 4096.0
> > >> > > >   ports:  at least 2 (sufficient)
> > >> > > > [name: "cpus"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 4.0
> > >> > > > }
> > >> > > > , name: "mem"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 2731.0
> > >> > > > }
> > >> > > > , name: "ports"
> > >> > > > type: RANGES
> > >> > > > ranges {
> > >> > > >   range {
> > >> > > >     begin: 31000
> > >> > > >     end: 32000
> > >> > > >   }
> > >> > > > }
> > >> > > > , name: "disk"
> > >> > > > type: SCALAR
> > >> > > > scalar {
> > >> > > >   value: 75120.0
> > >> > > > }
> > >> > > > ]
> > >> > > > 13/04/12 16:22:45 INFO mapred.MesosScheduler: Unable to fully
> > >> satisfy
> > >> > > > needed map/reduce slots: 3 map slots 1 reduce slots remaining
> > >> > > >
> > >> > > > ----------------------------------
> > >> > > >
> > >> > > > the hadoop client log is
> > >> > > > --------------------------------------------
> > >> > > > 13/04/12 16:22:36 INFO security.UserGroupInformation: JAAS
> > >> > Configuration
> > >> > > > already set up for Hadoop, not re-installing.
> > >> > > > 13/04/12 16:22:36 INFO util.NativeCodeLoader: Loaded the
> > >> native-hadoop
> > >> > > > library
> > >> > > > 13/04/12 16:22:37 INFO input.FileInputFormat: Total input paths
> to
> > >> > > process
> > >> > > > : 1
> > >> > > > 13/04/12 16:22:37 WARN snappy.LoadSnappy: Snappy native library
> is
> > >> > > > available
> > >> > > > 13/04/12 16:22:37 INFO snappy.LoadSnappy: Snappy native library
> > >> loaded
> > >> > > > 13/04/12 16:22:37 INFO mapred.JobClient: Running job:
> > >> > > job_201304121621_0001
> > >> > > > 13/04/12 16:22:38 INFO mapred.JobClient:  map 0% reduce 0%
> > >> > > >
> > >> > > > --------------------------------------------
> > >> > > >
> > >> > > > Since the client is hung up,  I use ctrl-c to stop the client.
> > Then
> > >> use
> > >> > > job
> > >> > > > -status to check the job status.
> > >> > > >
> > >> > > > guodong@guodong-Vostro-3400
> > >> > > > :~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
> > >> > > > [trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
> > >> > > > 13/04/12 16:26:22 INFO security.UserGroupInformation: JAAS
> > >> > Configuration
> > >> > > > already set up for Hadoop, not re-installing.
> > >> > > >
> > >> > > > Job: job_201304121621_0001
> > >> > > > file:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
> > >> > > > tracking URL:
> > >> > > >
> http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
> > >> > > > map() completion: 0.0
> > >> > > > reduce() completion: 0.0
> > >> > > > Counters: 0
> > >> > > >
> > >> > > > Then try to kill the job
> > >> > > > guodong@guodong-Vostro-3400
> > >> > > > :~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
> > >> > > > [trunk ?]$ ./bin/hadoop job -kill job_201304121621_000113/04/12
> > >> > 16:27:16
> > >> > > > INFO security.UserGroupInformation: JAAS Configuration already
> set
> > >> up
> > >> > for
> > >> > > > Hadoop, not re-installing.
> > >> > > > Killed job job_201304121621_0001
> > >> > > >
> > >> > > > when I kill the job, I can see the log on jobtracker
> > >> > > > *13/04/12 16:27:13 INFO mapred.MesosScheduler: Unable to fully
> > >> satisfy
> > >> > > > needed map/reduce slots: 3 map slots 1 reduce slots remaining*
> > >> > > > *13/04/12 16:27:16 INFO mapred.JobTracker: Killing job
> > >> > > > job_201304121621_0001
> > >> > > > *
> > >> > > > *13/04/12 16:27:16 INFO mapred.JobInProgress: Killing job
> > >> > > > 'job_201304121621_0001'*
> > >> > > >
> > >> > > > After I kill the job, I can use job-status to check the job
> > status,
> > >> and
> > >> > > it
> > >> > > > is still pending.
> > >> > > >
> > >> > > > guodong@guodong-Vostro-3400
> > >> > > > :~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
> > >> > > > [trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
> > >> > > > 13/04/12 16:31:09 INFO security.UserGroupInformation: JAAS
> > >> > Configuration
> > >> > > > already set up for Hadoop, not re-installing.
> > >> > > >
> > >> > > > Job: job_201304121621_0001
> > >> > > > file:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
> > >> > > > tracking URL:
> > >> > > >
> http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
> > >> > > > map() completion: 0.0
> > >> > > > reduce() completion: 0.0
> > >> > > > Counters: 0
> > >> > > >
> > >> > > >
> > >> > > > I hope this is helpful !
> > >> > > >
> > >> > > > Looking forward to your new progress. And I will also try to go
> > >> through
> > >> > > > hadoop code to check it again.
> > >> > > >
> > >> > > > Thanks.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > Guodong
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Apr 12, 2013 at 3:42 PM, Vinod Kone <
> [email protected]>
> > >> > wrote:
> > >> > > >
> > >> > > > > Hi Guodong,
> > >> > > > >
> > >> > > > > This is likely a bug on our side.
> > >> > > > >
> > >> > > > > Could you paste the output of the client when you issue the
> > kill?
> > >> > Also,
> > >> > > > the
> > >> > > > > output of the jobtracker (during the submission and kill of
> the
> > >> job)
> > >> > > > would
> > >> > > > > also be helpful.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > -- Vinod
> > >> > > > >
> > >> > > > >
> > >> > > > > On Thu, Apr 11, 2013 at 10:43 PM, 王国栋 <[email protected]>
> > wrote:
> > >> > > > >
> > >> > > > > > Hi Ben,
> > >> > > > > >
> > >> > > > > > I am sorry for my mistake about point 2. The jobtracker
> jetty
> > >> > server
> > >> > > > > works
> > >> > > > > > fine. Yesterday, the execution time for my  test job is too
> > >> short,
> > >> > so
> > >> > > > it
> > >> > > > > is
> > >> > > > > > finished before the jetty server can show the job status  in
> > >> > running
> > >> > > > > list.
> > >> > > > > > Today, I try some big job, and the status is perfectly
> right.
> > >> > > > > >
> > >> > > > > > More information about point 1. I can reproduce this by the
> > >> > following
> > >> > > > > step.
> > >> > > > > > 1. set the needed resource for each slot(memory, cpu) in
> > >> > > > mapred-site.xml.
> > >> > > > > > Make sure each slot need a lot of resource.
> > >> > > > > > 2. Given only one mesos slave whose resource is not enough
> for
> > >> one
> > >> > > > mapper
> > >> > > > > > slot and one reduce slot.
> > >> > > > > > 3. run the jobtracker, submit a job which need 1 mapper and
> 1
> > >> > > reducer.
> > >> > > > > >
> > >> > > > > > Then , I can find out that the job is pending due to not
> > enough
> > >> > > > > resource. I
> > >> > > > > > can use ctrl-c to stop the hadoop client. But the job is
> still
> > >> > > pending.
> > >> > > > > >
> > >> > > > > > I try to kill the job with "hadoop job -kill ", but I can
> not
> > >> kill
> > >> > > the
> > >> > > > > > pending job.
> > >> > > > > > Before kill the job, I can check the job status is
> > >> > > > > > *Job Setup:* <
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> http://localhost:50030/jobtasks.jsp?jobid=job_201304121059_0001&type=setup&pagenum=1&state=killed
> > >> > > > > > >
> > >> > > > > > Pending
> > >> > > > > > *Job Cleanup:* Pending*
> > >> > > > > > *
> > >> > > > > > *
> > >> > > > > > *
> > >> > > > > > After I try to kill the job, the job status is
> > >> > > > > > *Job Setup: *failed
> > >> > > > > > *Job Cleanup:* Pending
> > >> > > > > >
> > >> > > > > > Then the job is hang up. And I can never stop it.
> > >> > > > > >
> > >> > > > > > Is it possible that mesos-scheduler miss some killed job
> > event ?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Guodong
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Fri, Apr 12, 2013 at 2:59 AM, Benjamin Mahler
> > >> > > > > > <[email protected]>wrote:
> > >> > > > > >
> > > > > > 'Pending Map Tasks': The number of pending map tasks in
> the
> > >> > Hadoop
> > >> > > > > > > JobTracker.
> > > > > > 'Pending Reduce Tasks': The number of pending reduce tasks
> > in
> > >> the
> > >> > > > > Hadoop
> > >> > > > > > > JobTracker.
> > >> > > > > > >
> > >> > > > > > > Did you successfully kill the job? If so, did you allow
> some
> > >> time
> > >> > > for
> > >> > > > > the
> > >> > > > > > > JobTracker to detect that the job was killed.
> > >> > > > > > >
> > >> > > > > > > Our scheduler (MesosScheduler.java) simply introspects on
> > the
> > >> > > > > JobTracker
> > >> > > > > > > state to determine the number of pending map/reduce tasks
> in
> > >> the
> > >> > > > > system.
> > >> > > > > > I
> > >> > > > > > > would expect this to go to 0 sometime after you kill your
> > >> job, if
> > >> > > > > that's
> > >> > > > > > > not the case, I'll need more information to figure out
> > what's
> > >> > going
> > >> > > > on.
> > >> > > > > > >
> > >> > > > > > > Can you elaborate more on your point 2? It almost sounds
> > like
> > >> > > you're
> > >> > > > > > > talking to a different JobTracker? Note that the
> > >> MesosScheduler
> > >> > > runs
> > >> > > > > > > *inside* the JobTracker.
> > >> > > > > > >
> > >> > > > > > > Also, note that we recently committed a deadlock fix for
> the
> > >> > Hadoop
> > >> > > > > > patch:
> > >> > > > > > > https://reviews.apache.org/r/10352/
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Thu, Apr 11, 2013 at 2:31 AM, 王国栋 <[email protected]>
> > >> wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi,
> > >> > > > > > > >
> > >> > > > > > > > I am trying to run hadoop over mesos. And I am using the
> > >> code
> > >> > in
> > >> > > > the
> > >> > > > > > > trunk.
> > >> > > > > > > > But I ran into some problems here. My hadoop version is
> > >> cdh3u3.
> > >> > > > > > > >
> > >> > > > > > > > *1. when a job is pending because of no enough
> resources,
> > I
> > >> use
> > >> > > > > ctrl-c
> > >> > > > > > to
> > >> > > > > > > > stop the job client. But I can see the pending mapper
> and
> > >> > pending
> > >> > > > > > reducer
> > >> > > > > > > > are still in the job tracker. Then I try to use "hadoop
> > job
> > >> > -kill
> > >> > > > > > jobid"
> > >> > > > > > > to
> > >> > > > > > > > kill this job, but nothing happens in jobtracker, mapper
> > and
> > >> > > > reducer
> > >> > > > > > are
> > >> > > > > > > > still pending. The log in jobtracker is as follow.*
> > >> > > > > > > >
> > >> > > > > > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: JobTracker
> > >> Status
> > >> > > > > > > >       Pending Map Tasks: 1
> > >> > > > > > > >    Pending Reduce Tasks: 1
> > >> > > > > > > >          Idle Map Slots: 0
> > >> > > > > > > >       Idle Reduce Slots: 0
> > >> > > > > > > >      Inactive Map Slots: 0 (launched but no hearbeat
> yet)
> > >> > > > > > > >   Inactive Reduce Slots: 0 (launched but no hearbeat
> yet)
> > >> > > > > > > >        Needed Map Slots: 1
> > >> > > > > > > >     Needed Reduce Slots: 1
> > >> > > > > > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: Declining
> > >> offer
> > >> > > with
> > >> > > > > > > > insufficient resources for a TaskTracker:
> > >> > > > > > > >   cpus: offered 4.0 needed 1.800000011920929
> > >> > > > > > > >   mem : offered 2731.0 needed 6432.0
> > >> > > > > > > >   disk: offered 70651.0 needed 4096.0
> > >> > > > > > > >   ports:  at least 2 (sufficient)
> > >> > > > > > > > [name: "cpus"
> > >> > > > > > > > type: SCALAR
> > >> > > > > > > > scalar {
> > >> > > > > > > >   value: 4.0
> > >> > > > > > > > }
> > >> > > > > > > > , name: "mem"
> > >> > > > > > > > type: SCALAR
> > >> > > > > > > > scalar {
> > >> > > > > > > >   value: 2731.0
> > >> > > > > > > > }
> > >> > > > > > > > , name: "ports"
> > >> > > > > > > > type: RANGES
> > >> > > > > > > > ranges {
> > >> > > > > > > >   range {
> > >> > > > > > > >     begin: 31000
> > >> > > > > > > >     end: 32000
> > >> > > > > > > >   }
> > >> > > > > > > > }
> > >> > > > > > > > , name: "disk"
> > >> > > > > > > > type: SCALAR
> > >> > > > > > > > scalar {
> > >> > > > > > > >   value: 70651.0
> > >> > > > > > > > }
> > >> > > > > > > > ]
> > >> > > > > > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: Unable to
> > >> fully
> > >> > > > satisfy
> > >> > > > > > > > needed map/reduce slots: 1 map slots 1 reduce slots
> > >> remaining
> > >> > > > > > > >
> > >> > > > > > > > *2. when we submit the job to the jobtracker, I can not
> > find
> > >> > any
> > >> > > > > > running
> > >> > > > > > > > job on jobtracker web interface(
> > >> > > > > http://localhost:50030/jobtracker.jsp
> > >> > > > > > ).
> > >> > > > > > > > But
> > >> > > > > > > > when the job is finished, I can see the job info in
> > retired
> > >> job
> > >> > > in
> > >> > > > > > > > jobtracker.*
> > >> > > > > > > >
> > >> > > > > > > > Any ideas about this ? Thanks a lot.
> > >> > > > > > > >
> > >> > > > > > > > Guodong
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
