Hi Vinod,
When I submit the job, the JobTracker log is as follows:
----------------------------------
13/04/12 16:22:37 INFO mapred.MesosScheduler: Added job
job_201304121621_0001
13/04/12 16:22:37 INFO mapred.JobTracker: Job job_201304121621_0001 added
successfully for user 'guodong' to queue 'default'
13/04/12 16:22:37 INFO mapred.JobTracker: Initializing job_201304121621_0001
13/04/12 16:22:37 INFO mapred.JobInProgress: Initializing
job_201304121621_0001
13/04/12 16:22:37 INFO mapred.AuditLogger: USER=guodong IP=127.0.0.1
OPERATION=SUBMIT_JOB TARGET=job_201304121621_0001 RESULT=SUCCESS
13/04/12 16:22:37 INFO mapred.JobInProgress: jobToken generated and stored
with users keys in
/tmp/hadoop-guodong/mapred/system/job_201304121621_0001/jobToken
13/04/12 16:22:37 INFO mapred.JobInProgress: Input size for job
job_201304121621_0001 = 89502988. Number of splits = 3
13/04/12 16:22:37 INFO net.NetworkTopology: Adding a new node:
/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress:
tip:task_201304121621_0001_m_000000 has split on
node:/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress:
tip:task_201304121621_0001_m_000001 has split on
node:/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress:
tip:task_201304121621_0001_m_000002 has split on
node:/default-rack/localhost
13/04/12 16:22:37 INFO mapred.JobInProgress: job_201304121621_0001
LOCALITY_WAIT_FACTOR=1.0
13/04/12 16:22:37 INFO mapred.JobInProgress: Job job_201304121621_0001
initialized successfully with 3 map tasks and 1 reduce tasks.
13/04/12 16:22:39 INFO mapred.MesosScheduler: JobTracker Status
Pending Map Tasks: 3
Pending Reduce Tasks: 1
Idle Map Slots: 0
Idle Reduce Slots: 0
Inactive Map Slots: 0 (launched but no hearbeat yet)
Inactive Reduce Slots: 0 (launched but no hearbeat yet)
Needed Map Slots: 3
Needed Reduce Slots: 1
13/04/12 16:22:39 INFO mapred.MesosScheduler: Declining offer with
insufficient resources for a TaskTracker:
cpus: offered 4.0 needed 1.800000011920929
mem : offered 2731.0 needed 6400.0
disk: offered 75120.0 needed 4096.0
ports: at least 2 (sufficient)
[name: "cpus"
type: SCALAR
scalar {
value: 4.0
}
, name: "mem"
type: SCALAR
scalar {
value: 2731.0
}
, name: "ports"
type: RANGES
ranges {
range {
begin: 31000
end: 32000
}
}
, name: "disk"
type: SCALAR
scalar {
value: 75120.0
}
]
13/04/12 16:22:39 INFO mapred.MesosScheduler: Unable to fully satisfy
needed map/reduce slots: 3 map slots 1 reduce slots remaining
13/04/12 16:22:45 INFO mapred.MesosScheduler: JobTracker Status
Pending Map Tasks: 3
Pending Reduce Tasks: 1
Idle Map Slots: 0
Idle Reduce Slots: 0
Inactive Map Slots: 0 (launched but no hearbeat yet)
Inactive Reduce Slots: 0 (launched but no hearbeat yet)
Needed Map Slots: 3
Needed Reduce Slots: 1
13/04/12 16:22:45 INFO mapred.MesosScheduler: Declining offer with
insufficient resources for a TaskTracker:
cpus: offered 4.0 needed 1.800000011920929
mem : offered 2731.0 needed 6400.0
disk: offered 75120.0 needed 4096.0
ports: at least 2 (sufficient)
[name: "cpus"
type: SCALAR
scalar {
value: 4.0
}
, name: "mem"
type: SCALAR
scalar {
value: 2731.0
}
, name: "ports"
type: RANGES
ranges {
range {
begin: 31000
end: 32000
}
}
, name: "disk"
type: SCALAR
scalar {
value: 75120.0
}
]
13/04/12 16:22:45 INFO mapred.MesosScheduler: Unable to fully satisfy
needed map/reduce slots: 3 map slots 1 reduce slots remaining
----------------------------------
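By the way, my reading of the "Declining offer with insufficient resources"
lines above is that the scheduler declines an offer as soon as any single
resource falls short. A rough sketch of that check in Java (my
interpretation only, not the actual MesosScheduler.java code; the
offered/needed numbers are copied straight from the log):
----------------------------------
// Sketch of the per-resource sufficiency check implied by the
// "Declining offer with insufficient resources" log lines.
// NOT the real MesosScheduler.java code; numbers copied from the log.
public class OfferCheck {
    static boolean sufficient(double offered, double needed) {
        return offered >= needed;
    }

    public static void main(String[] args) {
        String[] names   = {"cpus", "mem", "disk"};
        double[] offered = {4.0, 2731.0, 75120.0};
        double[] needed  = {1.800000011920929, 6400.0, 4096.0};
        boolean accept = true;
        for (int i = 0; i < names.length; i++) {
            boolean ok = sufficient(offered[i], needed[i]);
            System.out.printf("%-4s: offered %.1f needed %s -> %s%n",
                names[i], offered[i], needed[i],
                ok ? "sufficient" : "insufficient");
            accept &= ok;
        }
        // Memory (2731.0 < 6400.0) fails, so the whole offer is declined
        // and no TaskTracker is ever launched.
        System.out.println(accept ? "launch TaskTracker" : "decline offer");
    }
}
----------------------------------
So in my setup it is memory that falls short (2731 MB offered vs 6400 MB
needed for the 3 map + 1 reduce slots), every offer is declined, and no
TaskTracker ever starts.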
The Hadoop client log is:
--------------------------------------------
13/04/12 16:22:36 INFO security.UserGroupInformation: JAAS Configuration
already set up for Hadoop, not re-installing.
13/04/12 16:22:36 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
13/04/12 16:22:37 INFO input.FileInputFormat: Total input paths to process
: 1
13/04/12 16:22:37 WARN snappy.LoadSnappy: Snappy native library is available
13/04/12 16:22:37 INFO snappy.LoadSnappy: Snappy native library loaded
13/04/12 16:22:37 INFO mapred.JobClient: Running job: job_201304121621_0001
13/04/12 16:22:38 INFO mapred.JobClient: map 0% reduce 0%
--------------------------------------------
Since the client hangs, I use Ctrl-C to stop it, then use "hadoop job
-status" to check the job status.
guodong@guodong-Vostro-3400:~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
[trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
13/04/12 16:26:22 INFO security.UserGroupInformation: JAAS Configuration
already set up for Hadoop, not re-installing.
Job: job_201304121621_0001
file:
file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
map() completion: 0.0
reduce() completion: 0.0
Counters: 0
Then I try to kill the job:
guodong@guodong-Vostro-3400:~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
[trunk ?]$ ./bin/hadoop job -kill job_201304121621_0001
13/04/12 16:27:16 INFO security.UserGroupInformation: JAAS Configuration
already set up for Hadoop, not re-installing.
Killed job job_201304121621_0001
When I kill the job, I see the following in the JobTracker log:
13/04/12 16:27:13 INFO mapred.MesosScheduler: Unable to fully satisfy
needed map/reduce slots: 3 map slots 1 reduce slots remaining
13/04/12 16:27:16 INFO mapred.JobTracker: Killing job job_201304121621_0001
13/04/12 16:27:16 INFO mapred.JobInProgress: Killing job
'job_201304121621_0001'
After killing the job, I check its status again with "hadoop job -status",
and it is still pending.
guodong@guodong-Vostro-3400:~/workspace/mesos-trunk/build/hadoop/hadoop-0.20.2-cdh3u3
[trunk ?]$ ./bin/hadoop job -status job_201304121621_0001
13/04/12 16:31:09 INFO security.UserGroupInformation: JAAS Configuration
already set up for Hadoop, not re-installing.
Job: job_201304121621_0001
file:
file:/tmp/hadoop-guodong/mapred/staging/guodong/.staging/job_201304121621_0001/job.xml
tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201304121621_0001
map() completion: 0.0
reduce() completion: 0.0
Counters: 0
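For what it's worth, the same status check can also be scripted against the
JobClient API instead of re-running the CLI. A minimal sketch (the job ID is
taken from the log above; it assumes the classpath and configuration point
at this JobTracker):
----------------------------------
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

// Minimal sketch: poll the job state through the old mapred API to watch
// whether the job ever leaves the pending state after the kill.
public class JobStatePoller {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName("job_201304121621_0001"));
        if (job == null) {
            System.out.println("job not found");
            return;
        }
        while (true) {
            int state = job.getJobState(); // JobStatus.PREP, RUNNING, KILLED, ...
            System.out.println("state=" + state
                + " map=" + job.mapProgress()
                + " reduce=" + job.reduceProgress());
            if (state == JobStatus.SUCCEEDED || state == JobStatus.FAILED
                    || state == JobStatus.KILLED) {
                break; // the kill (or completion) actually took effect
            }
            Thread.sleep(5000);
        }
    }
}
----------------------------------
Given the -status output above, I would expect this loop never to exit:
map/reduce completion stays at 0.0 even after the kill.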
I hope this is helpful! Looking forward to your progress. I will also try
to go through the Hadoop code and check it again on my side.
Thanks.
Guodong
On Fri, Apr 12, 2013 at 3:42 PM, Vinod Kone <[email protected]> wrote:
> Hi Guodong,
>
> This is likely a bug on our side.
>
> Could you paste the output of the client when you issue the kill? Also, the
> output of the jobtracker (during the submission and kill of the job) would
> also be helpful.
>
> Thanks,
>
>
>
> -- Vinod
>
>
> On Thu, Apr 11, 2013 at 10:43 PM, 王国栋 <[email protected]> wrote:
>
> > Hi Ben,
> >
> > I am sorry for my mistake about point 2. The JobTracker Jetty server
> > works fine. Yesterday my test job's execution time was too short, so it
> > finished before the Jetty server could show it in the running list.
> > Today I tried a bigger job, and the status was perfectly right.
> >
> > More information about point 1. I can reproduce this with the following
> > steps:
> > 1. Set the resources needed for each slot (memory, CPU) in
> > mapred-site.xml, making sure each slot needs a lot of resources.
> > 2. Give it only one Mesos slave whose resources are not enough for one
> > map slot plus one reduce slot.
> > 3. Run the JobTracker and submit a job that needs 1 mapper and 1 reducer.
> >
> > Then I find that the job is pending due to insufficient resources. I
> > can use Ctrl-C to stop the Hadoop client, but the job is still pending.
> >
> > I try to kill the job with "hadoop job -kill ", but I cannot kill the
> > pending job.
> > Before killing the job, the job status is:
> > Job Setup: Pending (
> > http://localhost:50030/jobtasks.jsp?jobid=job_201304121059_0001&type=setup&pagenum=1&state=killed
> > )
> > Job Cleanup: Pending
> >
> > After I try to kill the job, the job status is:
> > Job Setup: failed
> > Job Cleanup: Pending
> >
> > Then the job hangs, and I can never stop it.
> >
> > Is it possible that the Mesos scheduler misses some killed-job event?
> >
> >
> >
> >
> > Guodong
> >
> >
> > On Fri, Apr 12, 2013 at 2:59 AM, Benjamin Mahler
> > <[email protected]>wrote:
> >
> > > 'Pending Map Tasks': The number of pending map tasks in the Hadoop
> > > JobTracker.
> > > 'Pending Reduce Tasks': The number of pending reduce tasks in the
> > > Hadoop JobTracker.
> > >
> > > Did you successfully kill the job? If so, did you allow some time for
> > > the JobTracker to detect that the job was killed?
> > >
> > > Our scheduler (MesosScheduler.java) simply introspects on the
> > > JobTracker state to determine the number of pending map/reduce tasks
> > > in the system. I would expect this to go to 0 sometime after you kill
> > > your job; if that's not the case, I'll need more information to figure
> > > out what's going on.
> > >
> > > Can you elaborate more on your point 2? It almost sounds like you're
> > > talking to a different JobTracker? Note that the MesosScheduler runs
> > > *inside* the JobTracker.
> > >
> > > Also, note that we recently committed a deadlock fix for the Hadoop
> > > patch: https://reviews.apache.org/r/10352/
> > >
> > >
> > > On Thu, Apr 11, 2013 at 2:31 AM, 王国栋 <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to run Hadoop over Mesos, using the code in trunk, but
> > > > I have run into some problems. My Hadoop version is cdh3u3.
> > > >
> > > > *1. When a job is pending because of insufficient resources, I use
> > > > Ctrl-C to stop the job client, but I can see the pending mapper and
> > > > pending reducer are still in the JobTracker. Then I try "hadoop job
> > > > -kill jobid" to kill this job, but nothing happens in the
> > > > JobTracker; the mapper and reducer are still pending. The JobTracker
> > > > log is as follows.*
> > > >
> > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: JobTracker Status
> > > > Pending Map Tasks: 1
> > > > Pending Reduce Tasks: 1
> > > > Idle Map Slots: 0
> > > > Idle Reduce Slots: 0
> > > > Inactive Map Slots: 0 (launched but no hearbeat yet)
> > > > Inactive Reduce Slots: 0 (launched but no hearbeat yet)
> > > > Needed Map Slots: 1
> > > > Needed Reduce Slots: 1
> > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: Declining offer with
> > > > insufficient resources for a TaskTracker:
> > > > cpus: offered 4.0 needed 1.800000011920929
> > > > mem : offered 2731.0 needed 6432.0
> > > > disk: offered 70651.0 needed 4096.0
> > > > ports: at least 2 (sufficient)
> > > > [name: "cpus"
> > > > type: SCALAR
> > > > scalar {
> > > > value: 4.0
> > > > }
> > > > , name: "mem"
> > > > type: SCALAR
> > > > scalar {
> > > > value: 2731.0
> > > > }
> > > > , name: "ports"
> > > > type: RANGES
> > > > ranges {
> > > > range {
> > > > begin: 31000
> > > > end: 32000
> > > > }
> > > > }
> > > > , name: "disk"
> > > > type: SCALAR
> > > > scalar {
> > > > value: 70651.0
> > > > }
> > > > ]
> > > > 13/04/11 17:21:07 INFO mapred.MesosScheduler: Unable to fully satisfy
> > > > needed map/reduce slots: 1 map slots 1 reduce slots remaining
> > > >
> > > > *2. When we submit the job to the JobTracker, I cannot find any
> > > > running job on the JobTracker web interface
> > > > (http://localhost:50030/jobtracker.jsp). But when the job is
> > > > finished, I can see the job info under retired jobs in the
> > > > JobTracker.*
> > > >
> > > > Any ideas about this? Thanks a lot.
> > > >
> > > > Guodong
> > > >
> > >
> >
>