Re: Mesos master errors when running a hadoop job on EC2

Matthew Rathbone Fri, 27 Jan 2012 16:28:30 -0800

So I think I was running into two separate issues:

1) The Python JSON thing -- which didn't seem to stop Mesos from actually working, 
aside from the GUI.
2) I was launching hadoop from the home directory, and that was telling Mesos to 
look for the executor in /root/bin/mesos-executor instead of 
mesos/frameworks.....etc
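
A side note on the "exited with status 256" lines in the master log quoted 
further down: if that number is a raw wait(2)-style status rather than a plain 
exit code (an assumption; nothing in this thread confirms how Mesos reports 
it), then it decodes to exit code 1, which would be consistent with the 
executor failing to start. A minimal Python sketch of that decoding, where 
decode_wait_status is a hypothetical helper and not part of Mesos:

```python
def decode_wait_status(status):
    """Split a wait(2)-style status word into (kind, value).

    Assumes the conventional Linux layout: the low 7 bits hold the
    terminating signal (0 if the process exited normally) and bits 8-15
    hold the exit code. Stopped/continued states are not handled here.
    """
    signal = status & 0x7F
    exit_code = (status >> 8) & 0xFF
    if signal == 0:
        return ("exited", exit_code)
    return ("signaled", signal)


# The value from the master log: 256 == 1 << 8
print(decode_wait_status(256))  # ('exited', 1)
```

If the logged value is already a plain exit code, this decoding does not apply.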



Probably a bug: if you add hadoop/bin to your PATH, then when you start a job 
tracker it assumes that the executor script is in working_dir/bin/, which is 
not actually true.
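
To illustrate the difference, here is a minimal Python sketch. It is not the 
actual Hadoop or Mesos launch code: both function names and the /opt/example 
path are made up for illustration, and the paths assume a POSIX system. The 
first function mimics the behaviour described above (the result depends on 
where you happened to launch from), the second anchors on the launcher 
script's own location instead.

```python
import os


def executor_relative_to_cwd(cwd):
    # Fragile: the result changes with the directory the job tracker
    # was started from.
    return os.path.join(cwd, "bin", "mesos-executor")


def executor_relative_to_script(script_path):
    # More robust: assume the launcher script lives in <home>/bin/ and
    # derive <home> from the script's own path, ignoring the working dir.
    home = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    return os.path.join(home, "bin", "mesos-executor")


# Launching from /root reproduces the path from the message above.
print(executor_relative_to_cwd("/root"))
# Hypothetical install location, independent of the working directory.
print(executor_relative_to_script("/opt/example/hadoop/bin/hadoop"))
```

Until something like the second approach is in place, starting the job 
tracker from the directory that actually contains bin/mesos-executor avoids 
the problem.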


-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[email protected] (mailto:[email protected]) | @rathboma 
(http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)



On Friday, January 27, 2012 at 5:40 PM, Matthew Rathbone wrote:

> No -- maybe Apache mailing lists don't let you send attachments. 
> 
> It's here:
> http://cl.ly/2z0N04290t143S463s29
> 
> 
> Also, I see this repeated a bunch in the logs. Not sure if it helps, but it 
> looks like hadoop is exiting with a non-zero exit code?
> 
> I0127 23:34:57.590661  5089 master.cpp:1453] Launching task 316 on slave 201201272320-0-2
> I0127 23:34:57.987401  5089 master.cpp:1001] Executor default of framework 201201272320-0-0000 on slave 201201272320-0-2 (ip-10-98-58-126.ec2.internal) exited with status 256
> I0127 23:34:57.988973  5089 master.cpp:1033] Removing task 316 of framework 201201272320-0-0000 because of lost executor
> I0127 23:34:57.989153  5089 master.cpp:1184] Sending 1 offers to framework 201201272320-0-0000
> I0127 23:34:57.990584  5089 master.cpp:679] Received reply for offer 201201272320-0-926
> I0127 23:34:57.990675  5089 master.cpp:1453] Launching task 317 on slave 201201272320-0-2
> I0127 23:34:58.587996  5089 master.cpp:1184] Sending 4 offers to framework 201201272320-0-0000
> I0127 23:34:58.590417  5089 master.cpp:679] Received reply for offer 201201272320-0-927
> I0127 23:34:58.590739  5089 master.cpp:1403] Filtered slave 201201272320-0-1 for framework 201201272320-0-0000 for 5 seconds
> 
> 
> 
> 
> -- 
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> [email protected] (mailto:[email protected]) | @rathboma 
> (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
> 
> 
> 
> On Friday, January 27, 2012 at 5:36 PM, Andy Konwinski wrote:
> 
> > Did you forget to attach the output?
> > 
> > On Fri, Jan 27, 2012 at 3:31 PM, Matthew Rathbone <[email protected] 
> > (mailto:[email protected])> wrote:
> > 
> > > Here's my output from that (attached, it's long).
> > > 
> > > The regular web UI on :8080 works fine until I submit a job. It can see the
> > > hadoop jobtracker and everything, but when I submit a job it goes haywire.
> > > I can't see anything obvious in the logs either.
> > > 
> > > This is all I did:
> > > start a cluster
> > > start a job tracker
> > > hadoop fs -put hadoop-examples.jar
> > > <mkdirs>
> > > hadoop jar hadoop-examples.jar wordcount wordcount/input wordcount/output
> > > 
> > > I figured it might be something to do with MESOS_HOME not being set in
> > > hadoop-env.sh, so I set that too (on all machines), but it didn't seem to
> > > help.
> > > 
> > > If it helps, the jobtracker is still up, and it received the job, but
> > > doesn't see any nodes.
> > > 
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > [email protected] (mailto:[email protected]) | @rathboma 
> > > <http://twitter.com/rathboma> | 4sq<http://foursquare.com/rathboma>
> > > 
> > > On Friday, January 27, 2012 at 5:20 PM, Andy Konwinski wrote:
> > > 
> > > It looks like a JSON parsing error in the webui python code (i.e. the error
> > > output shows line 11 of webui/master/index.tpl, which is the json code
> > > "state = json.loads(data)").
> > > 
> > > What happens if you go to
> > > 
> > > http://ec2-107-21-195-96.compute-1.amazonaws.com:5050/master/state.json
> > > 
> > > inside the firewall (or open up port 5050 in the EC2 firewall for your
> > > machine)?
> > > 
> > > When I do this on my machine locally (before running any frameworks or
> > > starting any slaves), I see:
> > > 
> > > {"build_date":"2012-01-25 11:19:19","build_user":"andyk","completed_frameworks":[],"frameworks":[],"id":"201201271511-0","pid":"[email protected]:5050","slaves":[],"start_time":1327705891}
> > > 
> > > Andy
> > > 
> > > On Fri, Jan 27, 2012 at 3:02 PM, Matthew Rathbone <[email protected] 
> > > (mailto:[email protected])> wrote:
> > > 
> > > 
> > > So I spun up a mesos cluster using the ec2 scripts. So far so good.
> > > 
> > > Then I spun up a jobtracker, which worked (after some fiddling).
> > > 
> > > Then I tried to submit an example job (wordcount).
> > > 
> > > First of all, the job tracker receives the job, but then I get these
> > > errors in the terminal:
> > > 12/01/27 22:57:12 INFO input.FileInputFormat: Total input paths to process : 0
> > > 12/01/27 22:57:13 INFO mapred.JobClient: Running job: job_201201272245_0002
> > > 12/01/27 22:57:14 INFO mapred.JobClient: map 0% reduce 0%
> > > channel 6: open failed: connect failed: Connection refused
> > > channel 7: open failed: connect failed: Connection refused
> > > channel 6: open failed: connect failed: Connection refused
> > > 
> > > 
> > > So I check on the mesos dashboard (port 8080) and I see this:
> > > http://cl.ly/221D193v0l012k0h3W0S
> > > 
> > > It doesn't look good. Does anyone have any pointers? (Sorry for spamming the
> > > list so much over the last couple of days.)
> > > 
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > [email protected] (mailto:[email protected]) | @rathboma 
> > > (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
> > > 
> > 
> > 
> > 
> > 
> 
>
