So I think these were two separate issues I was running into:
1) The Python JSON thing, which didn't seem to affect Mesos actually working, aside from the GUI.
2) I was launching Hadoop from the home directory, and that was telling Mesos to look for the executor in /root/bin/mesos-executor instead of mesos/frameworks.....etc
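To illustrate the second issue, here is a minimal sketch of how a relative executor path ends up pointing at a different file depending on where the job tracker is launched. This is not the actual Hadoop-on-Mesos code: `resolve_executor` is a hypothetical helper, the `bin/mesos-executor` suffix is inferred from the path reported above, and `/path/to/hadoop` is a placeholder for wherever the framework really lives.

```python
import posixpath


def resolve_executor(working_dir, relative_executor="bin/mesos-executor"):
    """Resolve a relative executor path against a working directory.

    Hypothetical helper: it only shows that the result depends on
    the directory the process was started from.
    """
    return posixpath.normpath(posixpath.join(working_dir, relative_executor))


# Launched from the home directory, the lookup lands in /root/bin/.
print(resolve_executor("/root"))            # /root/bin/mesos-executor

# Launched from the framework directory (placeholder path), it resolves there.
print(resolve_executor("/path/to/hadoop"))  # /path/to/hadoop/bin/mesos-executor
```

If something like this is what is happening, starting the job tracker from the Hadoop directory itself would be a quick way to confirm it.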
Prolly a bug: if you add hadoop/bin to your PATH, when you start a job tracker it will assume that the executor script is in working_dir/bin/, which is not actually true.

-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[email protected] | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)

On Friday, January 27, 2012 at 5:40 PM, Matthew Rathbone wrote:

> No -- maybe apache mailing lists don't let you send attachments.
>
> its here:
> http://cl.ly/2z0N04290t143S463s29
>
> Also, I see this repeated a bunch in the logs, not sure if it helps, looks
> like hadoop is exiting with a non-zero exit code?
>
> I0127 23:34:57.590661 5089 master.cpp:1453] Launching task 316 on slave 201201272320-0-2
> I0127 23:34:57.987401 5089 master.cpp:1001] Executor default of framework 201201272320-0-0000 on slave 201201272320-0-2 (ip-10-98-58-126.ec2.internal) exited with status 256
> I0127 23:34:57.988973 5089 master.cpp:1033] Removing task 316 of framework 201201272320-0-0000 because of lost executor
> I0127 23:34:57.989153 5089 master.cpp:1184] Sending 1 offers to framework 201201272320-0-0000
> I0127 23:34:57.990584 5089 master.cpp:679] Received reply for offer 201201272320-0-926
> I0127 23:34:57.990675 5089 master.cpp:1453] Launching task 317 on slave 201201272320-0-2
> I0127 23:34:58.587996 5089 master.cpp:1184] Sending 4 offers to framework 201201272320-0-0000
> I0127 23:34:58.590417 5089 master.cpp:679] Received reply for offer 201201272320-0-927
> I0127 23:34:58.590739 5089 master.cpp:1403] Filtered slave 201201272320-0-1 for framework 201201272320-0-0000 for 5 seconds
>
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> [email protected] | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
>
> On Friday, January 27, 2012 at 5:36 PM, Andy Konwinski wrote:
>
> > Did you forget to attach the output?
> >
> > On Fri, Jan 27, 2012 at 3:31 PM, Matthew Rathbone <[email protected]> wrote:
> >
> > > Here's my output from that (attached, it's long).
> > >
> > > The regular web-uri :8080 works fine until I submit a job, it can see the
> > > hadoop jobtracker and everything, but when I submit a job it goes haywire.
> > > I can't see anything obvious in the logs either.
> > >
> > > This is all I did:
> > > start a cluster
> > > start a job tracker
> > > hadoop fs -put hadoop-examples.jar
> > > <mkdirs>
> > > hadoop jar hadoop-examples.jar wordcount wordcount/input wordcount/output
> > >
> > > I figured it might be something to do with MESOS_HOME not being set in
> > > hadoop-env.sh, so I set that too (on all machines), but it didn't seem to
> > > help.
> > >
> > > If it helps, the jobtracker is still up, and it received the job, but
> > > doesn't see any nodes.
> > >
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > [email protected] | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>
> > >
> > > On Friday, January 27, 2012 at 5:20 PM, Andy Konwinski wrote:
> > >
> > > It looks like a JSON parsing error in the webui python code (i.e. the
> > > error output shows line 11 of webui/master/index.tpl which is the json
> > > code "state = json.loads(data)").
> > >
> > > What happens if you go to
> > >
> > > http://ec2-107-21-195-96.compute-1.amazonaws.com:5050/master/state.json
> > >
> > > inside the firewall (or open up port 5050 in the EC2 firewall for your
> > > machine)?
> > >
> > > When I do this on my machine locally (before running any frameworks or
> > > starting any slaves), I see:
> > >
> > > {"build_date":"2012-01-25 11:19:19","build_user":"andyk","completed_frameworks":[],"frameworks":[],"id":"201201271511-0","pid":"[email protected]:5050","slaves":[],"start_time":1327705891}
> > >
> > > Andy
> > >
> > > On Fri, Jan 27, 2012 at 3:02 PM, Matthew Rathbone <[email protected]> wrote:
> > >
> > > So I spun up a mesos cluster using the ec2 scripts. So far so good.
> > >
> > > Then I spun up a jobtracker, that worked (after some fiddling)
> > >
> > > Then I tried to submit an example job (wordcount).
> > >
> > > First of all, the job tracker receives the job, but then I get these
> > > errors in the terminal:
> > > 12/01/27 22:57:12 INFO input.FileInputFormat: Total input paths to process : 0
> > > 12/01/27 22:57:13 INFO mapred.JobClient: Running job: job_201201272245_0002
> > > 12/01/27 22:57:14 INFO mapred.JobClient: map 0% reduce 0%
> > > channel 6: open failed: connect failed: Connection refused
> > > channel 7: open failed: connect failed: Connection refused
> > > channel 6: open failed: connect failed: Connection refused
> > >
> > > So I check on the mesos dashboard (port 8080) and I see this:
> > > http://cl.ly/221D193v0l012k0h3W0S
> > >
> > > It doesn't look good, anyone have any pointers? (Sorry for spamming the
> > > list so much over the last couple of days)
> > >
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > [email protected] | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
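
For the JSON error discussed in the thread, the check Andy suggests (does `/master/state.json` return something `json.loads` accepts?) can be sketched in a few lines. This is only an illustration, not the webui code: `summarize_master_state` is a hypothetical helper, and the sample body is the one quoted in the thread with the obfuscated `pid` field left out.

```python
import json


def summarize_master_state(raw):
    """Try to parse a /master/state.json body and count frameworks and slaves.

    Hypothetical helper: returns a small dict instead of raising, so a
    parse failure (like the one the webui hit) is easy to spot.
    """
    try:
        state = json.loads(raw)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        return {"ok": False, "error": str(err)}
    if not isinstance(state, dict):
        return {"ok": False, "error": "top-level JSON value is not an object"}
    return {
        "ok": True,
        "frameworks": len(state.get("frameworks", [])),
        "slaves": len(state.get("slaves", [])),
    }


# Body quoted in the thread for an idle master (pid field omitted).
sample = (
    '{"build_date":"2012-01-25 11:19:19","build_user":"andyk",'
    '"completed_frameworks":[],"frameworks":[],"id":"201201271511-0",'
    '"slaves":[],"start_time":1327705891}'
)

print(summarize_master_state(sample))
print(summarize_master_state("<html>not json</html>"))
```

To run it against a live master, the body could be fetched with `urllib.request.urlopen("http://<master-host>:5050/master/state.json").read().decode()`, where `<master-host>` is a placeholder and port 5050 has to be reachable from wherever the script runs.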
