This is very likely because the AMI ships an older version of Mesos than the 
one you built against. We should make a new AMI.

The -d git option in the script seems to be broken too, so we should fix that. 
In theory it should work; I think it broke when we moved the repo (and maybe 
changed its internal structure too).
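
If you want to confirm the mismatch before a new AMI is ready, a quick check 
along these lines might help. It is only a sketch: the class MesosApiCheck and 
its two helpers are made up for illustration, and the only real names in it 
are the class and method taken from your stack trace. It assumes you run it 
with the same classpath the JobTracker uses on the master.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Mesos or Hadoop.
public class MesosApiCheck {

    // True if the class can be loaded and exposes a public method with this name.
    static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(
                className, false, MesosApiCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Reports which jar or directory the class was loaded from, if known.
    static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(
                className, false, MesosApiCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class and method names as they appear in the NoSuchMethodError.
        String cls = "org.apache.mesos.Protos$Resource";
        System.out.println("loaded from:   " + loadedFrom(cls));
        System.out.println("has getScalar: " + hasMethod(cls, "getScalar"));
    }
}
```

If it reports that getScalar is missing, the jar location it prints should 
show which Mesos build the JobTracker is actually picking up.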

Matei

On Jan 27, 2012, at 9:36 PM, Matthew Rathbone wrote:

> When I spin up mesos using the ec2 scripts, and redeploy both hdfs and hadoop 
> using cloudera's distribution I see this error when I try to start the 
> jobtracker: 
> 
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the includes file to 
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the excludes file to 
> 12/01/28 05:23:28 INFO util.HostsFileReader: Refreshing hosts 
> (include/exclude) list
> 12/01/28 05:23:28 INFO mapred.JobTracker: Decommissioning 0 nodes
> 12/01/28 05:23:28 INFO mapred.FrameworkScheduler: Got resource offer value: 
> "201201280508-0-5"
> 
> Exception in thread "Thread-20" java.lang.NoSuchMethodError: 
> org.apache.mesos.Protos$Resource.getScalar()Lorg/apache/mesos/Protos$Value$Scalar;
> at 
> org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:176)
> at 
> org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:183)
> at 
> org.apache.hadoop.mapred.FrameworkScheduler.resourceOffers(FrameworkScheduler.java:203)
> 
> 
> It seems to be stopping the job tracker from starting new tasks.
> 
> I was wondering if this is a version conflict between the mesos I've built 
> against (trunk), and the version of mesos used on the AMI? -- it seems to 
> come from the generated protobuf library.
> 
> 
> 
> To try to solve this, I attempted to spin up a cluster passing -d git (to 
> have the latest code pulled from git), but then I get a string of crazy python 
> exceptions:
> 
> rsync error: unexplained error (code 255) at 
> /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]
> Traceback (most recent call last):
>  File "./mesos_ec2.py", line 541, in <module>
>    main()
>  File "./mesos_ec2.py", line 450, in main
>    setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
>  File "./mesos_ec2.py", line 304, in setup_cluster
>    deploy_files(conn, "deploy." + opts.os, opts, master_nodes, slave_nodes, 
> zoo_nodes)
>  File "./mesos_ec2.py", line 415, in deploy_files
>    subprocess.check_call(command, shell=True)
>  File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py",
>  line 462, in check_call
>    raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command 'rsync -rv -e 'ssh -o 
> StrictHostKeyChecking=no -i /Users/matthew/id-foursquare' 
> '/var/folders/CK/CKzwG+5sFuSjDMUTvdmWfk+++TI/-Tmp-/tmpFmfdmB/' 
> '[email protected]:/'' returned non-zero exit status 255
> 
> 
> 
> 
> Do you think version conflicts are the likely reason for this failure?
> 
> -- 
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> [email protected] (mailto:[email protected]) | @rathboma 
> (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
> 
> 
