[
https://issues.apache.org/jira/browse/SPARK-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057045#comment-14057045
]
Stephen M. Hopper commented on SPARK-2396:
------------------------------------------
Update: I attempted this again, this time on a machine running Ubuntu Server 14.04,
and it worked on the first try. I kept all of the steps the same, except that I used
a prebuilt version of Spark (1.0.0 for Hadoop 2) instead of the version I had
built myself from source with Maven.
> Spark EC2 scripts fail when trying to log in to EC2 instances
> -------------------------------------------------------------
>
> Key: SPARK-2396
> URL: https://issues.apache.org/jira/browse/SPARK-2396
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 1.0.0
> Environment: Windows 8, Cygwin and command prompt, Python 2.7
> Reporter: Stephen M. Hopper
> Labels: aws, ec2, ssh
>
> I cannot seem to successfully start up a Spark EC2 cluster using the
> spark-ec2 script.
> I'm using variations on the following command:
> ./spark-ec2 --instance-type=m1.small --region=us-west-1 --spot-price=0.05
> --spark-version=1.0.0 -k my-key-name -i my-key-name.pem -s 1 launch
> spark-test-cluster
> The script always allocates the EC2 instances without much trouble, but it can
> never seem to complete the SSH step that installs Spark on the cluster; it
> always complains about my SSH key. If I try to log in with my SSH key myself,
> like this:
> ssh -i my-key-name.pem root@<insert ip of my instance here>
> it fails. However, if I log in to the AWS console, click on my instance, and
> select "connect", it displays the instructions for SSHing into my instance
> (which are no different from the ssh command above). After that, if I rerun the
> SSH command, I'm able to log in.
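> From poking around, this looks like the usual behavior of a freshly booted
> instance refusing the first few SSH attempts. A minimal sketch of the kind of
> retry loop that would paper over it (the helper below is my own illustration,
> not the actual spark_ec2.py code):
>
> import subprocess
> import time
>
> def ssh_with_retries(host, key_file, command, max_retries=5, delay=30):
>     # A newly launched EC2 instance often rejects SSH for a minute or two
>     # while it boots, so retry with a pause instead of failing immediately.
>     for attempt in range(1, max_retries + 1):
>         try:
>             return subprocess.check_call([
>                 "ssh", "-i", key_file,
>                 "-o", "StrictHostKeyChecking=no",
>                 "root@%s" % host, command])
>         except subprocess.CalledProcessError:
>             if attempt == max_retries:
>                 raise
>             print("SSH attempt %d failed, retrying in %d seconds..."
>                   % (attempt, delay))
>             time.sleep(delay)
>
> # e.g. ssh_with_retries("ec2-<my-clusters-ip>.us-west-1.compute.amazonaws.com",
> #                       "my-key-name.pem", "echo connected")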
> Next, if I try to rerun the spark-ec2 command from above (replacing "launch"
> with "start"), the script logs in and starts installing Spark. However, it
> eventually errors out with the following output:
> Cloning into 'spark-ec2'...
> remote: Counting objects: 1465, done.
> remote: Compressing objects: 100% (697/697), done.
> remote: Total 1465 (delta 485), reused 1465 (delta 485)
> Receiving objects: 100% (1465/1465), 228.51 KiB | 287 KiB/s, done.
> Resolving deltas: 100% (485/485), done.
> Connection to ec2-<my-clusters-ip>.us-west-1.compute.amazonaws.com closed.
> Searching for existing cluster spark-test-cluster...
> Found 1 master(s), 1 slaves
> Starting slaves...
> Starting master...
> Waiting for instances to start up...
> Waiting 120 more seconds...
> Deploying files to master...
> Traceback (most recent call last):
>   File "./spark_ec2.py", line 823, in <module>
>     main()
>   File "./spark_ec2.py", line 815, in main
>     real_main()
>   File "./spark_ec2.py", line 806, in real_main
>     setup_cluster(conn, master_nodes, slave_nodes, opts, False)
>   File "./spark_ec2.py", line 450, in setup_cluster
>     deploy_files(conn, "deploy.generic", opts, master_nodes, slave_nodes, modules)
>   File "./spark_ec2.py", line 593, in deploy_files
>     subprocess.check_call(command)
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 535, in check_call
>     retcode = call(*popenargs, **kwargs)
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 522, in call
>     return Popen(*popenargs, **kwargs).wait()
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 710, in __init__
>     errread, errwrite)
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 958, in _execute_child
>     startupinfo)
> WindowsError: [Error 2] The system cannot find the file specified
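> For what it's worth, a WindowsError: [Error 2] raised from Popen like this
> means the program named in the command itself (not a file it operates on)
> couldn't be found on the Windows PATH. A rough sketch of the sort of
> pre-flight check that would surface a clearer error (checked_call and the
> rsync example below are just my illustration, not the actual spark_ec2.py
> code):
>
> from distutils.spawn import find_executable
> import subprocess
>
> def checked_call(command):
>     # Report a missing executable explicitly rather than letting Popen fail
>     # with the bare "[Error 2] The system cannot find the file specified".
>     exe = command[0]
>     if find_executable(exe) is None:
>         raise RuntimeError("'%s' is not on the PATH; under Cygwin it may "
>                            "need to be installed separately." % exe)
>     return subprocess.check_call(command)
>
> # e.g. checked_call(["rsync", "-rv", "deploy.generic/", "root@<master-ip>:~/"])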
> So, in short, am I missing something or is this a bug? Any help would be
> appreciated.
> Other notes:
> - I've tried both the us-west-1 and us-east-1 regions.
> - I've tried several different instance types.
> - I've tried playing with the permissions on the SSH key (600, 400, etc.), but to no avail.