Stephen M. Hopper created SPARK-2396:
----------------------------------------

             Summary: Spark EC2 scripts fail when trying to log in to EC2 instances
                 Key: SPARK-2396
                 URL: https://issues.apache.org/jira/browse/SPARK-2396
             Project: Spark
          Issue Type: Bug
          Components: EC2
    Affects Versions: 1.0.0
         Environment: Windows 8, Cygwin and command prompt, Python 2.7
            Reporter: Stephen M. Hopper


I have not been able to successfully launch a Spark EC2 cluster using the
spark-ec2 script.

I'm using variations on the following command:
./spark-ec2 --instance-type=m1.small --region=us-west-1 --spot-price=0.05 
--spark-version=1.0.0 -k my-key-name -i my-key-name.pem -s 1 launch 
spark-test-cluster

The script always allocates the EC2 instances without much trouble, but it can
never seem to complete the SSH step that installs Spark on the cluster. It always
complains about my SSH key. If I try to log in with the key myself, like this:

ssh -i my-key-name.pem root@<insert ip of my instance here>

it fails. However, if I log in to the AWS console, click on my instance, and
select "Connect", it displays the instructions for SSHing into the instance
(which are no different from the command above). If I then rerun the same SSH
command, I'm able to log in.
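
For what it's worth, this looks to me like the instance simply isn't accepting SSH
connections yet when the script first tries. The rough Python sketch below (the key
file and host are placeholders for my actual values, not anything taken from
spark_ec2.py) is essentially what I end up doing by hand until the connection
succeeds:

import subprocess
import time

# Sketch only: keep retrying ssh until the EC2 instance accepts connections.
def wait_for_ssh(host, key_file, attempts=10, delay=30):
    for attempt in range(attempts):
        ret = subprocess.call([
            "ssh", "-i", key_file,
            "-o", "StrictHostKeyChecking=no",
            "-o", "ConnectTimeout=10",
            "root@" + host, "true"])
        if ret == 0:
            return True
        print "SSH not ready (attempt %d of %d), retrying in %d seconds..." % (
            attempt + 1, attempts, delay)
        time.sleep(delay)
    return False

wait_for_ssh("ec2-<insert ip of my instance here>.us-west-1.compute.amazonaws.com",
             "my-key-name.pem")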

Next, if I try to rerun the spark-ec2 command from above (replacing "launch" 
with "start"), the script logs in and starts installing Spark.  However, it 
eventually errors out with the following output:

Cloning into 'spark-ec2'...
remote: Counting objects: 1465, done.
remote: Compressing objects: 100% (697/697), done.
remote: Total 1465 (delta 485), reused 1465 (delta 485)
Receiving objects: 100% (1465/1465), 228.51 KiB | 287 KiB/s, done.
Resolving deltas: 100% (485/485), done.
Connection to ec2-<my-clusters-ip>.us-west-1.compute.amazonaws.com closed.
Searching for existing cluster spark-test-cluster...
Found 1 master(s), 1 slaves
Starting slaves...
Starting master...
Waiting for instances to start up...
Waiting 120 more seconds...
Deploying files to master...
Traceback (most recent call last):
  File "./spark_ec2.py", line 823, in <module>
    main()
  File "./spark_ec2.py", line 815, in main
    real_main()
  File "./spark_ec2.py", line 806, in real_main
    setup_cluster(conn, master_nodes, slave_nodes, opts, False)
  File "./spark_ec2.py", line 450, in setup_cluster
    deploy_files(conn, "deploy.generic", opts, master_nodes, slave_nodes, modules)
  File "./spark_ec2.py", line 593, in deploy_files
    subprocess.check_call(command)
  File "E:\windows_programs\Python27\lib\subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "E:\windows_programs\Python27\lib\subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "E:\windows_programs\Python27\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "E:\windows_programs\Python27\lib\subprocess.py", line 958, in 
_execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified


So, in short, am I missing something or is this a bug?  Any help would be 
appreciated.
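
One guess about the traceback: WindowsError [Error 2] from Popen generally means the
program being launched can't be found on the PATH, and I believe the deploy_files step
in spark_ec2.py shells out to rsync, which wouldn't be visible from a plain Windows
command prompt (only from inside Cygwin). A quick Python 2.7 check along these lines
(the tool names are just my guess at what the script invokes):

from distutils.spawn import find_executable

# Report whether the external tools the script presumably invokes are on PATH.
for tool in ("rsync", "ssh"):
    path = find_executable(tool)
    print "%s -> %s" % (tool, path if path else "NOT FOUND on PATH")

If rsync shows up as not found outside of Cygwin, that would at least explain the
WindowsError, even if the earlier SSH key complaint is a separate issue.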

Other notes:
- I've tried both the us-west-1 and us-east-1 regions.
- I've tried several different instance types.
- I've tried playing with the permissions on the SSH key (600, 400, etc.), but
to no avail.


