I put up my instructions for GCP and AWS on this page: 
https://pirk.incubator.apache.org/cloud_instructions
I also have prototype instructions for Azure, but its HDInsight platform
doesn't yet support Java 8.

Not everything works quite right yet, but it is a start.

On 9/15/16, 09:01, "Tim Ellison" <t.p.elli...@gmail.com> wrote:

    On 14/09/16 13:55, Ellison Anne Williams wrote:
    > In the meantime/very near term, we could provide a step-by-step
    > AWS/GCP/Azure instructions for bringing up a small cluster, running the
    > distributed tests, and debugging. Admittedly, most of this is handled in
    > the AWS/GCP/Azure documentation, but, in my experience, the documentation
    > is confusing and very time consuming to get through the first time.
    
    So do you advise running bare VMs and installing Hadoop, or running
    the AWS Elastic MapReduce (EMR) service?
    
    Here's where I've been going so far, but I don't want to start a wiki
    entry with instructions if this is the wrong approach altogether...
    
      - Sign-up for an AWS account.
        https://aws.amazon.com
    
      - Obtain access keys
        https://console.aws.amazon.com/iam
    
      - Install aws command-line tool
        https://aws.amazon.com/cli
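
     For example, with Python available something like this should work
     (a sketch; platform-specific installers exist too):

     $ pip install awscli    # install the AWS CLI from PyPI
     $ aws --version         # verify the install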
    
      - Configure aws tool
        Choose a default region where EMR is available:
        http://docs.aws.amazon.com/general/latest/gr/rande.html#emr_region
    
     $ aws configure
     AWS Access Key ID [None]: AKIAI44QH8DHBEXAMPLE
     AWS Secret Access Key [None]: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
     Default region name [None]: eu-west-1
     Default output format [None]: text
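
     A cheap read-only call is a quick sanity check that the keys work
     (a sketch; assumes the credentials above are valid):

     $ aws configure list    # show the resolved credentials and region
     $ aws ec2 describe-regions --output text    # succeeds only if the keys authenticate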
    
      - Create an EC2 key pair, and download e.g. "SparkClusterKeys.pem".
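
     The key pair can also be created from the command line (a sketch;
     the key name must match the KeyName used below):

     $ aws ec2 create-key-pair --key-name SparkClusterKeys \
       --query 'KeyMaterial' --output text > SparkClusterKeys.pem
     $ chmod 400 SparkClusterKeys.pem    # ssh insists on restricted permissions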
    
      - Create a Spark cluster
    
     $ aws emr create-cluster \
       --name "Spark Cluster" \
       --release-label emr-5.0.0 \
       --applications Name=Spark \
       --ec2-attributes KeyName=SparkClusterKeys \
       --instance-type m3.xlarge \
       --instance-count 3 \
       --use-default-roles
    
     This returns a cluster ID, e.g. j-3KVTXXXXXX7UG
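
     Provisioning takes a few minutes; you can poll the state, or block
     until the cluster is up (a sketch using the ID above):

     $ aws emr describe-cluster --cluster-id j-3KVTXXXXXX7UG \
       --query 'Cluster.Status.State'    # e.g. STARTING, then WAITING
     $ aws emr wait cluster-running --cluster-id j-3KVTXXXXXX7UG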
    
      - Upload a JAR file
    
     $ aws emr put --cluster-id j-3KVTXXXXXX7UG \
       --key-pair-file SparkClusterKeys.pem \
       --src apache-pirk-0.0.1-SNAPSHOT-exe.jar
     $ aws emr ssh --cluster-id j-3KVTXXXXXX7UG \
       --key-pair-file SparkClusterKeys.pem \
       --command "hadoop jar <pirkJar> org.apache.pirk.test.distributed.DistributedTestDriver -j <full path to pirkJar>"
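
     If the run leaves output on the master node, "aws emr get" (the
     counterpart of put) can copy it back; the path here is only a
     hypothetical example:

     $ aws emr get --cluster-id j-3KVTXXXXXX7UG \
       --key-pair-file SparkClusterKeys.pem --src /home/hadoop/test.log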
    
      - Terminate cluster
    
     $ aws emr terminate-clusters --cluster-ids j-3KVTXXXXXX7UG
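
     And confirm nothing is left running (and billing):

     $ aws emr list-clusters --active    # should return no clusters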
    
    
    Then I look at the charges per hour and think there may be a better
    way...
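
    One option (an untested sketch) is to submit the test as a step on an
    auto-terminating cluster, so it only bills while the step runs; here
    s3://<mybucket> is a placeholder, and the driver's arguments would
    still need wiring up via Args:

     $ aws emr create-cluster --name "Pirk Test" \
       --release-label emr-5.0.0 --applications Name=Spark \
       --instance-type m3.xlarge --instance-count 3 --use-default-roles \
       --auto-terminate \
       --steps Type=CUSTOM_JAR,Name=PirkTest,ActionOnFailure=TERMINATE_CLUSTER,Jar=s3://<mybucket>/apache-pirk-0.0.1-SNAPSHOT-exe.jar,MainClass=org.apache.pirk.test.distributed.DistributedTestDriver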
    
    Regards,
    Tim
    

