I put up my instructions for GCP and AWS on this page: 
I also have prototype instructions for Azure but their HDInsight platform 
doesn’t yet support Java 8. 

Not everything works completely yet, but it is a start.

On 9/15/16, 09:01, "Tim Ellison" <t.p.elli...@gmail.com> wrote:

    On 14/09/16 13:55, Ellison Anne Williams wrote:
    > In the meantime/very near term, we could provide a step-by-step
    > AWS/GCP/Azure instructions for bringing up a small cluster, running the
    > distributed tests, and debugging. Admittedly, most of this is handled in
    > the AWS/GCP/Azure documentation, but, in my experience, the documentation
    > is confusing and very time consuming to get through the first time.
    So do you advise running bare VMs and installing Hadoop, or running the
    AWS Elastic Map Reduce service?
    Here's where I've been going so far, but don't want to start a wiki
    entry with instructions if this is the wrong approach altogether...
      - Sign-up for an AWS account.
      - Obtain access keys
      - Install aws command-line tool
      - Configure aws tool
     Choose a default region in which EMR is available
     $ aws configure
     AWS Access Key ID [None]: AKIAI44QH8DHBEXAMPLE
     AWS Secret Access Key [None]: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
     Default region name [None]: us-east-1
     Default output format [None]: text
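    To sanity-check the configuration before creating anything billable, the
    CLI can report what it resolved (standard aws subcommands, nothing extra assumed):

```shell
# Show the credentials and region the CLI picked up
aws configure list

# Confirm the keys actually authenticate (prints account ID and user ARN)
aws sts get-caller-identity
```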
      - Create an EC2 key pair, and download e.g. "SparkClusterKeys.pem".
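    If you would rather stay on the command line, the key pair can also be
    created with the aws tool (a sketch; the key name matches the one used below):

```shell
# Create the key pair and save the private key locally
aws ec2 create-key-pair --key-name SparkClusterKeys \
  --query 'KeyMaterial' --output text > SparkClusterKeys.pem

# ssh refuses key files that are readable by others
chmod 400 SparkClusterKeys.pem
```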
      - Create a Spark cluster
     $ aws emr create-cluster \
       --name "Spark Cluster" \
       --release-label emr-5.0.0 \
       --applications Name=Spark \
       --ec2-attributes KeyName=SparkClusterKeys \
       --instance-type m3.xlarge \
       --instance-count 3
     which returns a cluster ID, e.g. j-3KVTXXXXXX7UG
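    The cluster takes several minutes to come up; one way to poll it (a sketch
    using the cluster ID returned above):

```shell
# State moves through STARTING -> BOOTSTRAPPING -> RUNNING/WAITING
aws emr describe-cluster --cluster-id j-3KVTXXXXXX7UG \
  --query 'Cluster.Status.State' --output text
```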
      - Upload a JAR file
     $ aws emr put --cluster-id j-3KVTXXXXXX7UG \
       --key-pair-file SparkClusterKeys.pem \
       --src apache-pirk-0.0.1-SNAPSHOT-exe.jar
      - Run the distributed tests
     $ aws emr ssh --cluster-id j-3KVTXXXXXX7UG \
       --key-pair-file SparkClusterKeys.pem \
       --command "hadoop jar <pirkJar>
     org.apache.pirk.test.distributed.DistributedTestDriver -j <full path to
      - Terminate cluster
     $ aws emr terminate-clusters --cluster-ids j-3KVTXXXXXX7UG
    Look at charges per hour and think, there may be a better way...
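    One way to keep the hourly charges down (a sketch, with a placeholder S3
    path for the jar): submit the test run as a step and pass --auto-terminate,
    so EMR shuts the cluster down as soon as the step finishes instead of idling:

```shell
# Cluster runs the step, then terminates itself instead of sitting idle
aws emr create-cluster \
  --name "Spark Test Cluster" \
  --release-label emr-5.0.0 \
  --applications Name=Spark \
  --ec2-attributes KeyName=SparkClusterKeys \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --auto-terminate \
  --steps Type=CUSTOM_JAR,Name="Pirk tests",Jar=s3://<bucket>/apache-pirk-0.0.1-SNAPSHOT-exe.jar,MainClass=org.apache.pirk.test.distributed.DistributedTestDriver
```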
