Distributed testing on AWS (was: Re: Next short term goal?)

Tim Ellison Thu, 15 Sep 2016 06:02:44 -0700

On 14/09/16 13:55, Ellison Anne Williams wrote:
> In the meantime (very near term), we could provide step-by-step
> AWS/GCP/Azure instructions for bringing up a small cluster, running the
> distributed tests, and debugging. Admittedly, most of this is covered in
> the AWS/GCP/Azure documentation, but in my experience the documentation
> is confusing and very time-consuming to get through the first time.


So do you advise running bare VMs and installing Hadoop, or running the
AWS Elastic MapReduce (EMR) service?

Here's where I've got to so far, but I don't want to start a wiki
entry with instructions if this is the wrong approach altogether.

  - Sign up for an AWS account.
        https://aws.amazon.com

  - Obtain access keys
        https://console.aws.amazon.com/iam

  - Install the AWS command-line tool (AWS CLI)
        https://aws.amazon.com/cli

  - Configure the AWS CLI
        Choose a default region from the EMR list at
        http://docs.aws.amazon.com/general/latest/gr/rande.html#emr_region

 $ aws configure
 AWS Access Key ID [None]: AKIAI44QH8DHBEXAMPLE
 AWS Secret Access Key [None]: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
 Default region name [None]: eu-west-1
 Default output format [None]: text

  - Create an EC2 key pair in the same region, and download the private
    key, e.g. "SparkClusterKeys.pem".

  - Create a Spark cluster (on a new account, first run
    "aws emr create-default-roles" so that --use-default-roles works)

 $ aws emr create-cluster \
   --name "Spark Cluster" \
   --release-label emr-5.0.0 \
   --applications Name=Spark \
   --ec2-attributes KeyName=SparkClusterKeys \
   --instance-type m3.xlarge \
   --instance-count 3 \
   --use-default-roles

 This returns a cluster ID, e.g. j-3KVTXXXXXX7UG.

  - Upload the Pirk JAR file and run the distributed tests

 $ aws emr put --cluster-id j-3KVTXXXXXX7UG \
   --key-pair-file SparkClusterKeys.pem \
   --src apache-pirk-0.0.1-SNAPSHOT-exe.jar
 $ aws emr ssh --cluster-id j-3KVTXXXXXX7UG \
   --key-pair-file SparkClusterKeys.pem \
   --command "hadoop jar <pirkJar> org.apache.pirk.test.distributed.DistributedTestDriver -j <full path to pirkJar>"

  - Terminate the cluster

 $ aws emr terminate-clusters --cluster-ids j-3KVTXXXXXX7UG
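
The steps above could also be scripted. What follows is a minimal sketch,
assuming bash and an AWS CLI already set up with "aws configure"; the
function names (create_key_pair, create_cluster, run_all, etc.) are
hypothetical helpers, not part of Pirk or the AWS CLI. It uses
--query ClusterId to capture the cluster ID instead of copying it by hand,
the EMR waiters to block until the cluster is ready or gone, and it always
terminates the cluster, even when the tests fail.

```shell
#!/usr/bin/env bash
# Minimal sketch of the EMR test workflow as reusable functions.
# Assumptions: bash, an AWS CLI already set up with "aws configure",
# and hypothetical helper names. Source this file, then call the functions.

KEY_NAME="SparkClusterKeys"   # EC2 key pair name from the steps above
KEY_FILE="${KEY_NAME}.pem"    # private key file used by put/ssh

# One-time step: create the key pair and save the private key locally.
create_key_pair() {
  aws ec2 create-key-pair --key-name "$KEY_NAME" \
    --query 'KeyMaterial' --output text > "$KEY_FILE" &&
    chmod 400 "$KEY_FILE"   # ssh rejects private keys readable by others
}

# Start the 3-node Spark cluster and print only its ID (j-...).
create_cluster() {
  aws emr create-cluster \
    --name "Spark Cluster" \
    --release-label emr-5.0.0 \
    --applications Name=Spark \
    --ec2-attributes KeyName="$KEY_NAME" \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --query ClusterId --output text
}

# Block until the cluster is RUNNING or WAITING and can accept work.
wait_for_cluster() {
  aws emr wait cluster-running --cluster-id "$1"
}

# Copy the JAR to the master node, then run the distributed test driver.
# $1 = cluster ID, $2 = local JAR file, $3 = full path of the JAR on the master.
run_distributed_tests() {
  local cluster_id=$1 jar=$2 remote_jar=$3
  aws emr put --cluster-id "$cluster_id" --key-pair-file "$KEY_FILE" \
    --src "$jar" &&
    aws emr ssh --cluster-id "$cluster_id" --key-pair-file "$KEY_FILE" \
      --command "hadoop jar $remote_jar org.apache.pirk.test.distributed.DistributedTestDriver -j $remote_jar"
}

# Terminate the cluster and wait until EMR reports it as terminated.
terminate_cluster() {
  aws emr terminate-clusters --cluster-ids "$1" &&
    aws emr wait cluster-terminated --cluster-id "$1"
}

# End-to-end run. The cluster is terminated even if the tests fail,
# so it does not keep accruing hourly charges.
# $1 = local JAR file, $2 = full path of the JAR on the master.
run_all() {
  local jar=$1 remote_jar=$2 cluster_id status=0
  cluster_id=$(create_cluster) || return
  echo "Cluster: $cluster_id"
  wait_for_cluster "$cluster_id" &&
    run_distributed_tests "$cluster_id" "$jar" "$remote_jar" || status=$?
  terminate_cluster "$cluster_id"
  return "$status"
}
```

Usage, after sourcing the file: run create_key_pair once, then

    run_all apache-pirk-0.0.1-SNAPSHOT-exe.jar "<full path to pirkJar>"

with the placeholder filled in as for the ssh command above. Running the
whole cycle from one function keeps the cluster's lifetime as short as
possible, which matters given the hourly billing.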


Looking at the charges per hour, I think there may be a better way...

Regards,
Tim
