RE: Using Whirr to Deploy Cloudera CDH4 within the HPCS Public Cloud

Mathews, Bruce Fri, 16 Aug 2013 13:51:19 -0700

Hi Whirr Development Team,

It's me again. I also wanted to share an additional optimization step that can
improve the performance of an Apache Hadoop or Cloudera cluster deployed on the
HP Cloud Services OpenStack platform. Instead of simply starting from a
Nova-provisioned Compute Instance for each cluster node, you might consider
using a Cinder call to create a Block Volume of the proper size in the
instance's AZ and then attaching that Block Volume to the Compute Instance.
This has three benefits. First, the data on the node persists beyond
termination of the instance. Second, the Volume can be snapshotted and the
snapshot used to create other nodes of the same type (e.g. DataNode or
ZooKeeper node). Third, it improves the performance of MapReduce jobs run on
the Whirr-provisioned and configured clusters. I built a new cluster from
scratch using the Cinder approach, and with the same number of nodes hosted in
the same AZ it performed 35% better on the same MapReduce jobs than the cluster
without Cinder volumes.
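
To make the volume step above a little more concrete, here is a rough sketch
of the two calls involved. It is only an illustration, not what Whirr does
today: it just assembles the argument lists for the `cinder create` and
`nova volume-attach` command-line clients and prints them. The helper names,
the volume name, size, AZ, IDs and the device path are all placeholders I made
up, and the flag names assume the 2013-era clients, so please check them
against `cinder help create` and `nova help volume-attach` before relying on
them.

```python
# Sketch only: builds (does not execute) the CLI calls for a Cinder-backed node.
# Helper names and every sample value below are hypothetical placeholders.
# Flag names assume the 2013-era cinder/nova clients; verify with `--help`.


def cinder_create_args(name, size_gb, az):
    """Arguments for creating a Block Volume of size_gb GB in the given AZ."""
    return ["cinder", "create",
            "--display-name", name,
            "--availability-zone", az,
            str(size_gb)]


def nova_volume_attach_args(server_id, volume_id, device="/dev/vdb"):
    """Arguments for associating an existing volume with a Compute Instance."""
    return ["nova", "volume-attach", server_id, volume_id, device]


if __name__ == "__main__":
    # Placeholder values; substitute the real instance and volume IDs.
    create = cinder_create_args("datanode-1-data", 500, "az-1")
    attach = nova_volume_attach_args("SERVER_ID", "VOLUME_ID")
    print(" ".join(create))
    print(" ".join(attach))
```

The volume has to be created in the same AZ as the Compute Instance it will be
attached to, which is why the AZ is passed explicitly in the sketch.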


HTH,

Bruce Basil Mathews
HP Cloud Services, DBaaS Architect

+1 760 961 7699 / Tel
+1 760 553 3197 / Mobile

HP Public Cloud Site: http://www.hpcloud.com<http://www.hp.com/go/proliantgen8>

'All the world is a stage, and all the men and women in it, merely players'
Jaques, As You Like It!

From: Mathews, Bruce
Sent: Thursday, August 15, 2013 10:14 AM
To: dev@whirr.apache.org
Subject: Using Whirr to Deploy Cloudera CDH4 within the HPCS Public Cloud

Hello Whirr Developers,

My name is Bruce Basil Mathews. I am the Western Regional Solutions Architect
for HP Public Cloud Services (the largest OpenStack deployment in the U.S.).
Recently, I had an opportunity to test the operation and effectiveness of Whirr
for deploying the Cloudera CDH4 stack to our cloud. I was very impressed with
the overall capabilities and end results, but I found a few things that the
development team may want to address to optimize Whirr for the task.


1.       Security Group Entries: In addition to opening port 22 (ssh) in the
initial phases, you may also wish to create an entry for ICMP from -1 to -1
with a CIDR of 0.0.0.0/0. This allows the Compute Instances to use ping and
other verification methods between addresses. Also, in my case, the needed
DataNode and TaskTracker ports never seemed to be inserted into the Security
Group, so I added them manually. I think this second issue may be related to
the Name Resolution issue described in item #4.

2.       IP Assignment: It appears that you create each Compute Instance and
then assign it a Floating IP address, which Whirr then attempts to use for
inter-process communication between the nodes of the cluster. It might be
better to use the Private IP addresses for node-to-node communication behind
our Firewall and to expose the Public IP addresses of the NameNode and
JobTracker nodes so that the Whirr server can proxy hadoop commands to and
from them.

3.       Security Group Rule Masking: It seems as though you tried to use the 
Floating IP address scheme to mask the rules created. This sets up some rather 
complex scenarios until we move to Neutron in Grizzly. In the interim, you may 
consider using the Private IP address scheme for masking or simply use 
0.0.0.0/0.

4.       Name Resolution: If you use the Private IP addresses, the DNS services
behind our Firewall can resolve the host names of all of the Compute Instances
involved. Even so, it may be a good idea to update the hosts file on every node
to include the Private IP addresses of all the involved Compute Instances,
just to be safe. This is related to the issues raised in items #1, #2 and #3
above.
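
To illustrate items #1 and #4, here is a rough sketch of the ICMP rule and the
hosts-file entries. It is not taken from Whirr: the helper names, the Security
Group name, the host names and the addresses are all hypothetical. The first
helper only assembles the arguments for the `nova secgroup-add-rule` client
call (protocol, from, to, CIDR); the second renders hosts-file lines from a
mapping of host names to Private IP addresses.

```python
# Sketch only: helper names and all sample values are hypothetical.


def icmp_rule_args(secgroup, cidr="0.0.0.0/0"):
    """Arguments for a Security Group rule allowing all ICMP (from -1 to -1)."""
    return ["nova", "secgroup-add-rule", secgroup, "icmp", "-1", "-1", cidr]


def hosts_entries(private_ips):
    """Render hosts-file lines from a {hostname: private_ip} mapping."""
    return "\n".join("%s\t%s" % (ip, host)
                     for host, ip in sorted(private_ips.items()))


if __name__ == "__main__":
    # Placeholder group name and addresses for a three-node cluster.
    print(" ".join(icmp_rule_args("my-cluster-group")))
    print(hosts_entries({
        "namenode": "10.0.0.11",
        "datanode-1": "10.0.0.12",
        "datanode-2": "10.0.0.13",
    }))
```

The same block of hosts entries would be appended to the hosts file on every
node, so that each node can resolve all the others by Private IP address.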

I hope you don't mind these suggestions! Aside from these four items, and the
need to 'reboot' the cluster after manual repairs, the deployment went very
well from my perspective! I have a document outlining the procedure I used and
the results achieved; it was too large to attach to this email. Please send me
a direct email and I will be happy to send it to whoever is interested. Please
do let me know whether this information is helpful and useful for subsequent
action, or whether I should not be using this group as a forum for such
things... K?

The Very Best!



Bruce Basil Mathews
HP Cloud Services, DBaaS Architect

+1 760 961 7699 / Tel
+1 760 553 3197 / Mobile

HP Public Cloud Site: http://www.hpcloud.com<http://www.hp.com/go/proliantgen8>

'All the world is a stage, and all the men and women in it, merely players'
Jaques, As You Like It!
