Hi Whirr Development Team,

It's me again. I also wanted to share an additional optimization step that can improve the performance of an Apache or Cloudera cluster deployed on the HP Cloud Services OpenStack deployment.

Instead of simply starting from a Nova-provisioned Compute Instance for each cluster node, you might consider using a Cinder call to create a Block Volume of the proper size in the appropriate AZ and then attaching that volume to the Compute Instance. This allows for several things:

First, it makes the data on the node persist beyond termination of the instance.
Next, it allows the volume to be snapshotted and the snapshot used to create other nodes of the same type (e.g. DataNode, ZooKeeper node, etc.).
Third, it improves the performance of MapReduce jobs run on the Whirr-provisioned and configured clusters.

I built a new cluster from scratch based on the Cinder enhancement. The Cinder version of the cluster, with the same number of nodes hosted in the same AZ, performed 35% better on the same MapReduce jobs.
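To make the suggestion a bit more concrete, here is a minimal sketch of the two request bodies involved: one to create the volume through the Block Storage (Cinder) API and one to attach it through the Compute (Nova) volume-attachments extension. It only builds the JSON bodies and sends nothing. The function names, the 100 GB size, the "az1" zone, the volume name and the /dev/vdb device are illustrative assumptions, not values taken from my test cluster, and this is not how Whirr itself would necessarily do it.

```python
import json


def volume_create_request(size_gb, availability_zone, name):
    """Build the body for POST /volumes on the Block Storage (Cinder) API."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return {
        "volume": {
            "size": size_gb,
            "availability_zone": availability_zone,
            # Assumption: v2-style field name; the older v1 API
            # calls this field "display_name" instead.
            "name": name,
        }
    }


def volume_attach_request(volume_id, device="/dev/vdb"):
    """Build the body for POST /servers/{server_id}/os-volume_attachments."""
    return {"volumeAttachment": {"volumeId": volume_id, "device": device}}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(volume_create_request(100, "az1", "datanode-1-data"), indent=2))
    print(json.dumps(volume_attach_request("VOLUME_ID_FROM_CREATE_RESPONSE"), indent=2))
```

The volume has to be created in the same AZ as the Compute Instance it will be attached to, which is why the availability zone is passed explicitly rather than left to a default.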
HTH,

Bruce Basil Mathews
HP Cloud Services, DBaaS Architect
+1 760 961 7699 / Tel
+1 760 553 3197 / Mobile
HP Public Cloud Site: http://www.hpcloud.com<http://www.hp.com/go/proliantgen8>
'All the world is a stage, and all the men and women in it, merely players' Jaquis, As You Like It!

From: Mathews, Bruce
Sent: Thursday, August 15, 2013 10:14 AM
To: dev@whirr.apache.org
Subject: Using Whirr to Deploy Clouders CHD4 within the HPCS Public Cloud

Hello Whirr Developers,

My name is Bruce Basil Mathews. I am the Western Regional Solutions Architect for HP Public Cloud Services (the largest OpenStack deployment in the U.S.). Recently, I had an opportunity to test the operation and effectiveness of Whirr for deploying the Cloudera CDH4 stack to our cloud. I was very impressed with the overall capabilities and end results, but I found a few things that the development team may want to address to optimize Whirr for the task.

1. Security Group Entries: In addition to opening port 22 (ssh) in the initial phases, you may also wish to create an entry for ICMP from -1 to -1 with a CIDR of 0.0.0.0/0. This will allow the Compute Instances to use ping and other verification methods between addresses. Also, in my case, the needed DataNode and TaskTracker ports never seemed to be inserted into the Security Group, so I added them manually. I think this second issue may be related to the Name Resolution issue I describe in item 4.

2. IP Assignment: It appears that you are creating each Compute Instance and then assigning a Floating IP address, which is then used for inter-process communication between the nodes of the cluster. It might be better to use the Private IP addresses for node-to-node communication behind our Firewall, and to expose only the Public IP addresses of the NameNode and JobTracker nodes, for proxying to and from the Whirr server for hadoop command execution.

3. Security Group Rule Masking: It seems as though you tried to use the Floating IP address scheme to mask the rules created. This sets up some rather complex scenarios until we move to Neutron in Grizzly. In the interim, you may consider using the Private IP address scheme for masking, or simply use 0.0.0.0/0.

4. Name Resolution: If you use the Private IP addresses, then the DNS services behind our Firewall can resolve the host names of all of the Compute Instances involved, but it may be a good idea to update the hosts files on every node to include the Private IP addresses of all the involved Compute Instances, just to be safe. This is related to the issues brought up in items #1, #2 and #3 above.

I hope you don't mind these suggestions! Aside from these four items, and the need to 'reboot' the cluster after manual repairs, the deployment went very well from my perspective!

I have a document outlining the procedure I used and the results achieved. Please send me a direct email and I will be happy to send it to whoever is interested. It was too large to attach to the email...

Please do let me know if this information is helpful and useful for subsequent action, or if I should not be using this group as a forum for such things... K?

The Very Best!

Bruce Basil Mathews
HP Cloud Services, DBaaS Architect
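
As an illustration of items 1 and 4, here is a rough sketch, not a tested Whirr change. The first function builds the body for an ICMP rule as I would expect it on the nova-network security group rules extension (POST /os-security-group-rules). The second renders hosts-file lines from a map of host names to Private IP addresses. The function names, the group ID placeholder, the host names and the 10.0.0.x addresses are all assumptions made up for the example.

```python
import ipaddress


def icmp_rule_request(parent_group_id):
    """Build an ICMP allow-all rule body (type -1, code -1, any source)."""
    return {
        "security_group_rule": {
            "parent_group_id": parent_group_id,
            "ip_protocol": "icmp",
            "from_port": -1,
            "to_port": -1,
            "cidr": "0.0.0.0/0",
        }
    }


def hosts_entries(private_ips):
    """Render /etc/hosts lines from a {hostname: private_ip} mapping."""
    lines = []
    for hostname in sorted(private_ips):
        address = ipaddress.ip_address(private_ips[hostname])
        # Refuse Floating/Public addresses: the point is to keep
        # node-to-node traffic on the Private network.
        if not address.is_private:
            raise ValueError(f"{hostname}: {address} is not a private address")
        lines.append(f"{address}\t{hostname}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(icmp_rule_request("SECURITY_GROUP_ID"))
    print(hosts_entries({"namenode": "10.0.0.11", "datanode-1": "10.0.0.12"}), end="")
```

The generated lines would be appended to the hosts file on every node, so each node can resolve the others even if DNS behind the Firewall is slow or unavailable.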