Is there an important difference between building a custom AMI and using an existing AMI with a startup script that populates everything from S3?
Building an AMI takes a few hours and is a total pain in the butt. My eventual conclusion was that I didn't need to do it at all.

I found that I had roughly three levels of variation in my production systems:

- the OS
- the infrastructural components like Java, Hadoop and ZooKeeper
- the application that I wanted to run

My initial thought was that the AMI should cover the first two aspects of variability. But I also found that I wanted to change the version of the infrastructure components fairly often while developing the AMI, and not infrequently in production.

For Mahout customers, I would imagine that there is a reasonable amount of variability in desired OS (Ubuntu versus Red Hat versus CentOS at least), JDK and Hadoop versions. We definitely can't afford the time to build AMIs for all options.

My final answer for DeepDyve was to use a standard alestic.com AMI. That let me change the OS whenever I needed to and would let Mahout customers pick their preference. These AMIs allow a 16 KB startup script, which I used to handle infrastructure variation. That worked very well for me and could be used for Mahout.

The cost was a few tens of seconds at boot time. The benefit was a vastly better debug and development cycle. Somebody else handled the OS and I could test many variations of the setup script very quickly.

This practice is very much in line with what RightScale does.

Generally, I would avoid the full-custom AMI in favor of a few S3-hosted tarballs rooted at / that anybody can rain down on any Linux version they want.

On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll <[email protected]> wrote:

> Create an AMI with:
> 1. Java 1.6
> 2. Maven
> 3. svn
> 4. Mahout's exact Hadoop version
> 5. A checkout of Mahout
>

--
Ted Dunning, CTO
DeepDyve
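
For illustration only, here is a minimal Python sketch of the startup-script approach described above: fetch a few tarballs rooted at / and unpack them over the filesystem at boot. It is not the script that was actually used at DeepDyve. The bucket URLs, tarball names and function names are all hypothetical, and it assumes the S3 objects can be fetched over plain HTTPS without credentials.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical tarball locations. These are placeholders, not real buckets.
# Order matters if later tarballs are meant to overwrite earlier ones.
TARBALL_URLS = [
    "https://example-bucket.s3.amazonaws.com/infra/java.tar.gz",
    "https://example-bucket.s3.amazonaws.com/infra/hadoop.tar.gz",
]


def fetch(url: str, dest_dir: Path) -> Path:
    """Download one tarball into dest_dir and return its local path."""
    dest = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, dest)
    return dest


def unpack(tarball: Path, root: Path = Path("/")) -> None:
    """Extract a tarball whose member paths are relative to root."""
    with tarfile.open(tarball) as tar:
        tar.extractall(root)


def provision(urls=TARBALL_URLS, root: Path = Path("/")) -> None:
    """Fetch each tarball in order and unpack it over root."""
    with tempfile.TemporaryDirectory() as tmp:
        for url in urls:
            unpack(fetch(url, Path(tmp)), root)
```

Calling `provision()` with no arguments would unpack over / and so needs root privileges. Passing a scratch directory as `root` makes it possible to try out a new tarball without touching the running system, which is the kind of quick iteration the startup-script approach is meant to allow.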
