+1 this is a smarter version of what I tried to put together too. A semi-custom AMI would download components and configure via an /etc/rc script. Quite nice.
Point taken about Hadoop and the usefulness amongst ourselves of such a thing. Based on incomplete experience with running AMIs and a Hadoop cluster, it's going to be no small feat to craft a series of AMIs (or one configurable one) that will reliably come up, find its workers, accept jobs, etc. It's not terrible, but I'm guessing it is about a week of work. That would be pretty great for the whole community, should you succeed. You could probably make a nice paid AMI out of it!

On Mon, Jan 18, 2010 at 8:15 PM, Ted Dunning <[email protected]> wrote:

> Is there an important difference between creating an existing AMI or using
> an existing AMI with a startup script that populates everything from S3?
>
> Building an AMI takes a few hours of time and is a total pain in the butt.
> My eventual result was that I didn't need to do it at all.
>
> I found that I had roughly three levels of variation in my production
> systems:
>
> - the OS
> - the infrastructural components like java, hadoop and zookeeper
> - the application that I wanted to run
>
> My initial thought was that the AMI should cover the first two aspects of
> variability. But I also found that I wanted to change the version of the
> infrastructure stuff fairly often in development of the AMI and not
> infrequently in production.
>
> For Mahout customers, I would imagine that there is a reasonable amount of
> variability in desired OS (Ubuntu versus Redhat versus Centos at least), JDK
> and Hadoop versions. We definitely can't afford the time to build AMI's for
> all options.
>
> My final answer for deepdyve was to use a standard alestic.com AMI. That
> let me change the OS whenever I needed to and would let Mahout customers
> pick their preference. These AMI's allow a 16K startup script which I used
> to handle infrastructure variation. That worked very well for me and could
> be used for Mahout.
>
> The cost was a few 10's of seconds at boot time. The benefit was vastly
> better debug and development cycle. Somebody else handled the OS and I
> could test many variations of setup script very quickly. This practice is
> very much in line with what RightScale does.
>
> Generally, I would avoid the full-custom AMI in favor of a few S3 hosted tar
> balls rooted at / that anybody can rain down on any Linux version they
> want.
>
> On Mon, Jan 18, 2010 at 6:54 AM, Grant Ingersoll <[email protected]> wrote:
>
>> Create an AMI with:
>> 1. Java 1.6
>> 2. Maven
>> 3. svn
>> 4. Mahout's exact Hadoop version
>> 5. A checkout of Mahout
>>
>
> --
> Ted Dunning, CTO
> DeepDyve
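
For anyone following the thread, the "tarballs rooted at /" idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used at DeepDyve: the function name fetch_and_unpack and whatever URLs are passed to it are hypothetical, the tarballs are assumed to be readable by plain URL (for example, public S3 objects) and to come from a trusted source, and Python is used here purely for readability where a real boot script might well be shell.

```python
import shutil
import sys
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_unpack(urls, root="/"):
    """Download each tarball in order and unpack it relative to `root`.

    Hypothetical helper: later tarballs overwrite files from earlier ones,
    so the list can go from infrastructure (JDK, Hadoop) to application.
    Only use this with tarballs you trust.
    """
    root = Path(root)
    unpacked = []
    for url in urls:
        with tempfile.TemporaryFile() as tmp:
            # Stream the download to a temporary file instead of holding it in memory.
            with urllib.request.urlopen(url) as resp:
                shutil.copyfileobj(resp, tmp)
            tmp.seek(0)
            # "r:*" lets tarfile detect the compression (gzip, bz2, none).
            with tarfile.open(fileobj=tmp, mode="r:*") as tar:
                tar.extractall(root)
        unpacked.append(url)
    return unpacked


if __name__ == "__main__":
    # Usage (assumed): pass the tarball URLs as arguments; unpacks under /.
    fetch_and_unpack(sys.argv[1:])
```

Swapping the JDK or Hadoop version then means changing a URL in the startup script rather than rebuilding an image, which is the faster debug cycle described above.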
