[ 
https://issues.apache.org/jira/browse/HADOOP-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590380#action_12590380
 ] 

Steve Loughran commented on HADOOP-2409:
----------------------------------------

The idea would be that when the user issues a "yum update" (assuming an 
RPM-based distro), the Hadoop RPM would be updated (along with any other 
patches). Better yet, you run "yum update hadoop-on-s3" and get an update of 
just that package, as a full update may have adverse side effects (see 
http://www.1060.org/blogxter/entry?publicid=0C5798DE7C14EE57D8BEA1E1E945872E ).

The nice thing about this approach is that it integrates with the Linux 
ecosystem: people can uninstall your package cleanly, and the OS can stop your 
files from being stamped on.

Have I done this? Well, we release RPMs, which get built on Linux using 
<rpmbuild>; these then get handed off to other people to put in their 
repositories. So I don't know how to set up a Yum-compatible file system. I do 
know how to build RPMs under Ant, taking .spec files and setting them up. It's 
painful, but not too hard once you get the hang of things. You do need a 
clean RPM-based VM to test your installation on, which is where EC2, VMware 
or Xen come into the picture. We test locally on VMware, but now that I can 
start and stop EC2 images during tests, those could be targeted directly.
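The build step itself is done under Ant here; purely as a rough Python stand-in (not that Ant setup), the sketch below stages a .spec file and its sources into the directory tree rpmbuild expects under its `_topdir` macro, then runs a binary-only build. The top directory, spec and source file names are hypothetical, and `build_rpm` only works on a host with rpmbuild installed, such as the clean RPM-based test VM.

```python
import shutil
import subprocess
from pathlib import Path

# Subdirectories rpmbuild expects to find under its %_topdir.
RPMBUILD_SUBDIRS = ("BUILD", "RPMS", "SOURCES", "SPECS", "SRPMS")


def prepare_topdir(topdir: Path) -> Path:
    """Create the standard rpmbuild directory tree under topdir."""
    for name in RPMBUILD_SUBDIRS:
        (topdir / name).mkdir(parents=True, exist_ok=True)
    return topdir


def rpmbuild_command(topdir: Path, spec_file: Path) -> list[str]:
    """Argument list for a binary-only (-bb) build using a private _topdir."""
    return [
        "rpmbuild",
        "-bb",
        "--define", f"_topdir {topdir.resolve()}",
        str(spec_file),
    ]


def build_rpm(topdir: Path, spec_file: Path, sources: list[Path]) -> None:
    """Stage the spec and its sources, then run rpmbuild (RPM-based host only)."""
    prepare_topdir(topdir)
    staged_spec = topdir / "SPECS" / spec_file.name
    shutil.copy(spec_file, staged_spec)
    for src in sources:
        shutil.copy(src, topdir / "SOURCES" / src.name)
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(rpmbuild_command(topdir, staged_spec), check=True)
```

Using a private `_topdir` keeps test builds out of the user's default `~/rpmbuild` tree; by default the resulting package lands under `RPMS/` in an architecture subdirectory.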

The big issue is the engineering effort to create the RPMs, write the tests 
and maintain the .spec files. Surely there must be people out there who create 
their own Hadoop RPMs? Ideally we'd take existing work, such as a hadoop-core 
RPM, and add a new hadoop-on-ec2 RPM that depends on the base RPMs.
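To illustrate that layering, here is a sketch that renders the header of a minimal .spec file for an add-on package declaring a dependency on a base package. The package names, version and license string are placeholders, and a real spec would also need %prep/%install sections to put files in place before %files can list them.

```python
from dataclasses import dataclass


@dataclass
class PackageSpec:
    """Minimal metadata for a noarch add-on RPM (all values are placeholders)."""
    name: str
    version: str
    release: str
    summary: str
    license: str
    requires: list[str]
    description: str
    files: list[str]


def render_spec(pkg: PackageSpec) -> str:
    """Render a bare-bones .spec file; %prep/%install are deliberately omitted."""
    header = [
        f"Name: {pkg.name}",
        f"Version: {pkg.version}",
        f"Release: {pkg.release}",
        f"Summary: {pkg.summary}",
        f"License: {pkg.license}",
        "BuildArch: noarch",
    ]
    # One Requires: line per dependency, e.g. "hadoop-core" or "hadoop-core >= 0.16.0".
    header += [f"Requires: {dep}" for dep in pkg.requires]
    body = ["", "%description", pkg.description, "", "%files"]
    body += pkg.files
    return "\n".join(header + body) + "\n"
```

With `Requires: hadoop-core` declared, yum would pull in the base package automatically when the add-on is installed, so the EC2-specific package only has to carry its own files.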



> Make EC2 image independent of Hadoop version
> --------------------------------------------
>
>                 Key: HADOOP-2409
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2409
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/ec2
>            Reporter: Tom White
>         Attachments: HADOOP-2409.patch
>
>
> Instead of building a new image for each released version of Hadoop, install 
> Hadoop on instance start up. Since it is a small download this would not add 
> significantly to startup time. Hadoop releases would be mirrored on S3 for 
> scalability (and to avoid bandwidth costs). The version to install would be 
> found from the instance metadata - this would be a download URL. 
> More generally, the instance could retrieve a script to run on start up from 
> a URL specified in the metadata. The script would install and configure 
> Hadoop, but it could be extended to do cluster-specific set up.
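The start-up step described in the issue could look roughly like this sketch. It assumes the instance user data holds nothing but the release tarball URL (for example, a hypothetical S3 mirror such as `https://my-hadoop-mirror.s3.amazonaws.com/hadoop-0.16.0.tar.gz`), reads it from the EC2 metadata service with an unauthenticated GET, then downloads and unpacks the tarball. The install location and function names are illustrative, and the more general variant (fetching and running a start-up script) is not shown.

```python
import posixpath
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path

# EC2 metadata endpoint for user data. Assumes an unauthenticated (IMDSv1-style)
# GET; instances that enforce IMDSv2 also need a session token, not handled here.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_text(url: str, timeout: float = 5.0) -> str:
    """GET a URL and return its body as stripped text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def release_dir_name(download_url: str) -> str:
    """Derive e.g. 'hadoop-0.16.0' from '.../hadoop-0.16.0.tar.gz'."""
    name = posixpath.basename(urllib.parse.urlparse(download_url).path)
    for suffix in (".tar.gz", ".tgz"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"not a tarball URL: {download_url}")


def install_hadoop(download_url: str, install_root: Path) -> Path:
    """Download the release named in the metadata and unpack it under install_root."""
    release = release_dir_name(download_url)  # validate before downloading
    install_root.mkdir(parents=True, exist_ok=True)
    archive = install_root / f"{release}.tar.gz"
    urllib.request.urlretrieve(download_url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(install_root)  # only safe for trusted archives
    # Assumes the tarball unpacks into a top-level directory named after the release.
    return install_root / release


def main(install_root: Path = Path("/usr/local")) -> None:
    """Boot-time entry point: install whichever release the user data names."""
    url = fetch_text(USER_DATA_URL)
    home = install_hadoop(url, install_root)
    print(f"Hadoop installed in {home}")
```

A boot script on the instance would call `main()`; only the URL passed in the user data changes between Hadoop releases, so the image itself stays the same.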

-- 
This message is automatically generated by JIRA.