Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by TomWhite:
http://wiki.apache.org/lucene-hadoop/AmazonEC2

------------------------------------------------------------------------------
== Future Work ==
- Ideally Hadoop could directly access job data from [http://www.amazon.com/gp/browse.html?node=16427261 Amazon S3] (Simple Storage Service). Initial input could be read from S3 when a cluster is launched, and the final output could be written back to S3 before the cluster is decommissioned. Intermediate, temporary data, only needed between MapReduce passes, would be more efficiently stored in Hadoop's DFS. This would require an implementation of a Hadoop [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/fs/FileSystem.html FileSystem] for S3. There are two issues in Hadoop's bug database related to this:
+ Ideally Hadoop could directly access job data from [http://www.amazon.com/gp/browse.html?node=16427261 Amazon S3] (Simple Storage Service). Initial input could be read from S3 when a cluster is launched, and the final output could be written back to S3 before the cluster is decommissioned. Intermediate, temporary data, only needed between MapReduce passes, would be more efficiently stored in Hadoop's DFS. From Hadoop 0.10.1 onwards there is an implementation of a Hadoop [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/fs/FileSystem.html FileSystem] for S3. See ["AmazonS3"].
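As a rough illustration of how the S3 FileSystem mentioned in the new text is wired up, a bucket can be set as Hadoop's default filesystem in hadoop-site.xml. This is a minimal sketch, assuming Hadoop 0.10.1+ with the S3 FileSystem available; the bucket name and AWS credentials below are placeholders:

```xml
<!-- hadoop-site.xml sketch: point the default filesystem at an S3 bucket.
     Assumes Hadoop 0.10.1+; YOUR-BUCKET and the AWS credentials are
     placeholders to be replaced with real values. -->
<property>
  <name>fs.default.name</name>
  <value>s3://YOUR-BUCKET</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR-AWS-ACCESS-KEY-ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR-AWS-SECRET-ACCESS-KEY</value>
</property>
```

With this in place, job input and output paths given as `s3://` URIs are resolved through the S3 FileSystem rather than HDFS.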