Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by TomWhite:
http://wiki.apache.org/lucene-hadoop/AmazonEC2

------------------------------------------------------------------------------
  
  == Future Work ==
  
- Ideally Hadoop could directly access job data from 
[http://www.amazon.com/gp/browse.html?node=16427261 Amazon S3] (Simple Storage 
Service).  Initial input could be read from S3 when a cluster is launched, and 
the final output could be written back to S3 before the cluster is 
decomissioned.  Intermediate, temporary data, only needed between MapReduce 
passes, would be more efficiently stored in Hadoop's DFS.  This would require 
an implementation of a Hadoop 
[http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/fs/FileSystem.html 
FileSystem] for S3.  There are two issues in Hadoop's bug database related to 
this:
+ Ideally Hadoop could directly access job data from 
[http://www.amazon.com/gp/browse.html?node=16427261 Amazon S3] (Simple Storage 
Service).  Initial input could be read from S3 when a cluster is launched, and 
the final output could be written back to S3 before the cluster is 
decommissioned.  Intermediate, temporary data, needed only between MapReduce 
passes, would be more efficiently stored in Hadoop's DFS. From Hadoop 0.10.1 
onwards there is an implementation of a Hadoop 
[http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/fs/FileSystem.html 
FileSystem] for S3. See ["AmazonS3"].
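The S3 `FileSystem` is wired in through Hadoop's configuration. A minimal sketch of a `hadoop-site.xml` fragment, assuming the `fs.s3.awsAccessKeyId` and `fs.s3.awsSecretAccessKey` property names used by this implementation (placeholder values, not real credentials):

```xml
<!-- hadoop-site.xml fragment: AWS credentials for the s3:// filesystem.
     Property names are an assumption about this S3 FileSystem
     implementation; replace the placeholder values with your own keys. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```

With credentials in place, job input and output paths can name S3 locations directly via the `s3://bucket/path` scheme instead of DFS paths.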
  
-  * [http://issues.apache.org/jira/browse/HADOOP-574 HADOOP-574]
-  * [http://issues.apache.org/jira/browse/HADOOP-571 HADOOP-571]
- 
- Please vote for these issues in Jira if you feel this would help your 
project.  (Anyone can create themselves a Jira account in order to vote on 
issues, etc.)
- 
- [[Anchor(AutomatedScripts])]]
+ [[Anchor(AutomatedScripts)]]
  = Automated Scripts =
  
  == Setting up ==