Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by DavidPhillips:
http://wiki.apache.org/hadoop/AmazonS3

The comment on the change is:
updated rename info as HADOOP-3361 was fixed in 0.19.0

------------------------------------------------------------------------------
  Hadoop provides two filesystems that use S3.

  S3 Native FileSystem (URI scheme: s3n)::
- A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3. For this reason it is not suitable as a replacement for HDFS (which has support for very large files). Also, the current implementation does not support renames ([https://issues.apache.org/jira/browse/HADOOP-3361 HADOOP-3361]).
+ A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3. For this reason it is not suitable as a replacement for HDFS (which has support for very large files).

  S3 Block FileSystem (URI scheme: s3)::
  A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

@@ -20, +20 @@

  = History =

   * The S3 block filesystem was introduced in Hadoop 0.10.0 ([http://issues.apache.org/jira/browse/HADOOP-574 HADOOP-574]), but this had a few bugs so you should use Hadoop 0.10.1 or later.
   * The S3 native filesystem was introduced in Hadoop 0.18.0 ([http://issues.apache.org/jira/browse/HADOOP-930 HADOOP-930])
+  * The S3 native filesystem gained support for rename in Hadoop 0.19.0 ([https://issues.apache.org/jira/browse/HADOOP-3361 HADOOP-3361])

  = Setting up hadoop to use S3 as a replacement for HDFS =

@@ -44, +45 @@

  For the S3 native filesystem, just replace `s3` with `s3n` in the above.

- Note that you can not use s3n as a replacement for HDFS since rename is not supported. This should be resolved once [https://issues.apache.org/jira/browse/HADOOP-3361 HADOOP-3361] is released.
+ Note that you cannot use s3n as a replacement for HDFS on Hadoop versions prior to 0.19.0, since rename was not supported.

  Alternatively, you can put the access key ID and the secret access key into an ''s3'' (or ''s3n'') URI as the user info:
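The configuration the diff refers to ("the above") is elided from this hunk, but a minimal sketch of a hadoop-site.xml fragment for using the S3 block filesystem as the default filesystem would look like the following; the bucket name and credential values are placeholders, not settings taken from the page:

```xml
<!-- hadoop-site.xml (sketch): use the S3 block filesystem as the default.
     "mybucket" and the credential values below are placeholders. -->
<property>
  <name>fs.default.name</name>
  <value>s3://mybucket</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

For the native filesystem, the analogous property names use the `s3n` prefix (`fs.s3n.awsAccessKeyId`, `fs.s3n.awsSecretAccessKey`) and an `s3n://` default filesystem URI.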
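When embedding credentials as the user info of an s3/s3n URI, reserved characters in the secret key (such as `/`) must be percent-encoded or the URI will not parse correctly. A minimal sketch of the escaping in Python, using made-up credentials and a hypothetical bucket name:

```python
from urllib.parse import quote

# Hypothetical credentials for illustration only.
access_key_id = "AKIAEXAMPLEKEY"
secret_access_key = "abc/def+ghi"  # contains '/', which must be escaped in a URI

# Percent-encode both components so reserved characters survive URI parsing.
uri = "s3n://%s:%s@mybucket/path" % (
    quote(access_key_id, safe=""),
    quote(secret_access_key, safe=""),
)
print(uri)  # s3n://AKIAEXAMPLEKEY:abc%2Fdef%2Bghi@mybucket/path
```

The resulting URI can then be passed to commands such as `hadoop fs -ls` in place of configuring the keys in hadoop-site.xml.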
