[ https://issues.apache.org/jira/browse/HADOOP-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-930:
-----------------------------

    Attachment: hadoop-930.patch

Here's a patch for a native S3 filesystem.
 * Writes are supported.
 * The scheme is s3n, making it completely independent of the existing 
block-based S3 filesystem. It might be possible to make a general (read-only) 
S3 filesystem that can read both types, but I haven't attempted that here (it 
can go in another Jira if needed).
 * Empty directories are written using the naming convention of appending 
"_$folder$" to the key. This is the approach taken by S3Fox, and - crucially 
for efficiency - it makes it possible to tell if a key represents a file or a 
directory from a list bucket operation.
 * There's a new unit test (FileSystemContractBaseTest) for the contract of 
FileSystem to ensure that different implementations are consistent. Both S3 
filesystems and HDFS are tested using this test. It would be good to add other 
filesystems later.
 * Renames are not supported, as S3 doesn't support them natively (yet). It 
would be possible to support renames by having the client copy the data out 
of S3 and then back again.
 * The Jets3t library has been upgraded to the latest version (0.6.0).
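
To illustrate the "_$folder$" convention, here is a minimal sketch (not code
from the patch; the class and method names are illustrative assumptions) of
how a single list-bucket result can be classified into files and directories
purely from key names, with no extra round trips:

```java
// Sketch of the "_$folder$" naming convention described above.
// Class and method names are hypothetical, not from the patch.
public class S3KeyConvention {
    // Suffix appended to a key to mark an empty directory (as S3Fox does).
    private static final String FOLDER_SUFFIX = "_$folder$";

    // A key ending in the suffix represents a directory, not a file,
    // so file-vs-directory can be decided from a list bucket operation.
    static boolean isDirectoryKey(String key) {
        return key.endsWith(FOLDER_SUFFIX);
    }

    // Strip the suffix to recover the directory's logical path.
    static String pathOfKey(String key) {
        return isDirectoryKey(key)
                ? key.substring(0, key.length() - FOLDER_SUFFIX.length())
                : key;
    }

    public static void main(String[] args) {
        assert isDirectoryKey("logs/2007_$folder$");
        assert !isDirectoryKey("logs/2007/part-00000");
        assert pathOfKey("logs/2007_$folder$").equals("logs/2007");
        System.out.println("ok");
    }
}
```

The efficiency point is that the classification needs only the key string
returned by the listing, never a per-key metadata request.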

> Add support for reading regular (non-block-based) files from S3 in 
> S3FileSystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-930
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.10.1
>            Reporter: Tom White
>         Attachments: hadoop-930.patch, jets3t-0.6.0.jar
>
>
> People often have input data on S3 that they want to use for a Map Reduce job 
> and the current S3FileSystem implementation cannot read it since it assumes a 
> block-based format.
> We would add the following metadata to files written by S3FileSystem: an 
> indication that it is block oriented ("S3FileSystem.type=block") and a 
> filesystem version number ("S3FileSystem.version=1.0"). Regular S3 files 
> would not have the type metadata so S3FileSystem would not try to interpret 
> them as inodes.
> An extension to write regular files to S3 would not be covered by this change 
> - we could do this as a separate piece of work (we still need to decide 
> whether to introduce another scheme - e.g. rename block-based S3 to "s3fs" 
> and call regular S3 "s3" - or whether to just use a configuration property to 
> control block-based vs. regular writes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
