[ 
http://issues.apache.org/jira/browse/HADOOP-574?page=comments#action_12448811 ] 
            
Doug Cutting commented on HADOOP-574:
-------------------------------------

Here are some thoughts I sent Jim about this:

DFS stores files as a sequence of ~100MB blocks.  I think a scheme like this 
will be useful for an S3-based FileSystem too.

When a file is created, each DFS block is first written to a local temporary 
file.  Only when the block is full (or the file is closed) is the block 
actually written to DFS.  This avoids trickling data to the network as it is 
written, which can run into timeout issues, etc.  It also means that a failed 
block write can easily be retried.
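
To make the buffering scheme concrete, here is a minimal sketch in Python (not 
Hadoop's actual code).  The class name BlockBufferedWriter, the upload_block 
callable and the max_retries parameter are all hypothetical names made up for 
illustration; upload_block stands in for whatever call would store one block 
in DFS or S3.

```python
import tempfile

BLOCK_SIZE = 100 * 1024 * 1024  # ~100MB, the block size mentioned above


class BlockBufferedWriter:
    """Buffers each block in a local temp file; uploads it only when complete."""

    def __init__(self, upload_block, block_size=BLOCK_SIZE, max_retries=3):
        # upload_block is a hypothetical callable: (block_number, bytes) -> None
        self.upload_block = upload_block
        self.block_size = block_size
        self.max_retries = max_retries
        self.block_number = 0
        self._tmp = tempfile.TemporaryFile()
        self._written = 0

    def write(self, data):
        data = memoryview(data)
        while len(data) > 0:
            # Fill the current block, but never beyond block_size.
            room = self.block_size - self._written
            chunk = data[:room]
            self._tmp.write(chunk)
            self._written += len(chunk)
            data = data[len(chunk):]
            if self._written == self.block_size:
                self._flush_block()

    def close(self):
        # A partially filled final block is written when the file is closed.
        if self._written > 0:
            self._flush_block()
        self._tmp.close()

    def _flush_block(self):
        # For brevity the block is read into memory; a real implementation
        # would more likely stream from the temp file.
        self._tmp.seek(0)
        payload = self._tmp.read()
        last_error = None
        for _ in range(self.max_retries):
            try:
                # The whole block is still on local disk, so a retry is cheap.
                self.upload_block(self.block_number, payload)
                break
            except IOError as err:
                last_error = err
        else:
            raise last_error
        self.block_number += 1
        self._written = 0
        self._tmp.seek(0)
        self._tmp.truncate()
```

Because nothing touches the network until a block is complete, a failed upload 
only has to resend a block that is still sitting in the local temporary file.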

Very large files (up to a terabyte) should be supported, and breaking files 
into blocks should help here too.  S3 limits an object value to 5GB, so each 
file can be represented as a set of ~100MB S3 object values.  The set can be 
listed when the file is opened and used to guide seeks and reads of the data.  
The block number can be placed at the end of the object name using a 
delimiter, so that access to metadata is not required when opening files or 
listing directories.
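
As a rough illustration of the naming idea, the Python sketch below puts the 
block number at the end of the object name and uses a listing to map a file 
offset to a block.  The "#" delimiter, the zero-padded number format and all 
function names are assumptions made for this example, as is the premise that 
a listing returns each object's size along with its name.

```python
import bisect

DELIMITER = "#"  # hypothetical delimiter; any character not used in paths would do


def block_key(path, block_number):
    """Object name for one block: the file path plus a delimited block number."""
    return f"{path}{DELIMITER}{block_number:08d}"


def parse_block_key(key):
    """Split an object name back into (file path, block number)."""
    path, _, number = key.rpartition(DELIMITER)
    return path, int(number)


def build_block_index(listing):
    """Turn a listing of (key, size) pairs into ordered keys and start offsets."""
    blocks = sorted(listing, key=lambda item: parse_block_key(item[0])[1])
    keys, starts = [], []
    offset = 0
    for key, size in blocks:
        keys.append(key)
        starts.append(offset)
        offset += size
    return keys, starts, offset  # offset is now the total file length


def locate(keys, starts, total, position):
    """Find which block holds a file offset, and the offset inside that block."""
    if not 0 <= position < total:
        raise ValueError("position outside file")
    i = bisect.bisect_right(starts, position) - 1
    return keys[i], position - starts[i]
```

With an index like this, a seek is a binary search over the block start 
offsets, and the file length falls out of the listing without reading any 
separate metadata object.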


> want FileSystem implementation for Amazon S3
> --------------------------------------------
>
>                 Key: HADOOP-574
>                 URL: http://issues.apache.org/jira/browse/HADOOP-574
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Doug Cutting
>
> An S3-based Hadoop FileSystem would make a great addition to Hadoop.
> It would facilitate use of Hadoop on Amazon's EC2 computing grid, as 
> discussed here:
> http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00318.html
> This is related to HADOOP-571, which would make Hadoop's FileSystem 
> considerably easier to extend.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
