[ https://issues.apache.org/jira/browse/HADOOP-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510929 ]
Doug Cutting commented on HADOOP-1563:
--------------------------------------
A couple of thoughts:
1. If, for performance, we find we must cache FileStatus in most
FileSystem#listPaths implementations, then the FileSystem API itself is
inappropriate. In that case, we should replace FileSystem#listPaths() and
#getFileStatus() with a single new method (a usage sketch follows these
notes):
public abstract Map<Path,FileStatus> listStatus(Path path) throws IOException;
2. If we find that an HTML-based implementation of HttpFileSystem is
insufficient for HDFS (e.g., in order to efficiently support #listStatus),
then we should not implement other directory formats by subclassing.
Rather, HttpFileSystem should use plugins for the various formats. That
fits better with the existing FileSystem extension mechanism, which
dispatches on protocol only.
The plugin interface might look like (a sample implementation is sketched
after these notes):
public interface HttpFileServer {
  /** Set connection properties prior to connect, typically
      authentication headers. */
  void prepareConnection(HttpURLConnection connection);
  /** Parse directory content into per-path status entries. */
  Map<Path,FileStatus> parseDirectoryContent(byte[] content) throws IOException;
}
HttpFileSystem would pick an HttpFileServer implementation based on the
hostname, the content type, or some other signal. Content-type would be
elegant, but is probably insufficient, since, e.g., S3 returns a
content-type of application/xml. Hostname would require reconfiguration
for each site. Perhaps we can use the "Server" header: that would work
for S3, and we could set it for HDFS.
> Create FileSystem implementation to read HDFS data via http
> -----------------------------------------------------------
>
> Key: HADOOP-1563
> URL: https://issues.apache.org/jira/browse/HADOOP-1563
> Project: Hadoop
> Issue Type: New Feature
> Components: fs
> Affects Versions: 0.14.0
> Reporter: Owen O'Malley
> Assignee: Chris Douglas
> Attachments: httpfs.patch
>
>
> There should be a FileSystem implementation that can read from a Namenode's
> http interface. This would have a couple of useful abilities:
> 1. Copy using distcp between different versions of HDFS.
> 2. Use map/reduce inputs from a different version of HDFS.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.