[ https://issues.apache.org/jira/browse/HADOOP-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663431#action_12663431 ]

Marco Nicosia commented on HADOOP-5010:
---------------------------------------

bq. Distcp is the best tool today for this. How is it insufficient?

Distcp works for pulling data from a source or pushing data to a destination. In 
both cases, distcp implies running a Hadoop job. There is currently no external 
way to push data to, or pull data from, an HDFS using an existing standard; 
instead, anyone wishing to do so must install HDFS clients on computers that do 
not otherwise run Hadoop software.

bq. That's possible. An appropriate HTTP-based standard for filesystem access 
might be WebDav.
bq. Implementing an accepted standard is a more ambitious project.

I remember previous attempts to make WebDav available, and recognize that as an 
ambitious goal.

My naive thought is that HFTP is very close to a much simpler feature. The main 
purpose of the HDFS proxy _could be_ to let a standard web client (curl, 
Net::HTTP, etc.) retrieve file listings and file contents from the proxy 
without installing an HDFS client, which is currently required to speak 
H{S}FTP.
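A minimal sketch of what "standard web client" access could look like, assuming (hypothetically) a proxy that serves an HDFS file's bytes at GET <base><path> and a plain-text directory listing, one name per line, at GET <base><path>/. The helper names proxy_url, list_dir and read_file, and this URL layout, are illustrative assumptions rather than the HDFS proxy's actual interface; the point is only that nothing beyond Python's standard urllib would be needed on the client.

```python
from typing import List
from urllib.parse import quote
from urllib.request import urlopen


def proxy_url(base: str, hdfs_path: str, *, listing: bool = False) -> str:
    """Build a plain HTTP(S) URL for an HDFS path on a HYPOTHETICAL proxy."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # Percent-encode unsafe characters (e.g. spaces) but keep '/' separators.
    path = quote(hdfs_path)
    # Assumed convention: a trailing '/' asks the proxy for a directory listing.
    if listing and not path.endswith("/"):
        path += "/"
    return base.rstrip("/") + path


def list_dir(base: str, hdfs_path: str, timeout: float = 10.0) -> List[str]:
    """Fetch an assumed plain-text listing (one entry per line)."""
    with urlopen(proxy_url(base, hdfs_path, listing=True), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return [line for line in body.splitlines() if line]


def read_file(base: str, hdfs_path: str, timeout: float = 10.0) -> bytes:
    """Fetch a file's raw contents with an ordinary HTTP GET."""
    with urlopen(proxy_url(base, hdfs_path), timeout=timeout) as resp:
        return resp.read()
```

Under the same assumed layout, curl would work just as well, e.g. `curl http://proxy.example:8080/a/b/c` (proxy.example is a placeholder host).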

Isn't the only difference that HDFS proxy/H{S}FTP have invented an internal way 
of exposing this data where existing standards could have been used?


> Replace HFTP/HSFTP with plain HTTP/HTTPS
> ----------------------------------------
>
>                 Key: HADOOP-5010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5010
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hdfsproxy
>    Affects Versions: 0.18.0
>            Reporter: Marco Nicosia
>
> In HADOOP-1563, [~cutting] wrote:
> bq. The URI for this should be something like hftp://host:port/a/b/c, since, 
> while HTTP will be used as the transport, this will not be a FileSystem for 
> arbitrary HTTP URLs.
> Recently, we've been talking about implementing an HDFS proxy (HADOOP-4575) 
> which would be a secure way to make HFTP/HSFTP available. In so doing, we may 
> even remove HFTP/HSFTP from being offered on the HDFS itself (that's another 
> discussion).
> In the case of the HDFS proxy, does it make sense to do away with the 
> artificial HFTP/HSFTP protocols, and instead simply offer standard HTTP and 
> HTTPS? That would allow non-HDFS-specific clients, as well as using various 
> standard HTTP infrastructure, such as load balancers, etc.
> NB, to the best of my knowledge, HFTP is only documented on the 
> [distcp|http://hadoop.apache.org/core/docs/current/distcp.html] page, and 
> HSFTP is not documented at all?
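If the proxy kept HFTP's host, port and path layout and simply served it over standard HTTP/HTTPS, moving a client off hftp:// and hsftp:// URIs would amount to a scheme swap. A sketch under that assumption; SCHEME_MAP and to_plain_http are hypothetical names, and the proposal itself does not fix any URL layout.

```python
from urllib.parse import urlsplit, urlunsplit

# HYPOTHETICAL mapping: HDFS-specific schemes -> the standard ones they ride on.
SCHEME_MAP = {"hftp": "http", "hsftp": "https"}


def to_plain_http(uri: str) -> str:
    """Rewrite hftp://host:port/a/b/c as http://host:port/a/b/c (hsftp -> https)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in SCHEME_MAP:
        raise ValueError(f"not an hftp/hsftp URI: {uri}")
    # Keep authority, path, query and fragment unchanged; only the scheme changes.
    return urlunsplit(
        (SCHEME_MAP[scheme], parts.netloc, parts.path, parts.query, parts.fragment)
    )
```

Keeping everything but the scheme unchanged is what would let generic HTTP infrastructure (load balancers, caches) sit in front of the proxy without knowing anything about HDFS.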

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.