[
https://issues.apache.org/jira/browse/HADOOP-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663431#action_12663431
]
Marco Nicosia commented on HADOOP-5010:
---------------------------------------
bq. Distcp is the best tool today for this. How is it insufficient?
Distcp works for pulling data from a source or pushing data to a destination. In
both cases, distcp implies running a Hadoop job. There is currently no external
way to push data to HDFS or pull data from HDFS using an existing standard;
instead, anyone wishing to do so must install an HDFS client on computers that
do not otherwise run Hadoop software.
bq. That's possible. An appropriate HTTP-based standard for filesystem access
might be WebDav.
bq. Implementing an accepted standard is a more ambitious project.
I remember previous attempts to make WebDav available, and recognize that as an
ambitious goal.
My naive thought is that HFTP is very close to a much simpler feature. The main
purpose of the HDFS proxy _could be_ to let a standard web client (curl,
Net::HTTP, etc.) retrieve file listings and file contents from the HDFS proxy
without installing an HDFS client, which is currently required to speak
H{S}FTP.
Isn't the only difference that HDFS proxy/H{S}FTP have invented an internal way
of exposing this data where existing standards could have been used?
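To illustrate what "a standard web client" could look like, here is a minimal
sketch in Python that uses only the standard library. It assumes the proxy
serves listings and file contents over plain HTTP(S); the host, port, and the
endpoint names "listPaths" and "data" are assumptions made for illustration,
not a documented interface.

```python
from urllib.parse import quote, urlunsplit
from urllib.request import urlopen


def proxy_url(host, port, endpoint, path, scheme="http"):
    """Build a plain HTTP(S) URL for an HDFS path behind the proxy.

    `endpoint` is an assumed name (e.g. "listPaths" or "data"),
    not a documented API of the HDFS proxy.
    """
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return urlunsplit((scheme, f"{host}:{port}", f"/{endpoint}{quote(path)}", "", ""))


def fetch(url, timeout=30):
    """Return the raw response body; no HDFS client library is involved."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Under these assumptions, fetch(proxy_url("proxy.example.com", 8080, "data",
"/a/b/c")) would return the bytes of /a/b/c, and the same request could be made
with curl or sent through an ordinary HTTP load balancer.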
> Replace HFTP/HSFTP with plain HTTP/HTTPS
> ----------------------------------------
>
> Key: HADOOP-5010
> URL: https://issues.apache.org/jira/browse/HADOOP-5010
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hdfsproxy
> Affects Versions: 0.18.0
> Reporter: Marco Nicosia
>
> In HADOOP-1563, [~cutting] wrote:
> bq. The URI for this should be something like hftp://host:port/a/b/c, since,
> while HTTP will be used as the transport, this will not be a FileSystem for
> arbitrary HTTP urls.
> Recently, we've been talking about implementing an HDFS proxy (HADOOP-4575)
> which would be a secure way to make HFTP/HSFTP available. In so doing, we may
> even remove HFTP/HSFTP from being offered on the HDFS itself (that's another
> discussion).
> In the case of the HDFS proxy, does it make sense to do away with the
> artificial HFTP/HSFTP protocols, and instead simply offer standard HTTP and
> HTTPS? That would allow non-HDFS-specific clients, as well as using various
> standard HTTP infrastructure, such as load balancers, etc.
> NB, to the best of my knowledge, HFTP is only documented on the
> [distcp|http://hadoop.apache.org/core/docs/current/distcp.html] page, and
> HSFTP is not documented at all?