[
https://issues.apache.org/jira/browse/HDFS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129108#comment-13129108
]
Alejandro Abdelnur commented on HDFS-2178:
------------------------------------------
*On Sanjay's create and append:*
You are correct: an HDFS proxy deployment does not need to redirect (to a DN);
the proxy handles the request itself.
Still, for authentication purposes a probe request should be made before
attempting to upload data. Because of this, the create & append requests are
identical in the hdfs-proxy (Hoop) and built-in (NN & DN HTTP serving) modes.
In the hdfs-proxy case the probe is for authentication only; in the built-in
case it is for both authentication and potential redirection.
This means that we can have the exact same API for both hdfs-proxy and built-in
modes.
Still, the use of 100-continue is an open issue; more on this at the end of
this comment.
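
To illustrate why the same client logic can serve both modes, here is a minimal
sketch of the probe-then-upload flow described above. Everything in it is an
assumption made for illustration, not the API from the attached doc: the
`Response` type, the `create_file` and `make_fake_server` helpers, the example
URLs, and the status codes (401 for failed auth, 307 for a redirect, 200/201
for success). The fake server stands in for a real HTTP endpoint so the sketch
runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Response:
    status: int
    location: Optional[str] = None  # set only on a redirect


# A transport is any callable (method, url, body) -> Response.
Transport = Callable[[str, str, Optional[bytes]], Response]


def create_file(send: Transport, url: str, data: bytes) -> Response:
    """Create a file in two steps: probe without data, then upload."""
    # Step 1: probe with no body, so nothing is uploaded before the
    # server has authenticated the caller.
    probe = send("PUT", url, None)
    if probe.status == 401:
        raise PermissionError("authentication failed during probe")
    # Built-in mode may answer the probe with a redirect to a DN;
    # hdfs-proxy mode keeps the client on the same URL.
    target = probe.location if probe.status == 307 and probe.location else url
    # Step 2: send the data to wherever the probe pointed us.
    return send("PUT", target, data)


def make_fake_server(redirect_to: Optional[str],
                     log: List[Tuple[str, bool]]) -> Transport:
    """Hypothetical in-memory stand-in for either deployment mode."""
    def send(method: str, url: str, body: Optional[bytes]) -> Response:
        # Record which URL was hit and whether data was attached.
        log.append((url, body is not None))
        if body is None:
            if redirect_to:
                return Response(307, redirect_to)
            return Response(200)
        return Response(201)
    return send


proxy_log: List[Tuple[str, bool]] = []
builtin_log: List[Tuple[str, bool]] = []
proxy = make_fake_server(None, proxy_log)
builtin = make_fake_server("http://dn1.example/file", builtin_log)

proxy_result = create_file(proxy, "http://proxy.example/file", b"hello")
builtin_result = create_file(builtin, "http://nn.example/file", b"hello")
```

In this sketch the caller of `create_file` cannot tell the two modes apart: the
only difference is whether the second request goes to the original URL or to
the redirect target.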
*On Sanjay's comment on 'some thoughts of webhdfs & hoop':*
* Support for trusted proxies (doAs functionality) makes sense in the case of
hdfs-proxy, and Hoop already supports it. This covers server-side apps that
need/want HTTP access to HDFS and act on behalf of other users, e.g. somebody
using the Java API to access HDFS via hdfs-proxy inside a doAs block.
* Support for delegation tokens to access hdfs-proxy also makes sense, e.g.
when using distcp via hdfs-proxy. In this case, delegation tokens should work
across clusters (this may not be supported today, but IMO it should eventually
work).
* You mention code/param/return clean up. What kind of clean up are you
referring to?
*On Sanjay's 'As we move forward':*
* What subset of the webhdfs API makes sense for a proxy? IMO, the two APIs
should be identical; a user should not see a difference whether they access a
built-in or an hdfs-proxy HTTP setup.
* Regarding a 'pure proxy': this would be more like a reverse proxy, and then
all URLs would have to be relative or resolved with knowledge of the reverse
proxy. IMO, an hdfs-proxy on its own has its merits.
*Open issues:*
1* *Use of 100-CONTINUE for create & append*: it seems not all client HTTP
libraries handle this (JDK HttpURLConnection, to start with). In addition, the
servlet API does not provide support for it; some servlet containers appear to
handle it, but either in a non-standard way
(http://jira.codehaus.org/browse/JETTY-341) or in a way that never reaches the
servlet
(http://stackoverflow.com/questions/848378/sending-100-continue-using-java-servlet-api).
Because of this I'm inclined to use a handle request as shown in the attached
API doc.
2* *Are we OK with the attached API* (except for the discussion on #1)?
3* *Codebase*: Hoop was using TestNG for testcases and non-Apache package
names. I've been refactoring it to work with JUnit, to rename the packages, and
to organize the code so that it fits the current source layout.
In the meantime, for webhdfs (built-in HTTP) some code from Hoop has been
cloned, modified and integrated into HDFS. That code has changed significantly,
so integrating it with Hoop will require some serious rewriting of Hoop.
Given the timeframe we are shooting for (0.23), should we add Hoop as a
separate module to provide hdfs-proxy-like support and later see how to merge
the code?
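
For readers less familiar with open issue #1, the sketch below shows what the
100-continue handshake asks of a client, which is the part some HTTP libraries
do not implement: send only the request head with an `Expect: 100-continue`
header, then transmit the body only if the server answers with an interim
`100` status. The helper names `build_request_head` and `should_send_body` and
the example host and path are hypothetical; this is not the handle-based
alternative from the attached API doc.

```python
def build_request_head(method: str, path: str, host: str,
                       content_length: int) -> bytes:
    """Build the request line and headers only; the body is held back."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {content_length}",
        "Expect: 100-continue",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def should_send_body(status_line: str) -> bool:
    """Return True only if the server's first reply is an interim 100."""
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    # Any final status (e.g. 401 or a redirect) means the body must not
    # be sent on this request.
    return int(parts[1]) == 100


head = build_request_head("PUT", "/file", "nn.example:50070", 5)
```

A client that cannot pause between sending `head` and sending the body has no
way to honor the server's decision, which is why a protocol that avoids the
handshake is attractive.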
> Contributing Hoop to HDFS, replacement for HDFS proxy with read/write
> capabilities
> ----------------------------------------------------------------------------------
>
> Key: HDFS-2178
> URL: https://issues.apache.org/jira/browse/HDFS-2178
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 0.23.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Fix For: 0.23.0
>
> Attachments: HDFSoverHTTP-API.html, HdfsHttpAPI.pdf
>
>
> We'd like to contribute Hoop to Hadoop HDFS as a replacement (an improvement)
> for HDFS Proxy.
> Hoop provides access to all Hadoop Distributed File System (HDFS) operations
> (read and write) over HTTP/S.
> The Hoop server component is a REST HTTP gateway to HDFS supporting all file
> system operations. It can be accessed using standard HTTP tools (e.g. curl
> and wget), HTTP libraries from different programming languages (e.g. Perl,
> JavaScript) as well as using the Hoop client. The Hoop server component is a
> standard Java web-application and it has been implemented using Jersey
> (JAX-RS).
> The Hoop client component is an implementation of Hadoop FileSystem client
> that allows using the familiar Hadoop filesystem API to access HDFS data
> through a Hoop server.
> Repo: https://github.com/cloudera/hoop
> Docs: http://cloudera.github.com/hoop
> Blog: http://www.cloudera.com/blog/2011/07/hoop-hadoop-hdfs-over-http/
> Hoop is a Maven based project that depends on Hadoop HDFS and Alfredo (for
> Kerberos HTTP SPNEGO authentication).
> To make the integration easy, HDFS Mavenization (HDFS-2096) would have to be
> done first, as well as the Alfredo contribution (HADOOP-7119).