[jira] [Commented] (HDFS-2178) Contributing Hoop to HDFS, replacement for HDFS proxy with read/write capabilities

Alejandro Abdelnur (Commented) (JIRA) Mon, 17 Oct 2011 12:29:37 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129108#comment-13129108
 ]


Alejandro Abdelnur commented on HDFS-2178:
------------------------------------------

*On Sanjay's create and append:*

You are correct: an HDFS proxy deployment does not need to redirect (to a DN); 
the proxy handles the request itself.

Still, for authentication purposes a probe request should be made before 
attempting to upload data. Because of this, the create & append requests are 
identical in the hdfs-proxy (Hoop) and built-in (NN & DN HTTP serving) modes. 
In the hdfs-proxy case the probe is for authentication only; in the built-in 
case it is for both authentication and potential redirection.
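
To make the probe-then-upload flow concrete, here is a minimal Java sketch of 
how a client might interpret the response to the body-less probe request. The 
status codes (200, 307, 401), the class and method names, and the example URLs 
are assumptions for illustration only; they are not taken from Hoop, webhdfs or 
the attached API doc.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical client-side helper; none of these names come from Hoop or webhdfs. */
final class ProbeThenUpload {

    /** Outcome of the probe request (the first request, sent without a body). */
    static final class ProbeResult {
        final int status;
        final Optional<String> location;

        ProbeResult(int status, String location) {
            this.status = status;
            this.location = Optional.ofNullable(location);
        }
    }

    /**
     * Decide where the second, data-carrying request should be sent.
     * Assumed convention (not confirmed by the source):
     *   200            authenticated, upload to the same URL (hdfs-proxy mode)
     *   307 + Location authenticated, upload to the redirect target (built-in mode)
     *   401            authentication failed, do not upload
     */
    static URI uploadTarget(URI requested, ProbeResult probe) {
        switch (probe.status) {
            case 200:
                return requested;
            case 307:
                String target = probe.location.orElseThrow(
                    () -> new IllegalStateException("redirect without Location header"));
                // resolve() also accepts a relative Location value
                return requested.resolve(target);
            case 401:
                throw new SecurityException("authentication failed during probe");
            default:
                throw new IllegalStateException("unexpected probe status " + probe.status);
        }
    }
}
```

Under these assumptions the client code is the same in both modes: it always 
probes first and then sends the data to whatever uploadTarget returns, which is 
the original URL for hdfs-proxy and the redirect target for built-in.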

This means that we can have the exact same API for both hdfs-proxy and built-in 
modes.

The use of 100-continue is still an open issue; more on this at the end of this 
comment.

*On Sanjay's comment on 'some thoughts of webhdfs & hoop':*

 * Support for trusted proxies (doAs functionality) does make sense in the 
case of hdfs-proxy, and Hoop already supports it. This covers server-side apps 
that need/want HTTP access to HDFS and act on behalf of other users, e.g. 
somebody using the Java API to access HDFS via hdfs-proxy inside a doAs 
block.

 * Support for delegation tokens to access hdfs-proxy does make sense, e.g. 
when using distcp via hdfs-proxy; in this case, delegation tokens should work 
across clusters (this may not be supported today but IMO it should eventually 
work).

 * You mention code/param/return clean up. What kind of clean up are you 
referring to?

*On Sanjay's 'As we move forward':*

 * What subset of the webhdfs API makes sense for a proxy? IMO, they should be 
identical; a user should not see a difference whether they access a built-in or 
an hdfs-proxy HTTP setup.

 * Regarding a 'pure proxy'. This would be more like a reverse proxy and then 
all URLs would have to be relative or resolved with knowledge of the reverse 
proxy. IMO, an hdfs-proxy on its own has its merits.

*Open issues:*

 1* *Use of 100-CONTINUE for create & append*: it seems not all client HTTP 
libraries handle this (JDK HttpURLConnection, to start with). In addition, the 
servlet API does not provide support for it; some servlet containers seem to 
handle it, but either in a non-standard way 
(http://jira.codehaus.org/browse/JETTY-341) or in a way that never reaches the 
servlet 
(http://stackoverflow.com/questions/848378/sending-100-continue-using-java-servlet-api).
 Because of this I'm inclined to use a handle request as shown in the attached 
API doc.

 2* *Are we OK with the attached API* (except for the discussion on #1)?

 3* *Codebase*: Hoop was using TestNG for testcases and non-Apache package 
names. I've been refactoring it to work with JUnit, to rename the packages, 
and to organize the code so that it fits the current source layout. In the 
meantime, for webhdfs (built-in HTTP) some code from Hoop has been cloned, 
modified and integrated into HDFS. This code has changed significantly, so 
integrating it with Hoop will require some serious rewriting of Hoop. Given 
the current timeframe (we are shooting for 0.23), should we add Hoop as a 
separate module to have hdfs-proxy-like support and later see how to merge the 
code?

                
> Contributing Hoop to HDFS, replacement for HDFS proxy with read/write 
> capabilities
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-2178
>                 URL: https://issues.apache.org/jira/browse/HDFS-2178
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 0.23.0
>
>         Attachments: HDFSoverHTTP-API.html, HdfsHttpAPI.pdf
>
>
> We'd like to contribute Hoop to Hadoop HDFS as a replacement (an improvement) 
> for HDFS Proxy.
> Hoop provides access to all Hadoop Distributed File System (HDFS) operations 
> (read and write) over HTTP/S.
> The Hoop server component is a REST HTTP gateway to HDFS supporting all file 
> system operations. It can be accessed using standard HTTP tools (e.g. curl 
> and wget), HTTP libraries from different programming languages (e.g. Perl, 
> JavaScript) as well as using the Hoop client. The Hoop server component is a 
> standard Java web-application and it has been implemented using Jersey 
> (JAX-RS).
> The Hoop client component is an implementation of Hadoop FileSystem client 
> that allows using the familiar Hadoop filesystem API to access HDFS data 
> through a Hoop server.
>   Repo: https://github.com/cloudera/hoop
>   Docs: http://cloudera.github.com/hoop
>   Blog: http://www.cloudera.com/blog/2011/07/hoop-hadoop-hdfs-over-http/
> Hoop is a Maven based project that depends on Hadoop HDFS and Alfredo (for 
> Kerberos HTTP SPNEGO authentication). 
> To make the integration easy, HDFS Mavenization (HDFS-2096) would have to be 
> done first, as well as the Alfredo contribution (HADOOP-7119).

