All -

As of KNOX-358, the gateway can now serve REST requests from the Hadoop Java client for data access through WebHDFS. We had to pivot from the redirect-based mechanism to an internal forward of the servlet request, since the Hadoop client doesn't follow redirects that aren't expected as part of the API.
We may pursue adding that ability to the core Java client code, but in the meantime forwards work and are actually better in terms of network hops.

Currently, we only support the Pseudo authentication mechanism, but as we add SPNEGO support as an authentication provider in Knox we will have support for both secured and unsecured Hadoop clusters!

We need to consider whether the default topology name of "sandbox" is what we want to go with - seems appropriate since we have an OOTB sandbox.xml. We may want to revisit both of those.

The following is output from a LISTSTATUS from within a Hadoop VM:

[vagrant@sandbox knox]$ hadoop fs -fs webhdfs://localhost:8443 -ls /tmp
Found 5 items
drwxr-xr-x   - hue       hdfs          0 2014-04-17 07:33 /tmp/hive-beeswax-hue
drwxrwxrwx   - hive      hdfs          0 2014-04-09 12:01 /tmp/hive-hive
-rw-r--r--   1 ambari-qa hdfs       1878 2014-04-09 12:04 /tmp/id000a0f02_date030914
drwxr-xr-x   - guest     hdfs          0 2014-04-25 14:16 /tmp/ljm
drwxrwxrwx   - hdfs      hdfs          0 2014-04-17 07:32 /tmp/udfs

Continued testing of multi-step commands will be happening in the next couple of days.

I will be adding documentation to the user's guide for this functionality and fully expect some of the clunkier parts of this to evolve in the near term.

thanks,

--larry

On Fri, Apr 25, 2014 at 5:14 PM, larry mccay <lmc...@apache.org> wrote:

> All -
>
> At this point, the KNOX Hadoop REST APIs have served external clients
> that are outside the Hadoop perimeter quite well.
>
> As we move beyond being a solution for perimeter protection of
> external client access to Hadoop, we have to consider the clients that
> would like to consume REST APIs from within the cluster or CLIs in bastion
> nodes.
>
> I've filed the following Jira for this support:
> https://issues.apache.org/jira/browse/KNOX-353
>
> I've created a POC and am in the process of finalizing it as the initial
> support for these clients.
>
> The problem at hand is that Knox Hadoop APIs require the gateway_path and
> cluster_name context path. This allows Knox to present a single
> hierarchical API across all of the managed clusters, which is one of the
> core tenets of our charter.
>
> We will continue to provide this single access point and URL but also
> provide the ability to adapt URLs used by existing Hadoop clients that
> do not yet benefit from this single access point.
>
> This adapter is implemented as a simple servlet that redirects all traffic
> to the Knox instance's "default" application - that is, the application at
> the root context path - to a configured topology deployment. This
> configured topology represents the actual cluster access point that Hadoop
> clients with access to the Knox instance will be requesting resources from.
>
> The adapter is installed into a Knox instance with the deployment of an
> empty topology file whose name matches that of the configured name for the
> defaultTopologyName - otherwise, it defaults to "_default.xml". The act of
> deploying the default topology to the topologies directory results in the
> creation of a special adapter web app at the root context for the instance.
> All traffic to the adapter will then be redirected to the appropriately
> configured topology, otherwise to the default "/gateway/sandbox".
>
> I will be adding tests for the initial implementation and committing this
> in the next day or so.
>
> We can evolve this approach as needed in follow-up jiras, but in the
> meantime this adapter will usher in the ability to use Knox by mapreduce
> jobs and with those existing Hadoop clients that can leverage the Hadoop
> REST APIs.
>
> We will need to immediately follow this up with a federation provider that
> accepts user.name as the identity assertion. This will allow unsecured
> cluster clients to work appropriately. I will likely file a child jira for
> this support.
>
> Finally, support for SPNEGO authentication through a new authentication
> provider will allow secured Hadoop cluster use of Knox. We should already
> have a jira for SPNEGO support.
>
> Please feel free to ask questions, raise concerns, etc.
>
> thanks,
>
> --larry
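
For anyone who wants to picture the URL adaptation described in the quoted message, here is a minimal sketch in plain Java. It is not the actual Knox code: the class name DefaultTopologyPathAdapter and the adapt method are hypothetical, and the only values taken from the thread are the "/gateway/sandbox" fallback and the user.name parameter. It only shows how a path requested at the root context could be mapped onto a configured topology path before the request is forwarded.

```java
/**
 * Hypothetical sketch (not the Knox implementation) of mapping a request
 * made at the root context onto a configured topology context path.
 */
final class DefaultTopologyPathAdapter {

    // Fallback named in the thread; any other target is an assumed configuration value.
    static final String DEFAULT_TARGET = "/gateway/sandbox";

    private final String targetContext;

    DefaultTopologyPathAdapter(String targetContext) {
        String target = (targetContext == null || targetContext.isEmpty())
                ? DEFAULT_TARGET
                : targetContext;
        // Drop a single trailing slash so the joined path has no double slash.
        this.targetContext = target.endsWith("/")
                ? target.substring(0, target.length() - 1)
                : target;
    }

    /** Prefixes the requested path with the topology context and keeps the query string. */
    String adapt(String path, String query) {
        String requested = (path == null || path.isEmpty()) ? "/" : path;
        if (!requested.startsWith("/")) {
            requested = "/" + requested;
        }
        String adapted = targetContext + requested;
        return (query == null || query.isEmpty()) ? adapted : adapted + "?" + query;
    }

    public static void main(String[] args) {
        DefaultTopologyPathAdapter adapter = new DefaultTopologyPathAdapter(null);
        // Prints: /gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS&user.name=guest
        System.out.println(adapter.adapt("/webhdfs/v1/tmp", "op=LISTSTATUS&user.name=guest"));
    }
}
```

In a servlet, the adapted path would presumably be handed to a request dispatcher for an internal forward rather than sent back to the client as a redirect, which is the change described at the top of this message.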