Added: knox/site/books/knox-1-0-0/user-guide.html
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-0-0/user-guide.html?rev=1823389&view=auto
==============================================================================
--- knox/site/books/knox-1-0-0/user-guide.html (added)
+++ knox/site/books/knox-1-0-0/user-guide.html Tue Feb  6 20:46:11 2018
@@ -0,0 +1,7068 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--><p><link href="book.css" rel="stylesheet"/></p><p><img src="knox-logo.gif" alt="Knox"/> <img src="apache-logo.gif" align="right" alt="Apache"/></p><h1><a
id="Apache+Knox+Gateway+1.0.x+User's+Guide">Apache Knox Gateway 1.0.x 
User&rsquo;s Guide</a> <a href="#Apache+Knox+Gateway+1.0.x+User's+Guide"><img 
src="markbook-section-link.png"/></a></h1><h2><a id="Table+Of+Contents">Table 
Of Contents</a> <a href="#Table+Of+Contents"><img 
src="markbook-section-link.png"/></a></h2>
+<ul>
+  <li><a href="#Introduction">Introduction</a></li>
+  <li><a href="#Quick+Start">Quick Start</a></li>
+  <li><a href="#Gateway+Samples">Gateway Samples</a></li>
+  <li><a href="#Apache+Knox+Details">Apache Knox Details</a>
+  <ul>
+    <li><a href="#Apache+Knox+Directory+Layout">Apache Knox Directory 
Layout</a></li>
+    <li><a href="#Supported+Services">Supported Services</a></li>
+  </ul></li>
+  <li><a href="#Gateway+Details">Gateway Details</a>
+  <ul>
+    <li><a href="#URL+Mapping">URL Mapping</a>
+    <ul>
+      <li><a href="#Default+Topology+URLs">Default Topology URLs</a></li>
+      <li><a href="#Fully+Qualified+URLs">Fully Qualified URLs</a></li>
+      <li><a href="#Topology+Port+Mapping">Topology Port Mapping</a></li>
+    </ul></li>
+    <li><a href="#Configuration">Configuration</a>
+    <ul>
+      <li><a href="#Gateway+Server+Configuration">Gateway Server 
Configuration</a></li>
+      <li><a href="#Simplified+Topology+Descriptors">Simplified Topology 
Descriptors</a></li>
+      <li><a href="#Externalized+Provider+Configurations">Externalized 
Provider Configurations</a></li>
+      <li><a href="#Sharing+HA+Providers">Sharing HA Providers</a></li>
+      <li><a href="#Simplified+Descriptor+Files">Simplified Descriptor 
Files</a></li>
+      <li><a href="#Cluster+Configuration+Monitoring">Cluster Configuration 
Monitoring</a></li>
+      <li><a href="#Remote+Configuration+Monitor">Remote Configuration 
Monitor</a></li>
+      <li><a href="#Remote+Configuration+Registry+Clients">Remote 
Configuration Registry Clients</a></li>
+      <li><a href="#Topology+Descriptors">Topology Descriptors</a></li>
+      <li><a href="#Hostmap+Provider">Hostmap Provider</a></li>
+    </ul></li>
+    <li><a href="#Knox+CLI">Knox CLI</a></li>
+    <li><a href="#Admin+API">Admin API</a></li>
+    <li><a href="#X-Forwarded-*+Headers+Support">X-Forwarded-* Headers 
Support</a></li>
+    <li><a href="#Metrics">Metrics</a></li>
+  </ul></li>
+  <li><a href="#Authentication">Authentication</a>
+  <ul>
+    <li><a href="#Advanced+LDAP+Authentication">Advanced LDAP 
Authentication</a></li>
+    <li><a href="#LDAP+Authentication+Caching">LDAP Authentication 
Caching</a></li>
+    <li><a href="#LDAP+Group+Lookup">LDAP Group Lookup</a></li>
+    <li><a href="#LDAP+Group+Lookup">LDAP Group Lookup</a></li>
+    <li><a href="#PAM+based+Authentication">PAM based Authentication</a></li>
+    <li><a href="#HadoopAuth+Authentication+Provider">HadoopAuth 
Authentication Provider</a></li>
+    <li><a href="#Preauthenticated+SSO+Provider">Preauthenticated SSO 
Provider</a></li>
+    <li><a href="#SSO+Cookie+Provider">SSO Cookie Provider</a></li>
+    <li><a href="#JWT+Provider">JWT Provider</a></li>
+    <li><a href="#Pac4j+Provider+-+CAS+/+OAuth+/+SAML+/+OpenID+Connect">Pac4j 
Provider - CAS / OAuth / SAML / OpenID Connect</a></li>
+    <li><a href="#KnoxSSO+Setup+and+Configuration">KnoxSSO Setup and 
Configuration</a></li>
+    <li><a href="#KnoxToken+Configuration">KnoxToken Configuration</a></li>
+    <li><a href="#Mutual+Authentication+with+SSL">Mutual Authentication with 
SSL</a></li>
+  </ul></li>
+  <li><a href="#Authorization">Authorization</a></li>
+  <li><a href="#Identity+Assertion">Identity Assertion</a>
+  <ul>
+    <li><a href="#Default+Identity+Assertion+Provider">Default Identity 
Assertion Provider</a></li>
+    <li><a href="#Concat+Identity+Assertion+Provider">Concat Identity 
Assertion Provider</a></li>
+    <li><a href="#SwitchCase+Identity+Assertion+Provider">SwitchCase Identity 
Assertion Provider</a></li>
+    <li><a href="#Regular+Expression+Identity+Assertion+Provider">Regular 
Expression Identity Assertion Provider</a></li>
+    <li><a href="#Hadoop+Group+Lookup+Provider">Hadoop Group Lookup 
Provider</a></li>
+  </ul></li>
+  <li><a href="#Secure+Clusters">Secure Clusters</a></li>
+  <li><a href="#High+Availability">High Availability</a></li>
+  <li><a href="#Web+App+Security+Provider">Web App Security Provider</a>
+  <ul>
+    <li><a href="#CSRF">CSRF</a></li>
+    <li><a href="#CORS">CORS</a></li>
+    <li><a href="#X-Frame-Options">X-Frame-Options</a></li>
+    <li><a href="#HTTP+Strict-Tranport-Security+-+HSTS">HTTP 
Strict-Tranport-Security - HSTS</a></li>
+  </ul></li>
+  <li><a href="#Websocket+Support">Websocket Support</a></li>
+  <li><a href="#Audit">Audit</a></li>
+  <li><a href="#Client+Details">Client Details</a>
+  <ul>
+    <li><a href="#Client+Quickstart">Client Quickstart</a></li>
+    <li><a href="#Client+Token+Sessions">Client Token Sessions</a>
+    <ul>
+      <li><a href="#Server+Setup">Server Setup</a></li>
+    </ul></li>
+    <li><a href="#Client+DSL+and+SDK+Details">Client DSL and SDK 
Details</a></li>
+  </ul></li>
+  <li><a href="#Service+Details">Service Details</a>
+  <ul>
+    <li><a href="#WebHDFS">WebHDFS</a></li>
+    <li><a href="#WebHCat">WebHCat</a></li>
+    <li><a href="#Oozie">Oozie</a></li>
+    <li><a href="#HBase">HBase</a></li>
+    <li><a href="#Hive">Hive</a></li>
+    <li><a href="#Yarn">Yarn</a></li>
+    <li><a href="#Kafka">Kafka</a></li>
+    <li><a href="#Storm">Storm</a></li>
+    <li><a href="#SOLR">SOLR</a></li>
+    <li><a href="#Avatica">Avatica</a></li>
+    <li><a href="#Livy+Server">Livy Server</a></li>
+    <li><a href="#Common+Service+Config">Common Service Config</a></li>
+    <li><a href="#Default+Service+HA+support">Default Service HA 
support</a></li>
+  </ul></li>
+  <li><a href="#UI+Service+Details">UI Service Details</a></li>
+  <li><a href="#Admin+UI">Admin UI</a></li>
+  <li><a href="#Limitations">Limitations</a></li>
+  <li><a href="#Troubleshooting">Troubleshooting</a></li>
+  <li><a href="#Export+Controls">Export Controls</a></li>
+</ul><h2><a id="Introduction">Introduction</a> <a href="#Introduction"><img 
src="markbook-section-link.png"/></a></h2><p>The Apache Knox Gateway is a 
system that provides a single point of authentication and access for Apache 
Hadoop services in a cluster. The goal is to simplify Hadoop security for both 
users (i.e. who access the cluster data and execute jobs) and operators (i.e. 
who control access and manage the cluster). The gateway runs as a server (or 
cluster of servers) that provide centralized access to one or more Hadoop 
clusters. In general the goals of the gateway are as follows:</p>
+<ul>
+  <li>Provide perimeter security for Hadoop REST APIs to make Hadoop security easier to set up and use
+  <ul>
+    <li>Provide authentication and token verification at the perimeter</li>
+    <li>Enable authentication integration with enterprise and cloud identity 
management systems</li>
+    <li>Provide service level authorization at the perimeter</li>
+  </ul></li>
+  <li>Expose a single URL hierarchy that aggregates REST APIs of a Hadoop 
cluster
+  <ul>
+    <li>Limit the network endpoints (and therefore firewall holes) required to 
access a Hadoop cluster</li>
+    <li>Hide the internal Hadoop cluster topology from potential attackers</li>
+  </ul></li>
+</ul>
+<h2><a id="Quick+Start">Quick Start</a> <a href="#Quick+Start"><img
src="markbook-section-link.png"/></a></h2><p>Here are the steps to have Apache 
Knox up and running against a Hadoop cluster:</p>
+<ol>
+  <li>Verify system requirements</li>
+  <li>Download a virtual machine (VM) with Hadoop</li>
+  <li>Download Apache Knox Gateway</li>
+  <li>Start the virtual machine with Hadoop</li>
+  <li>Install Knox</li>
+  <li>Start the LDAP embedded within Knox</li>
+  <li>Create the master secret</li>
+  <li>Start the Knox Gateway</li>
+  <li>Do Hadoop with Knox</li>
+</ol><h3><a id="1+-+Requirements">1 - Requirements</a> <a 
href="#1+-+Requirements"><img src="markbook-section-link.png"/></a></h3><h4><a 
id="Java">Java</a> <a href="#Java"><img 
src="markbook-section-link.png"/></a></h4><p>Java 1.7 or later is required for 
the Knox Gateway runtime. Use the command below to check the version of Java 
installed on the system where Knox will be running.</p>
+<pre><code>java -version
+</code></pre><h4><a id="Hadoop">Hadoop</a> <a href="#Hadoop"><img 
src="markbook-section-link.png"/></a></h4><p>Knox 1.0.0 supports Hadoop 3.x, 
the quick start instructions assume a Hadoop 2.x virtual machine based 
environment.</p><h3><a id="2+-+Download+Hadoop+2.x+VM">2 - Download Hadoop 2.x 
VM</a> <a href="#2+-+Download+Hadoop+2.x+VM"><img 
src="markbook-section-link.png"/></a></h3><p>The quick start provides a link to 
download Hadoop 2.0 based Hortonworks virtual machine <a 
href="http://hortonworks.com/products/hdp-2/#install";>Sandbox</a>. Please note 
Knox supports other Hadoop distributions and is configurable against a 
full-blown Hadoop cluster. Configuring Knox for Hadoop 2.x version, or Hadoop 
deployed in EC2 or a custom Hadoop cluster is documented in advance deployment 
guide.</p><h3><a id="3+-+Download+Apache+Knox+Gateway">3 - Download Apache Knox 
Gateway</a> <a href="#3+-+Download+Apache+Knox+Gateway"><img 
src="markbook-section-link.png"/></a></h3><p>Download one of the dist
 ributions below from the <a 
href="http://www.apache.org/dyn/closer.cgi/knox";>Apache mirrors</a>.</p>
+<ul>
+  <li>Source archive: <a href="http://www.apache.org/dyn/closer.cgi/knox/1.0.0/knox-1.0.0-src.zip">knox-1.0.0-src.zip</a> (<a href="http://www.apache.org/dist/knox/1.0.0/knox-1.0.0-src.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/knox/1.0.0/knox-1.0.0-src.zip.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/knox/1.0.0/knox-1.0.0-src.zip.md5">MD5 digest</a>)</li>
+  <li>Binary archive: <a href="http://www.apache.org/dyn/closer.cgi/knox/1.0.0/knox-1.0.0.zip">knox-1.0.0.zip</a> (<a href="http://www.apache.org/dist/knox/1.0.0/knox-1.0.0.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/knox/1.0.0/knox-1.0.0.zip.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/knox/1.0.0/knox-1.0.0.zip.md5">MD5 digest</a>)</li>
+</ul><p>Apache Knox Gateway releases are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.</p><h3><a id="Verify">Verify</a> <a href="#Verify"><img src="markbook-section-link.png"/></a></h3><p>While recommended, verification is an optional step. You can verify the integrity of any downloaded files using the PGP signatures. Please read <a href="http://httpd.apache.org/dev/verification.html">Verifying Apache HTTP Server Releases</a> for more information on why you should verify our releases.</p><p>The PGP signatures can be verified using PGP or GPG. First download the <a href="https://dist.apache.org/repos/dist/release/knox/KEYS">KEYS</a> file as well as the .asc signature files for the relevant release packages. Make sure you get these files from the main distribution directory linked above, rather than from a mirror. Then verify the signatures using one of the methods below.</p>
+<pre><code>% pgpk -a KEYS
+% pgpv knox-1.0.0.zip.asc
+</code></pre><p>or</p>
+<pre><code>% pgp -ka KEYS
+% pgp knox-1.0.0.zip.asc
+</code></pre><p>or</p>
+<pre><code>% gpg --import KEYS
+% gpg --verify knox-1.0.0.zip.asc
+</code></pre><h3><a id="4+-+Start+Hadoop+virtual+machine">4 - Start Hadoop 
virtual machine</a> <a href="#4+-+Start+Hadoop+virtual+machine"><img 
src="markbook-section-link.png"/></a></h3><p>Start the Hadoop virtual 
machine.</p><h3><a id="5+-+Install+Knox">5 - Install Knox</a> <a 
href="#5+-+Install+Knox"><img src="markbook-section-link.png"/></a></h3><p>The 
steps required to install the gateway will vary depending upon which 
distribution format (zip | rpm) was downloaded. In either case you will end up 
with a directory where the gateway is installed. This directory will be 
referred to as your <code>{GATEWAY_HOME}</code> throughout this 
document.</p><h4><a id="ZIP">ZIP</a> <a href="#ZIP"><img 
src="markbook-section-link.png"/></a></h4><p>If you downloaded the Zip 
distribution you can simply extract the contents into a directory. The example 
below provides a command that can be executed to do this. Note the 
<code>{VERSION}</code> portion of the command must be replaced with an actual 
Apa
 che Knox Gateway version number. This might be 1.0.0 for example.</p>
+<pre><code>unzip knox-{VERSION}.zip
+</code></pre><p>This will create a directory <code>knox-{VERSION}</code> in your current directory. The directory <code>knox-{VERSION}</code> will be considered your <code>{GATEWAY_HOME}</code>.</p><h3><a
id="6+-+Start+LDAP+embedded+in+Knox">6 - Start LDAP embedded in Knox</a> <a 
href="#6+-+Start+LDAP+embedded+in+Knox"><img 
src="markbook-section-link.png"/></a></h3><p>Knox comes with an LDAP server for 
demonstration purposes. Note: If the tool used to extract the contents of the 
Tar or tar.gz file was not capable of making the files in the bin directory 
executable</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/ldap.sh start
+</code></pre><h3><a id="7+-+Create+the+Master+Secret">7 - Create the Master 
Secret</a> <a href="#7+-+Create+the+Master+Secret"><img 
src="markbook-section-link.png"/></a></h3><p>Run the knoxcli create-master 
command in order to persist the master secret that is used to protect the key 
and credential stores for the gateway instance.</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/knoxcli.sh create-master
+</code></pre><p>The CLI will prompt you for the master secret (i.e. password).</p><h3><a id="8+-+Start+Knox">8 - Start Knox</a> <a href="#8+-+Start+Knox"><img src="markbook-section-link.png"/></a></h3><p>The
gateway can be started using the provided shell script.</p><p>The server will discover the persisted master secret during start up and complete the setup process for demo installs. A demo install will consist of a Knox gateway instance with an identity certificate for localhost. This will require clients to be on the same machine or to turn off hostname verification. For more involved deployments, see the Knox CLI section of this document for additional configuration options, including the ability to create a self-signed certificate for a specific hostname.</p>
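+<p>For instance (a sketch; see the Knox CLI section for the full syntax and options), a self-signed certificate for a specific hostname can be created with:</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/knoxcli.sh create-cert --hostname {FQDN_OF_KNOX_HOST}
+</code></pre>
+<p>Then start the gateway:</p>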
+<pre><code>cd {GATEWAY_HOME}
+bin/gateway.sh start
+</code></pre><p>When starting the gateway this way the process will be run in the background. The log files will be written to {GATEWAY_HOME}/logs and the process ID files (PIDs) will be written to {GATEWAY_HOME}/pids.</p><p>In order to stop a gateway that was started with the script, use this command.</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/gateway.sh stop
+</code></pre><p>If for some reason the gateway is stopped other than by using the command above, you may need to clear the tracking PID.</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/gateway.sh clean
+</code></pre><p><strong>NOTE: This command will also clear any .out and .err files from the {GATEWAY_HOME}/logs directory, so use it with caution.</strong></p><h3><a id="9+-+Do+Hadoop+with+Knox">9 - Do Hadoop with Knox</a> <a href="#9+-+Do+Hadoop+with+Knox"><img src="markbook-section-link.png"/></a></h3><h4><a
id="Invoke+the+LISTSTATUS+operation+on+WebHDFS+via+the+gateway.">Invoke the 
LISTSTATUS operation on WebHDFS via the gateway.</a> <a 
href="#Invoke+the+LISTSTATUS+operation+on+WebHDFS+via+the+gateway."><img 
src="markbook-section-link.png"/></a></h4><p>This will return a directory 
listing of the root (i.e. /) directory of HDFS.</p>
+<pre><code>curl -i -k -u guest:guest-password -X GET \
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS&#39;
+</code></pre><p>The above command should produce output along the lines of the example below. The exact information returned depends on the content within HDFS in your Hadoop cluster. Successfully executing this command at a minimum proves that the gateway is properly configured to provide access to WebHDFS. It does not necessarily prove that any of the other services are correctly configured to be accessible. To validate that, see the sections for the individual services in <a href="#Service+Details">Service Details</a>.</p>
+<pre><code>HTTP/1.1 200 OK
+Content-Type: application/json
+Content-Length: 760
+Server: Jetty(6.1.26)
+
+{&quot;FileStatuses&quot;:{&quot;FileStatus&quot;:[
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595859762,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;apps&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;mapred&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595874024,&quot;owner&quot;:&quot;mapred&quot;,&quot;pathSuffix&quot;:&quot;mapred&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350596040075,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;tmp&quot;,&quot;permission&quot;:&quot;777&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595857178,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;user&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;}
+]}}
+</code></pre><h4><a id="Put+a+file+in+HDFS+via+Knox.">Put a file in HDFS via 
Knox.</a> <a href="#Put+a+file+in+HDFS+via+Knox."><img 
src="markbook-section-link.png"/></a></h4>
+<pre><code>curl -i -k -u guest:guest-password -X PUT \
+    
&#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/LICENSE?op=CREATE&#39;
+
+curl -i -k -u guest:guest-password -T LICENSE -X PUT \
+    &#39;{Value of Location header from response above}&#39;
+</code></pre><h4><a id="Get+a+file+in+HDFS+via+Knox.">Get a file in HDFS via 
Knox.</a> <a href="#Get+a+file+in+HDFS+via+Knox."><img 
src="markbook-section-link.png"/></a></h4>
+<pre><code>curl -i -k -u guest:guest-password -X GET \
+    
&#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/LICENSE?op=OPEN&#39;
+
+curl -i -k -u guest:guest-password -X GET \
+    &#39;{Value of Location header from command response above}&#39;
+</code></pre><h2><a id="Apache+Knox+Details">Apache Knox Details</a> <a 
href="#Apache+Knox+Details"><img 
src="markbook-section-link.png"/></a></h2><p>This section provides everything 
you need to know to get the Knox gateway up and running against a Hadoop 
cluster.</p><h4><a id="Hadoop">Hadoop</a> <a href="#Hadoop"><img 
src="markbook-section-link.png"/></a></h4><p>An existing Hadoop 2.x cluster is 
required for Knox to sit in front of and protect. It is possible to use a Hadoop cluster deployed on EC2 but this will require additional configuration not covered here. It is also possible to protect access to the services of a Hadoop cluster that is secured with Kerberos. This too requires additional configuration that is described in other sections of this guide. See <a href="#Supported+Services">Supported Services</a> for details on what is supported for this release.</p><p>Ensure that the Hadoop cluster has at least WebHDFS, WebHCat (i.e. Templeton) and Oozie configured, deployed and running. HBase/Stargate and Hive can also be accessed via the Knox Gateway given the proper versions and configuration.</p><p>The instructions that follow assume a few things:</p>
+<ol>
+  <li>The gateway is <em>not</em> collocated with the Hadoop clusters 
themselves.</li>
+  <li>The host names and IP addresses of the cluster services are accessible by the gateway wherever it happens to be running.</li>
+</ol><p>All of the instructions and samples provided here are tailored and 
tested to work &ldquo;out of the box&rdquo; against a <a 
href="http://hortonworks.com/products/hortonworks-sandbox";>Hortonworks Sandbox 
2.x VM</a>.</p><h4><a id="Apache+Knox+Directory+Layout">Apache Knox Directory 
Layout</a> <a href="#Apache+Knox+Directory+Layout"><img 
src="markbook-section-link.png"/></a></h4><p>Knox can be installed by expanding 
the zip/archive file.</p><p>The table below provides a brief explanation of the 
important files and directories within <code>{GATEWAY_HOME}</code>.</p>
+<table>
+  <thead>
+    <tr>
+      <th>Directory </th>
+      <th>Purpose </th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>conf/ </td>
+      <td>Contains configuration files that apply to the gateway globally (i.e. not cluster specific). </td>
+    </tr>
+    <tr>
+      <td>data/ </td>
+      <td>Contains security and topology specific artifacts that require 
read/write access at runtime </td>
+    </tr>
+    <tr>
+      <td>conf/topologies/ </td>
+      <td>Contains topology files that represent Hadoop clusters which the 
gateway uses to deploy cluster proxies </td>
+    </tr>
+    <tr>
+      <td>data/security/ </td>
+      <td>Contains the persisted master secret and keystore dir </td>
+    </tr>
+    <tr>
+      <td>data/security/keystores/ </td>
+      <td>Contains the gateway identity keystore and credential stores for the 
gateway and each deployed cluster topology </td>
+    </tr>
+    <tr>
+      <td>data/services/ </td>
+      <td>Contains service behavior definitions for the services currently 
supported. </td>
+    </tr>
+    <tr>
+      <td>bin/ </td>
+      <td>Contains the executable shell scripts, batch files and JARs for 
clients and servers. </td>
+    </tr>
+    <tr>
+      <td>data/deployments/ </td>
+      <td>Contains deployed cluster topologies used to protect access to 
specific Hadoop clusters. </td>
+    </tr>
+    <tr>
+      <td>lib/ </td>
+      <td>Contains the JARs for all the components that make up the gateway. 
</td>
+    </tr>
+    <tr>
+      <td>dep/ </td>
+      <td>Contains the JARs for all of the components upon which the gateway 
depends. </td>
+    </tr>
+    <tr>
+      <td>ext/ </td>
+      <td>A directory where user supplied extension JARs can be placed to extend the gateway&rsquo;s functionality. </td>
+    </tr>
+    <tr>
+      <td>pids/ </td>
+      <td>Contains the process IDs for the running LDAP and gateway servers </td>
+    </tr>
+    <tr>
+      <td>samples/ </td>
+      <td>Contains a number of samples that can be used to explore the 
functionality of the gateway. </td>
+    </tr>
+    <tr>
+      <td>templates/ </td>
+      <td>Contains default configuration files that can be copied and 
customized. </td>
+    </tr>
+    <tr>
+      <td>README </td>
+      <td>Provides basic information about the Apache Knox Gateway. </td>
+    </tr>
+    <tr>
+      <td>ISSUES </td>
+      <td>Describes significant known issues. </td>
+    </tr>
+    <tr>
+      <td>CHANGES </td>
+      <td>Enumerates the changes between releases. </td>
+    </tr>
+    <tr>
+      <td>LICENSE </td>
+      <td>Documents the license under which this software is provided. </td>
+    </tr>
+    <tr>
+      <td>NOTICE </td>
+      <td>Documents required attribution notices for included dependencies. 
</td>
+    </tr>
+  </tbody>
+</table><h3><a id="Supported+Services">Supported Services</a> <a 
href="#Supported+Services"><img 
src="markbook-section-link.png"/></a></h3><p>This table enumerates the versions 
of various Hadoop services that have been tested to work with the Knox 
Gateway.</p>
+<table>
+  <thead>
+    <tr>
+      <th>Service </th>
+      <th>Version </th>
+      <th>Non-Secure </th>
+      <th>Secure </th>
+      <th>HA </th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>WebHDFS </td>
+      <td>2.4.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>WebHCat/Templeton </td>
+      <td>0.13.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>Oozie </td>
+      <td>4.0.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>HBase </td>
+      <td>0.98.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>Hive (via WebHCat) </td>
+      <td>0.13.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>Hive (via JDBC/ODBC) </td>
+      <td>0.13.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>Yarn ResourceManager </td>
+      <td>2.5.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="error.png"  alt="n"/></td>
+    </tr>
+    <tr>
+      <td>Kafka (via REST Proxy) </td>
+      <td>0.10.0 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+    <tr>
+      <td>Storm </td>
+      <td>0.9.3 </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="error.png"  alt="n"/> </td>
+      <td><img src="error.png"  alt="n"/></td>
+    </tr>
+    <tr>
+      <td>SOLR </td>
+      <td>5.5+ and 6+ </td>
+      <td><img src="check.png"  alt="y"/> </td>
+      <td><img src="error.png"  alt="n"/> </td>
+      <td><img src="check.png"  alt="y"/></td>
+    </tr>
+  </tbody>
+</table><h3><a id="More+Examples">More Examples</a> <a 
href="#More+Examples"><img src="markbook-section-link.png"/></a></h3><p>These 
examples provide more detail about how to access various Apache Hadoop services 
via the Apache Knox Gateway.</p>
+<ul>
+  <li><a href="#WebHDFS+Examples">WebHDFS Examples</a></li>
+  <li><a href="#WebHCat+Examples">WebHCat Examples</a></li>
+  <li><a href="#Oozie+Examples">Oozie Examples</a></li>
+  <li><a href="#HBase+Examples">HBase Examples</a></li>
+  <li><a href="#Hive+Examples">Hive Examples</a></li>
+  <li><a href="#Yarn+Examples">Yarn Examples</a></li>
+  <li><a href="#Storm+Examples">Storm Examples</a></li>
+</ul><h3><a id="Gateway+Samples">Gateway Samples</a> <a 
href="#Gateway+Samples"><img src="markbook-section-link.png"/></a></h3><p>The 
purpose of the samples within the {GATEWAY_HOME}/samples directory is to 
demonstrate the capabilities of the Apache Knox Gateway to provide access to 
the numerous APIs that are available from the service components of a Hadoop 
cluster.</p><p>Depending on exactly how your Knox installation was done, there will be some number of steps required in order to fully install and configure the samples for use.</p><p>This section will help describe the assumptions of the
samples and the steps to get them to work in a couple of different deployment 
scenarios.</p><h4><a id="Assumptions+of+the+Samples">Assumptions of the 
Samples</a> <a href="#Assumptions+of+the+Samples"><img 
src="markbook-section-link.png"/></a></h4><p>The samples were initially written 
with the intent of working out of the box for the various Hadoop demo 
environments that are deployed as a single no
 de cluster inside of a VM. The following assumptions were made from that 
context and should be understood in order to get the samples to work in other 
deployment scenarios:</p>
+<ul>
+  <li>That there is a valid Java JDK on the PATH for executing the samples</li>
+  <li>That the Knox Demo LDAP server is running on localhost port 33389, which is the default port for the ApacheDS LDAP server.</li>
+  <li>That the LDAP directory in use has a set of demo users provisioned with the convention that the password is the username with &ldquo;-password&rdquo; appended. Most of the samples use some variation of this pattern with &ldquo;guest&rdquo; and &ldquo;guest-password&rdquo;.</li>
+  <li>That the Knox Gateway instance is running on the same machine from which you will be running the samples - therefore &ldquo;localhost&rdquo; - and that the default port of &ldquo;8443&rdquo; is being used.</li>
+  <li>Finally, that there is a properly provisioned sandbox.xml topology in the <code>{GATEWAY_HOME}/conf/topologies</code> directory that is configured to point to the actual host and ports of running service components.</li>
+</ul><h4><a id="Steps+for+Demo+Single+Node+Clusters">Steps for Demo Single 
Node Clusters</a> <a href="#Steps+for+Demo+Single+Node+Clusters"><img 
src="markbook-section-link.png"/></a></h4><p>There should be little to do if 
anything in a demo environment that has been provisioned with illustrating the 
use of Apache Knox.</p><p>However, the following items will be worth ensuring 
before you start:</p>
+<ol>
+  <li>The sandbox.xml topology is configured properly for the deployed 
services</li>
+  <li>That there is an LDAP server running with the guest/guest-password user available in the directory</li>
+</ol><h4><a id="Steps+for+Ambari+Deployed+Knox+Gateway">Steps for Ambari 
Deployed Knox Gateway</a> <a 
href="#Steps+for+Ambari+Deployed+Knox+Gateway"><img 
src="markbook-section-link.png"/></a></h4><p>Apache Knox instances that are 
under the management of Ambari are generally assumed not to be demo instances. 
These instances are in place to facilitate development, testing or production 
Hadoop clusters.</p><p>The Knox samples can however be made to work with Ambari 
managed Knox instances with a few steps:</p>
+<ol>
+  <li>You need to have ssh access to the environment in order for the 
localhost assumption within the samples to be valid.</li>
+  <li>The Knox Demo LDAP Server is started - you can start it from Ambari</li>
+  <li>The default.xml topology file can be copied to sandbox.xml in order to 
satisfy the topology name assumption in the samples.</li>
+  <li><p>Be sure to use an actual Java JRE to run the sample with something like:</p>
+  <pre><code>/usr/jdk64/jdk1.7.0_67/bin/java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy</code></pre></li>
+</ol><h4><a id="Steps+for+a+Manually+Installed+Knox+Gateway">Steps for a 
Manually Installed Knox Gateway</a> <a 
href="#Steps+for+a+Manually+Installed+Knox+Gateway"><img 
src="markbook-section-link.png"/></a></h4><p>For manually installed Knox 
instances, there is really no way for the installer to know how to configure 
the topology file for you.</p><p>Essentially, these steps are identical to those for the Ambari deployed instance, except that step 3 is replaced with configuring the out-of-the-box sandbox.xml to point at the proper hosts and ports.</p>
+<ol>
+  <li>You need to have ssh access to the environment in order for the 
localhost assumption within the samples to be valid.</li>
+  <li>The Knox Demo LDAP Server is started - you can start it with <code>bin/ldap.sh start</code> as shown in the Quick Start.</li>
+  <li>Change the hosts and ports within the <code>{GATEWAY_HOME}/conf/topologies/sandbox.xml</code> to reflect your actual cluster service locations, as in the sketch after this list.</li>
+  <li><p>Be sure to use an actual Java JRE to run the sample with something like:</p>
+  <pre><code>/usr/jdk64/jdk1.7.0_67/bin/java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy</code></pre></li>
+</ol>
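+<p>For example, the WebHDFS entry in sandbox.xml would be edited to point at your actual NameNode endpoint, along these lines (a sketch; the host and port placeholders must be replaced with your cluster&rsquo;s values):</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://{webhdfs-host}:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre>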
+<h2><a id="Gateway+Details">Gateway Details</a> <a
href="#Gateway+Details"><img src="markbook-section-link.png"/></a></h2><p>This 
section describes the details of the Knox Gateway itself, including:</p>
+<ul>
+  <li>How URLs are mapped between a gateway that services multiple Hadoop 
clusters and the clusters themselves</li>
+  <li>How the gateway is configured through gateway-site.xml and cluster 
specific topology files</li>
+  <li>How to configure the various policy enforcement provider features such 
as authentication, authorization, auditing, hostmapping, etc.</li>
+</ul><h3><a id="URL+Mapping">URL Mapping</a> <a href="#URL+Mapping"><img 
src="markbook-section-link.png"/></a></h3><p>The gateway functions much like a 
reverse proxy. As such, it maintains a mapping of URLs that are exposed 
externally by the gateway to URLs that are provided by the Hadoop 
cluster.</p><h4><a id="Default+Topology+URLs">Default Topology URLs</a> <a 
href="#Default+Topology+URLs"><img 
src="markbook-section-link.png"/></a></h4><p>In order to provide compatibility 
with the Hadoop java client and existing CLI tools, the Knox Gateway has 
provided a feature called the Default Topology. This refers to a topology 
deployment that will be able to route URLs without the additional context that 
the gateway uses for differentiating from one Hadoop cluster to another. This 
allows the URLs to match those used by existing clients that may access webhdfs 
through the Hadoop file system abstraction.</p><p>When a topology file is 
deployed with a file name that matches the configured defaul
 t topology name, a specialized mapping for URLs is installed for that 
particular topology. This allows the URLs that are expected by the existing 
Hadoop CLIs for webhdfs to be used in interacting with the specific Hadoop 
cluster that is represented by the default topology file.</p><p>The 
configuration for the default topology name is found in gateway-site.xml as a 
property called: &ldquo;default.app.topology.name&rdquo;.</p><p>The default 
value for this property is &ldquo;sandbox&rdquo;.</p><p>Therefore, when 
deploying the sandbox.xml topology, both of the following example URLs work for 
the same underlying Hadoop cluster:</p>
+<pre><code>https://{gateway-host}:{gateway-port}/webhdfs
+https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs
+</code></pre><p>These default topology URLs exist for all of the services in 
the topology.</p><h4><a id="Fully+Qualified+URLs">Fully Qualified URLs</a> <a 
href="#Fully+Qualified+URLs"><img 
src="markbook-section-link.png"/></a></h4><p>Examples of mappings for the 
WebHDFS, WebHCat, Oozie and HBase are shown below. These mapping are generated 
from the combination of the gateway configuration file (i.e. 
<code>{GATEWAY_HOME}/conf/gateway-site.xml</code>) and the cluster topology 
descriptors (e.g. 
<code>{GATEWAY_HOME}/conf/topologies/{cluster-name}.xml</code>). The port 
numbers shown for the Cluster URLs represent the default ports for these 
services. The actual port number may be different for a given cluster.</p>
+<ul>
+  <li>WebHDFS
+  <ul>
+    <li>Gateway: 
<code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs</code></li>
+    <li>Cluster: <code>http://{webhdfs-host}:50070/webhdfs</code></li>
+  </ul></li>
+  <li>WebHCat (Templeton)
+  <ul>
+    <li>Gateway: 
<code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton</code></li>
+    <li>Cluster: <code>http://{webhcat-host}:50111/templeton</code></li>
+  </ul></li>
+  <li>Oozie
+  <ul>
+    <li>Gateway: 
<code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie</code></li>
+    <li>Cluster: <code>http://{oozie-host}:11000/oozie</code></li>
+  </ul></li>
+  <li>HBase
+  <ul>
+    <li>Gateway: 
<code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase</code></li>
+    <li>Cluster: <code>http://{hbase-host}:8080</code></li>
+  </ul></li>
+  <li>Hive JDBC
+  <ul>
+    <li>Gateway: 
<code>jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive</code></li>
+    <li>Cluster: <code>http://{hive-host}:10001/cliservice</code></li>
+  </ul></li>
+</ul><p>The values for <code>{gateway-host}</code>, 
<code>{gateway-port}</code>, <code>{gateway-path}</code> are provided via the 
gateway configuration file (i.e. 
<code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p><p>The value for 
<code>{cluster-name}</code> is derived from the file name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/conf/topologies/{cluster-name}.xml</code>).</p><p>The value
for <code>{webhdfs-host}</code>, <code>{webhcat-host}</code>, 
<code>{oozie-host}</code>, <code>{hbase-host}</code> and 
<code>{hive-host}</code> are provided via the cluster topology descriptor (e.g. 
<code>{GATEWAY_HOME}/conf/topologies/{cluster-name}.xml</code>).</p><p>Note: 
The ports 50070, 50111, 11000, 8080 and 10001 are the defaults for WebHDFS, 
WebHCat, Oozie, HBase and Hive respectively. Their values can also be provided 
via the cluster topology descriptor if your Hadoop cluster uses different 
ports.</p><p>Note: The HBase REST API uses port 8080 by default. This often clashes with other running services. In the Hortonworks Sandbox, Apache Ambari might be running on this port, so you might have to change it to a different port (e.g. 60080).</p><h4><a id="Topology+Port+Mapping">Topology Port
Mapping</a> <a href="#Topology+Port+Mapping"><img 
src="markbook-section-link.png"/></a></h4><p>This feature allows mapping of a 
topology to a port, as a result one can have a specific topology listening on a 
configured port. This feature routes URLs to these port-mapped topologies 
without the additional context that the gateway uses for differentiating from 
one Hadoop cluster to another, just like the <a 
href="#Default+Topology+URLs">Default Topology URLs</a> feature, but on a 
dedicated port. </p><p>The configuration for Topology Port Mapping goes in 
<code>gateway-site.xml</code> file. The configuration uses the property name 
and value model to configure the settings for this feature. The format for the 
property name is <code>gateway.port.mapping.{topologyName}</cod
 e> and value is the port number that this topology would listen on. </p><p>In 
the following example, the topology <code>development</code> will listen on 
9443 (if the port is not already taken).</p>
+<pre><code>  &lt;property&gt;
+      &lt;name&gt;gateway.port.mapping.development&lt;/name&gt;
+      &lt;value&gt;9443&lt;/value&gt;
+      &lt;description&gt;Topology and Port mapping&lt;/description&gt;
+  &lt;/property&gt;
+</code></pre><p>Examples of how one can access a WebHDFS URL using the above configuration are:</p>
+<pre><code> https://{gateway-host}:9443/webhdfs
+ https://{gateway-host}:9443/{gateway-path}/development/webhdfs
+ https://{gateway-host}:{gateway-port}/{gateway-path}/development/webhdfs
+</code></pre><p>All of the above URLs are valid for the configuration described above.</p><p>This feature is turned on by default; to turn it off, use the property <code>gateway.port.mapping.enabled</code>, e.g.</p>
+<pre><code> &lt;property&gt;
+     &lt;name&gt;gateway.port.mapping.enabled&lt;/name&gt;
+     &lt;value&gt;false&lt;/value&gt;
+     &lt;description&gt;Enable/Disable port mapping 
feature.&lt;/description&gt;
+ &lt;/property&gt;
+</code></pre>
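+<p>As a quick sanity check (a sketch; this assumes port mapping is left enabled, that the <code>development</code> topology proxies WebHDFS, and that the demo guest credentials are valid), the port-mapped form can be exercised with curl just like the default topology URLs:</p>
+<pre><code>curl -i -k -u guest:guest-password -X GET \
+    &#39;https://{gateway-host}:9443/webhdfs/v1/?op=LISTSTATUS&#39;
+</code></pre>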
+<!--If a topology mapped port is in use by another topology or process then an 
ERROR message is logged and gateway startup continues as normal.-->
+<h3><a id="Configuration">Configuration</a> <a href="#Configuration"><img
src="markbook-section-link.png"/></a></h3><p>Configuration for Apache Knox 
includes:</p>
+<ol>
+  <li><a href="#Related+Cluster+Configuration">Related Cluster 
Configuration</a> that must be done within the Hadoop cluster to allow Knox to 
communicate with various services</li>
+  <li><a href="#Gateway+Server+Configuration">Gateway Server Configuration</a> 
- which is the configurable elements of the server itself which applies to 
behavior that spans all topologies or managed Hadoop clusters</li>
+  <li><a href="#Topology+Descriptors">Topology Descriptors</a> which are the 
descriptors for controlling access to Hadoop clusters in various ways</li>
+</ol><h3><a id="Related+Cluster+Configuration">Related Cluster 
Configuration</a> <a href="#Related+Cluster+Configuration"><img 
src="markbook-section-link.png"/></a></h3><p>The following configuration 
changes must be made to your cluster to allow Apache Knox to dispatch requests 
to the various service components on behalf of end users.</p><h4><a 
id="Grant+Proxy+privileges+for+Knox+user+in+`core-site.xml`+on+Hadoop+master+nodes">Grant
 Proxy privileges for Knox user in <code>core-site.xml</code> on Hadoop master 
nodes</a> <a 
href="#Grant+Proxy+privileges+for+Knox+user+in+`core-site.xml`+on+Hadoop+master+nodes"><img
 src="markbook-section-link.png"/></a></h4><p>Update <code>core-site.xml</code> 
and add the following lines towards the end of the file.</p><p>Replace 
<code>FQDN_OF_KNOX_HOST</code> with the fully qualified domain name of the host 
running the Knox gateway. You can usually find this by running <code>hostname 
-f</code> on that host.</p><p>You can use <code>*</code> for local developer testing if the Knox host does not have a static IP.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;hadoop.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;hadoop.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
+</code></pre><h4><a 
id="Grant+proxy+privilege+for+Knox+in+`webhcat-site.xml`+on+Hadoop+master+nodes">Grant
 proxy privilege for Knox in <code>webhcat-site.xml</code> on Hadoop master 
nodes</a> <a 
href="#Grant+proxy+privilege+for+Knox+in+`webhcat-site.xml`+on+Hadoop+master+nodes"><img
 src="markbook-section-link.png"/></a></h4><p>Update 
<code>webhcat-site.xml</code> and add the following lines towards the end of 
the file.</p><p>Replace <code>FQDN_OF_KNOX_HOST</code> with the fully qualified 
domain name of the host running the Knox gateway. You can use <code>*</code> 
for local developer testing if the Knox host does not have a static IP.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;webhcat.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;webhcat.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
+</code></pre><h4><a 
id="Grant+proxy+privilege+for+Knox+in+`oozie-site.xml`+on+Oozie+host">Grant 
proxy privilege for Knox in <code>oozie-site.xml</code> on Oozie host</a> <a 
href="#Grant+proxy+privilege+for+Knox+in+`oozie-site.xml`+on+Oozie+host"><img 
src="markbook-section-link.png"/></a></h4><p>Update <code>oozie-site.xml</code> 
and add the following lines towards the end of the file.</p><p>Replace 
<code>FQDN_OF_KNOX_HOST</code> with the fully qualified domain name of the host 
running the Knox gateway. You can use <code>*</code> for local developer 
testing if the Knox host does not have a static IP.</p>
+<pre><code>&lt;property&gt;
+    
&lt;name&gt;oozie.service.ProxyUserService.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    
&lt;name&gt;oozie.service.ProxyUserService.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
+</code></pre><h4><a 
id="Enable+http+transport+mode+and+use+substitution+in+HiveServer2">Enable http 
transport mode and use substitution in HiveServer2</a> <a 
href="#Enable+http+transport+mode+and+use+substitution+in+HiveServer2"><img 
src="markbook-section-link.png"/></a></h4><p>Update <code>hive-site.xml</code> 
and set the following properties on HiveServer2 hosts. Some of the properties 
may already be in the hive-site.xml. Ensure that the values match the ones 
below.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;hive.server2.allow.user.substitution&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;hive.server2.transport.mode&lt;/name&gt;
+    &lt;value&gt;http&lt;/value&gt;
+    &lt;description&gt;Server transport mode. &quot;binary&quot; or 
&quot;http&quot;.&lt;/description&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;hive.server2.thrift.http.port&lt;/name&gt;
+    &lt;value&gt;10001&lt;/value&gt;
+    &lt;description&gt;Port number when in HTTP mode.&lt;/description&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;hive.server2.thrift.http.path&lt;/name&gt;
+    &lt;value&gt;cliservice&lt;/value&gt;
+    &lt;description&gt;Path component of URL endpoint when in HTTP 
mode.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre><h4><a id="Gateway+Server+Configuration">Gateway Server 
Configuration</a> <a href="#Gateway+Server+Configuration"><img 
src="markbook-section-link.png"/></a></h4><p>The following table illustrates 
the configurable elements of the Apache Knox Gateway at the server level via 
gateway-site.xml.</p>
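+<p>Properties are set using the standard Hadoop configuration syntax; for example, an illustrative override of one of the settings described in the table would look like this:</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.websocket.feature.enabled&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+    &lt;description&gt;Enable/Disable websocket feature.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre>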
+<table>
+  <thead>
+    <tr>
+      <th>property </th>
+      <th>description </th>
+      <th>default</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>gateway.deployment.dir</td>
+      <td>The directory within GATEWAY_HOME that contains gateway topology 
deployments.</td>
+      <td>{GATEWAY_HOME}/data/deployments</td>
+    </tr>
+    <tr>
+      <td>gateway.security.dir</td>
+      <td>The directory within GATEWAY_HOME that contains the required 
security artifacts</td>
+      <td>{GATEWAY_HOME}/data/security</td>
+    </tr>
+    <tr>
+      <td>gateway.data.dir</td>
+      <td>The directory within GATEWAY_HOME that contains the gateway instance 
data</td>
+      <td>{GATEWAY_HOME}/data</td>
+    </tr>
+    <tr>
+      <td>gateway.services.dir</td>
+      <td>The directory within GATEWAY_HOME that contains the gateway services 
definitions.</td>
+      <td>{GATEWAY_HOME}/services</td>
+    </tr>
+    <tr>
+      <td>gateway.hadoop.conf.dir</td>
+      <td>The directory within GATEWAY_HOME that contains the gateway 
configuration</td>
+      <td>{GATEWAY_HOME}/conf</td>
+    </tr>
+    <tr>
+      <td>gateway.frontend.url</td>
+      <td>The URL that should be used during rewriting so that URLs in responses are rewritten with the correct &ldquo;frontend&rdquo; URL</td>
+      <td>none</td>
+    </tr>
+    <tr>
+      <td>gateway.xforwarded.enabled</td>
+      <td>Indicates whether support for some X-Forwarded-* headers is 
enabled</td>
+      <td>true</td>
+    </tr>
+    <tr>
+      <td>gateway.trust.all.certs</td>
+      <td>Indicates whether all presented client certs should establish 
trust</td>
+      <td>false</td>
+    </tr>
+    <tr>
+      <td>gateway.client.auth.needed</td>
+      <td>Indicates whether clients are required to establish a trust 
relationship with client certificates</td>
+      <td>false</td>
+    </tr>
+    <tr>
+      <td>gateway.truststore.path</td>
+      <td>Location of the truststore for client certificates to be trusted</td>
+      <td>gateway.jks</td>
+    </tr>
+    <tr>
+      <td>gateway.truststore.type</td>
+      <td>Indicates the type of truststore</td>
+      <td>JKS</td>
+    </tr>
+    <tr>
+      <td>gateway.keystore.type</td>
+      <td>Indicates the type of keystore for the identity store</td>
+      <td>JKS</td>
+    </tr>
+    <tr>
+      <td>gateway.jdk.tls.ephemeralDHKeySize</td>
+      <td>jdk.tls.ephemeralDHKeySize is defined to customize the ephemeral DH key sizes. The minimum acceptable DH key size is 1024 bits, except for exportable cipher suites or legacy mode (jdk.tls.ephemeralDHKeySize=legacy)</td>
+      <td>2048</td>
+    </tr>
+    <tr>
+      <td>gateway.threadpool.max</td>
+      <td>The maximum concurrent requests the server will process. The default 
is 254. Connections beyond this will be queued.</td>
+      <td>254</td>
+    </tr>
+    <tr>
+      <td>gateway.httpclient.maxConnections</td>
+      <td>The maximum number of connections that a single httpclient will 
maintain to a single host:port. The default is 32.</td>
+      <td>32</td>
+    </tr>
+    <tr>
+      <td>gateway.httpclient.connectionTimeout</td>
+      <td>The amount of time to wait when attempting a connection. The natural 
unit is milliseconds but a &lsquo;s&rsquo; or &lsquo;m&rsquo; suffix may be 
used for seconds or minutes respectively. The default timeout is 20 sec. </td>
+      <td>20 sec.</td>
+    </tr>
+    <tr>
+      <td>gateway.httpclient.socketTimeout</td>
+      <td>The amount of time to wait for data on a socket before aborting the 
connection. The natural unit is milliseconds but a &lsquo;s&rsquo; or 
&lsquo;m&rsquo; suffix may be used for seconds or minutes respectively. The 
default timeout is 20 sec. </td>
+      <td>20 sec.</td>
+    </tr>
+    <tr>
+      <td>gateway.httpserver.requestBuffer</td>
+      <td>The size of the HTTP server request buffer. The default is 16K.</td>
+      <td>16384</td>
+    </tr>
+    <tr>
+      <td>gateway.httpserver.requestHeaderBuffer</td>
+      <td>The size of the HTTP server request header buffer. The default is 
8K.</td>
+      <td>8192</td>
+    </tr>
+    <tr>
+      <td>gateway.httpserver.responseBuffer</td>
+      <td>The size of the HTTP server response buffer. The default is 32K.</td>
+      <td>32768</td>
+    </tr>
+    <tr>
+      <td>gateway.httpserver.responseHeaderBuffer</td>
+      <td>The size of the HTTP server response header buffer. The default is 
8K.</td>
+      <td>8192</td>
+    </tr>
+    <tr>
+      <td>gateway.websocket.feature.enabled</td>
+      <td>Enable/Disable websocket feature.</td>
+      <td>false</td>
+    </tr>
+    <tr>
+      <td>gateway.gzip.compress.mime.types</td>
+      <td>Content types to be gzip compressed by Knox on the way out to the browser.</td>
+      <td>text/html, text/plain, text/xml, text/css, application/javascript, 
text/javascript, application/x-javascript</td>
+    </tr>
+    <tr>
+      <td>gateway.signing.keystore.name</td>
+      <td>OPTIONAL Filename of keystore file that contains the signing 
keypair. NOTE: An alias needs to be created using &ldquo;knoxcli.sh 
create-alias&rdquo; for the alias name signing.key.passphrase in order to 
provide the passphrase to access the keystore.</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td>gateway.signing.key.alias</td>
+      <td>OPTIONAL alias for the signing keypair within the keystore specified 
via gateway.signing.keystore.name.</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td>ssl.enabled</td>
+      <td>Indicates whether SSL is enabled for the Gateway</td>
+      <td>true</td>
+    </tr>
+    <tr>
+      <td>ssl.include.ciphers</td>
+      <td>A comma separated list of ciphers to accept for SSL. See the <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html#SunJSSEProvider">JSSE Provider docs</a> for possible ciphers. These can also contain regular expressions as shown in the <a href="http://www.eclipse.org/jetty/documentation/current/configuring-ssl.html">Jetty documentation</a>.</td>
+      <td>all</td>
+    </tr>
+    <tr>
+      <td>ssl.exclude.ciphers</td>
+      <td>A comma separated list of ciphers to reject for SSL. See the <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html#SunJSSEProvider">JSSE Provider docs</a> for possible ciphers. These can also contain regular expressions as shown in the <a href="http://www.eclipse.org/jetty/documentation/current/configuring-ssl.html">Jetty documentation</a>.</td>
+      <td>none</td>
+    </tr>
+    <tr>
+      <td>ssl.exclude.protocols</td>
+      <td>A comma separated list of protocols to reject for SSL, or &ldquo;none&rdquo;</td>
+      <td>SSLv3</td>
+    </tr>
+    <tr>
+      <td>gateway.remote.config.monitor.client</td>
+      <td>A reference to the <a 
href="#Remote+Configuration+Registry+Clients">remote configuration registry 
client</a> the remote configuration monitor will employ.</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td>gateway.remote.config.registry.<b>&lt;name&gt;</b></td>
+      <td>A named <a href="#Remote+Configuration+Registry+Clients">remote 
configuration registry client</a> definition</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td>gateway.cluster.config.monitor.ambari.enabled </td>
+      <td>Indicates whether the cluster monitoring and associated dynamic 
topology updating is enabled. </td>
+      <td>false</td>
+    </tr>
+    <tr>
+      <td>gateway.cluster.config.monitor.ambari.interval </td>
+      <td>The interval (in seconds) at which the cluster monitor will poll 
Ambari for cluster configuration changes. </td>
+      <td>60</td>
+    </tr>
+  </tbody>
+</table><h4><a id="Topology+Descriptors">Topology Descriptors</a> <a 
href="#Topology+Descriptors"><img 
src="markbook-section-link.png"/></a></h4><p>The topology descriptor files 
provide the gateway with per-cluster configuration information. This includes 
configuration for both the providers within the gateway and the services within 
the Hadoop cluster. These files are located in 
<code>{GATEWAY_HOME}/conf/topologies</code>. The general outline of a topology descriptor looks like this.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        &lt;provider&gt;
+        &lt;/provider&gt;
+    &lt;/gateway&gt;
+    &lt;service&gt;
+    &lt;/service&gt;
+&lt;/topology&gt;
+</code></pre><p>There are typically multiple <code>&lt;provider&gt;</code> and 
<code>&lt;service&gt;</code> elements.</p>
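+<p>For example, a minimal (abbreviated) topology combining one provider and one service might look like the following sketch. Note that a working ShiroProvider configuration requires the param elements shown later in this section.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        &lt;provider&gt;
+            &lt;role&gt;authentication&lt;/role&gt;
+            &lt;name&gt;ShiroProvider&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+        &lt;/provider&gt;
+    &lt;/gateway&gt;
+    &lt;service&gt;
+        &lt;role&gt;WEBHDFS&lt;/role&gt;
+        &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
+    &lt;/service&gt;
+&lt;/topology&gt;
+</code></pre>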
+<dl><dt>/topology</dt><dd>Defines the provider configuration and service topology for a single Hadoop cluster.</dd><dt>/topology/gateway</dt><dd>Groups all of the provider elements.</dd><dt>/topology/gateway/provider</dt><dd>Defines 
the configuration of a specific provider for the 
cluster.</dd><dt>/topology/service</dt><dd>Defines the location of a specific 
Hadoop service within the Hadoop cluster.</dd>
+</dl><h5><a id="Provider+Configuration">Provider Configuration</a> <a 
href="#Provider+Configuration"><img 
src="markbook-section-link.png"/></a></h5><p>Provider configuration is used to 
customize the behavior of a particular gateway feature. The general outline of 
a provider element looks like this.</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;authentication&lt;/role&gt;
+    &lt;name&gt;ShiroProvider&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;name&gt;&lt;/name&gt;
+        &lt;value&gt;&lt;/value&gt;
+    &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre>
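+<p>As a concrete illustration, the out-of-the-box Default identity-assertion provider might be configured as in the following sketch; the principal.mapping parameter value shown (mapping the guest user to hdfs) is purely illustrative.</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;identity-assertion&lt;/role&gt;
+    &lt;name&gt;Default&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;name&gt;principal.mapping&lt;/name&gt;
+        &lt;value&gt;guest=hdfs;&lt;/value&gt;
+    &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre>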
+<dl><dt>/topology/gateway/provider</dt><dd>Groups information for a specific 
provider.</dd><dt>/topology/gateway/provider/role</dt><dd>Defines the role of a 
particular provider. There are a number of pre-defined roles used by 
out-of-the-box provider plugins for the gateway. These roles are: 
authentication, identity-assertion, rewrite and hostmap.</dd><dt>/topology/gateway/provider/name</dt><dd>Defines the name of the provider for which this configuration applies. There can be multiple provider implementations for a given role, so specifying the name identifies which particular provider is being configured. Typically each topology descriptor should contain only one provider for each role, but there are exceptions.</dd><dt>/topology/gateway/provider/enabled</dt><dd>Allows a particular provider to be enabled or disabled via <code>true</code> or <code>false</code> respectively. When a provider is disabled, any filters associated with that provider are excluded from the processing chain.</dd><dt>/topology/gateway/provider/param</dt><dd>These elements are used 
to supply provider configuration. There can be zero or more of these per 
provider.</dd><dt>/topology/gateway/provider/param/name</dt><dd>The name of a 
parameter to pass to the 
provider.</dd><dt>/topology/gateway/provider/param/value</dt><dd>The value of a 
parameter to pass to the provider.</dd>
+</dl><h5><a id="Service+Configuration">Service Configuration</a> <a 
href="#Service+Configuration"><img 
src="markbook-section-link.png"/></a></h5><p>Service configuration is used to 
specify the location of services within the Hadoop cluster. The general outline 
of a service element looks like this.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre>
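+<p>Where a service is deployed for HA (and the corresponding HA provider is enabled), multiple url elements may be listed for a single service, as in this sketch with hypothetical host names:</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://host1.example.com:50070/webhdfs&lt;/url&gt;
+    &lt;url&gt;http://host2.example.com:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre>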
+<dl><dt>/topology/service</dt><dd>Provides information about a particular service within the Hadoop cluster. Not all services are necessarily exposed as gateway endpoints.</dd><dt>/topology/service/role</dt><dd>Identifies the role of this service. Currently supported roles are: WEBHDFS, WEBHCAT, WEBHBASE, OOZIE, HIVE, NAMENODE, JOBTRACKER, RESOURCEMANAGER. Additional service roles can be supported via plugins. Note: The role names are case sensitive and must be upper case.</dd><dt>/topology/service/url</dt><dd>The URL identifying the 
location of a particular service within the Hadoop cluster.</dd>
+</dl><h4><a id="Hostmap+Provider">Hostmap Provider</a> <a 
href="#Hostmap+Provider"><img src="markbook-section-link.png"/></a></h4><p>The 
purpose of the Hostmap provider is to handle situations where hosts are known 
by one name within the cluster and another name externally. This frequently 
occurs when virtual machines are used, and in particular when using cloud 
hosting services. Currently, the Hostmap provider is configured as part of the 
topology file. The basic structure is shown below.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        ...
+        &lt;provider&gt;
+            &lt;role&gt;hostmap&lt;/role&gt;
+            &lt;name&gt;static&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+            
&lt;param&gt;&lt;name&gt;external-host-name&lt;/name&gt;&lt;value&gt;internal-host-name&lt;/value&gt;&lt;/param&gt;
+        &lt;/provider&gt;
+        ...
+    &lt;/gateway&gt;
+    ...
+&lt;/topology&gt;
+</code></pre><p>This mapping is required because the Hadoop services running 
within the cluster are unaware that they are being accessed from outside the 
cluster. Therefore URLs returned as part of REST API responses will typically 
contain internal host names. Since clients outside the cluster will be unable 
to resolve those host names, they must be mapped to external host 
names.</p><h5><a id="Hostmap+Provider+Example+-+EC2">Hostmap Provider Example - 
EC2</a> <a href="#Hostmap+Provider+Example+-+EC2"><img 
src="markbook-section-link.png"/></a></h5><p>Consider an EC2 example where two 
VMs have been allocated. Each VM has an external host name by which it can be 
accessed via the internet. However, the EC2 VM is unaware of this external host 
name and instead is configured with the internal host name.</p>
+<pre><code>External HOSTNAMES:
+ec2-23-22-31-165.compute-1.amazonaws.com
+ec2-23-23-25-10.compute-1.amazonaws.com
+
+Internal HOSTNAMES:
+ip-10-118-99-172.ec2.internal
+ip-10-39-107-209.ec2.internal
+</code></pre><p>The Hostmap configuration required to allow access from outside the Hadoop cluster via the Apache Knox Gateway would be this.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        ...
+        &lt;provider&gt;
+            &lt;role&gt;hostmap&lt;/role&gt;
+            &lt;name&gt;static&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+            &lt;param&gt;
+                
&lt;name&gt;ec2-23-22-31-165.compute-1.amazonaws.com&lt;/name&gt;
+                &lt;value&gt;ip-10-118-99-172.ec2.internal&lt;/value&gt;
+            &lt;/param&gt;
+            &lt;param&gt;
+                
&lt;name&gt;ec2-23-23-25-10.compute-1.amazonaws.com&lt;/name&gt;
+                &lt;value&gt;ip-10-39-107-209.ec2.internal&lt;/value&gt;
+            &lt;/param&gt;
+        &lt;/provider&gt;
+        ...
+    &lt;/gateway&gt;
+    ...
+&lt;/topology&gt;
+</code></pre><h5><a id="Hostmap+Provider+Example+-+Sandbox">Hostmap Provider 
Example - Sandbox</a> <a href="#Hostmap+Provider+Example+-+Sandbox"><img 
src="markbook-section-link.png"/></a></h5><p>The Hortonworks Sandbox 2.x poses 
a different challenge for host name mapping. This version of the Sandbox uses 
port mapping to make the Sandbox VM appear as though it is accessible via 
localhost. However, the Sandbox VM is internally configured to consider sandbox.hortonworks.com as the host name. So from the perspective of a client accessing Sandbox, the external host name is localhost. The Hostmap 
configuration required to allow access to Sandbox from the host operating 
system is this.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        ...
+        &lt;provider&gt;
+            &lt;role&gt;hostmap&lt;/role&gt;
+            &lt;name&gt;static&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+            &lt;param&gt;
+                &lt;name&gt;localhost&lt;/name&gt;
+                &lt;value&gt;sandbox,sandbox.hortonworks.com&lt;/value&gt;
+            &lt;/param&gt;
+        &lt;/provider&gt;
+        ...
+    &lt;/gateway&gt;
+    ...
+&lt;/topology&gt;
+</code></pre><h5><a id="Hostmap+Provider+Configuration">Hostmap Provider 
Configuration</a> <a href="#Hostmap+Provider+Configuration"><img 
src="markbook-section-link.png"/></a></h5><p>Details about each provider 
configuration element are enumerated below.</p>
+<dl><dt>/topology/gateway/provider/role</dt><dd>The role for a Hostmap provider must always be <code>hostmap</code>.</dd><dt>/topology/gateway/provider/name</dt><dd>The Hostmap provider supplied out-of-the-box is selected via the name <code>static</code>.</dd><dt>/topology/gateway/provider/enabled</dt><dd>Host mapping can be enabled or disabled by providing <code>true</code> or <code>false</code>.</dd><dt>/topology/gateway/provider/param</dt><dd>Host mapping is configured by providing parameters for each external to internal mapping.</dd><dt>/topology/gateway/provider/param/name</dt><dd>The parameter names represent the external host names associated with the internal host names provided by the value element. This can be a comma separated list of host names that all represent the same physical host. When mapping from internal to external host names, the first external host name in the list is used.</dd><dt>/topology/gateway/provider/param/value</dt><dd>The parameter values represent the internal host names associated with the external host names provided by the name element. This can be a comma separated list of host names that all represent the same physical host. When mapping from external to internal host names, the first internal host name in the list is used.</dd>
+</dl><h4><a id="Simplified+Topology+Descriptors">Simplified Topology 
Descriptors</a> <a href="#Simplified+Topology+Descriptors"><img 
src="markbook-section-link.png"/></a></h4><p>Simplified descriptors are a means 
to facilitate provider configuration sharing and service endpoint discovery. 
Rather than editing an XML topology descriptor, it&rsquo;s possible to create a 
simpler YAML (or JSON) descriptor specifying the desired contents of a 
topology, which will yield a full topology descriptor and deployment.</p><h5><a 
id="Externalized+Provider+Configurations">Externalized Provider 
Configurations</a> <a href="#Externalized+Provider+Configurations"><img 
src="markbook-section-link.png"/></a></h5><p>Sometimes, the same provider 
configuration is applied to multiple Knox topologies. With the provider 
configuration externalized from the simple descriptors, a single configuration 
can be referenced by multiple topologies. This helps reduce the duplication of 
configuration, and the need to update multiple configuration files when a policy change is required. Updating a provider configuration will trigger an update to all those topologies that reference it.</p><p>The contents of an externalized provider configuration are identical to the gateway element from a full topology descriptor. The only difference is that it&rsquo;s defined in its own XML file in <code>{GATEWAY_HOME}/conf/shared-providers/</code>.</p><p><em>Provider 
Configuration Example</em></p>
+<pre><code>&lt;gateway&gt;
+    &lt;provider&gt;
+        &lt;role&gt;authentication&lt;/role&gt;
+        &lt;name&gt;ShiroProvider&lt;/name&gt;
+        &lt;enabled&gt;true&lt;/enabled&gt;
+        &lt;param&gt;
+            &lt;name&gt;sessionTimeout&lt;/name&gt;
+            &lt;value&gt;30&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            &lt;name&gt;main.ldapRealm&lt;/name&gt;
+            
&lt;value&gt;org.apache.knox.gateway.shirorealm.KnoxLdapRealm&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            &lt;name&gt;main.ldapContextFactory&lt;/name&gt;
+            
&lt;value&gt;org.apache.knox.gateway.shirorealm.KnoxLdapContextFactory&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            &lt;name&gt;main.ldapRealm.contextFactory&lt;/name&gt;
+            &lt;value&gt;$ldapContextFactory&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            &lt;name&gt;main.ldapRealm.userDnTemplate&lt;/name&gt;
+            
&lt;value&gt;uid={0},ou=people,dc=hadoop,dc=apache,dc=org&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            &lt;name&gt;main.ldapRealm.contextFactory.url&lt;/name&gt;
+            &lt;value&gt;ldap://localhost:33389&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            
&lt;name&gt;main.ldapRealm.contextFactory.authenticationMechanism&lt;/name&gt;
+            &lt;value&gt;simple&lt;/value&gt;
+        &lt;/param&gt;
+        &lt;param&gt;
+            &lt;name&gt;urls./**&lt;/name&gt;
+            &lt;value&gt;authcBasic&lt;/value&gt;
+        &lt;/param&gt;
+    &lt;/provider&gt;
+
+    &lt;provider&gt;
+        &lt;role&gt;identity-assertion&lt;/role&gt;
+        &lt;name&gt;Default&lt;/name&gt;
+        &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;/provider&gt;
+
+    &lt;provider&gt;
+        &lt;role&gt;hostmap&lt;/role&gt;
+        &lt;name&gt;static&lt;/name&gt;
+        &lt;enabled&gt;true&lt;/enabled&gt;
+        
&lt;param&gt;&lt;name&gt;localhost&lt;/name&gt;&lt;value&gt;sandbox,sandbox.hortonworks.com&lt;/value&gt;&lt;/param&gt;
+    &lt;/provider&gt;
+&lt;/gateway&gt;
+</code></pre><h6><a id="Sharing+HA+Providers">Sharing HA Providers</a> <a 
href="#Sharing+HA+Providers"><img 
src="markbook-section-link.png"/></a></h6><p>HA Providers are a special concern 
with respect to sharing provider configuration because they include 
service-specific (and possibly cluster-specific) configuration.</p><p>This 
requires extra attention because the service configurations corresponding to 
the associated HA Provider configuration must contain the correct content to 
function properly.</p><p>For a shared provider configuration with an HA 
Provider service:</p>
+<ul>
+  <li>If the referencing descriptor does not declare the corresponding 
service, then the HA Provider configuration is effectively ignored since the 
service isn&rsquo;t exposed by the topology.</li>
+  <li>If a corresponding service is declared in the descriptor:
+  <ul>
+    <li>If service endpoint discovery is employed, then Knox should populate 
the URLs correctly to support the HA behavior.</li>
+    <li>Otherwise, the URLs must be explicitly specified for that service in 
the descriptor.</li>
+  </ul></li>
+  <li>If the descriptor content is correct, but the cluster service is not 
configured for HA, then the HA behavior obviously won&rsquo;t work.</li>
+</ul><p><em>Apache ZooKeeper-based HA Provider Services</em></p><p>The HA Provider configuration for some services (e.g., <a href="#HiveServer2+HA">HiveServer2</a>, <a href="#Kafka+HA">Kafka</a>) includes references to Apache ZooKeeper hosts (i.e., the ZooKeeper ensemble). It&rsquo;s important to understand the relationship of that ensemble configuration to the topologies referencing it. These ZooKeeper ensembles are often cluster-specific. If the ZooKeeper ensemble in the provider configuration is part of cluster A, then it&rsquo;s probably incorrect to reference it in a topology for cluster B since the Hadoop service endpoints will probably be the wrong ones. However, if multiple clusters are working with a common ZooKeeper ensemble, then sharing this provider configuration <em>may</em> be appropriate. Note that service endpoint discovery does <em>not</em> handle these ZooKeeper ensemble details; they are static provider configuration.</p><p>Be sure to pay extra attention when sharing HA Provider configuration across topologies.</p>
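+<p>For illustration, an HA Provider entry embedding a ZooKeeper ensemble might look like the following sketch (modeled on the HiveServer2 HA configuration described elsewhere in this guide; the host names and parameter values are hypothetical):</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;ha&lt;/role&gt;
+    &lt;name&gt;HaProvider&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;name&gt;HIVE&lt;/name&gt;
+        &lt;value&gt;enabled=true;zookeeperEnsemble=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181;zookeeperNamespace=hiveserver2&lt;/value&gt;
+    &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre>
+<h5><a 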
id="Simplified+Descriptor+Files">Simplified Descriptor Files</a> <a 
href="#Simplified+Descriptor+Files"><img 
src="markbook-section-link.png"/></a></h5><p>Simplified descriptors allow 
service URLs to be defined explicitly, just like full topology descriptors. 
However, if URLs are omitted for a service, Knox will attempt to discover that 
service&rsquo;s URLs from the Hadoop cluster. Currently, this behavior is only 
supported for clusters managed by Ambari. In any case, the simplified 
descriptors are much more concise than a full topology 
descriptor.</p><p><em>Descriptor Properties</em></p>
+<table>
+  <thead>
+    <tr>
+      
<th>property&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</th>
+      <th>description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>discovery-type</td>
+      <td>The discovery source type. (Currently, the only supported type is 
<em>AMBARI</em>).</td>
+    </tr>
+    <tr>
+      <td>discovery-address</td>
+      <td>The endpoint address for the discovery source.</td>
+    </tr>
+    <tr>
+      <td>discovery-user</td>
+      <td>The username with permission to access the discovery source. If 
omitted, then Knox will check for an alias named 
<em>ambari.discovery.user</em>, and use its value if defined.</td>
+    </tr>
+    <tr>
+      <td>discovery-pwd-alias</td>
+      <td>The alias of the password for the user with permission to access the 
discovery source. If omitted, then Knox will check for an alias named 
<em>ambari.discovery.password</em>, and use its value if defined.</td>
+    </tr>
+    <tr>
+      <td>provider-config-ref</td>
+      <td>A reference to a provider configuration in 
<code>{GATEWAY_HOME}/conf/shared-providers/</code>.</td>
+    </tr>
+    <tr>
+      <td>cluster</td>
+      <td>The name of the cluster from which the topology service endpoints 
should be determined.</td>
+    </tr>
+    <tr>
+      <td>services</td>
+      <td>The collection of services to be included in the topology.</td>
+    </tr>
+    <tr>
+      <td>applications</td>
+      <td>The collection of applications to be included in the topology.</td>
+    </tr>
+  </tbody>
+</table><p>Two file formats are supported for two distinct purposes.</p>
+<ul>
+  <li>YAML is intended for those hand-editing a simplified descriptor, because of its readability.</li>
+  <li>JSON is intended to be used for <a href="#Admin+API">API</a> 
interactions.</li>
+</ul><p>That being said, there is nothing preventing the hand-editing of files 
in the JSON format. However, the API will <em>not</em> accept YAML files as 
input.</p><p><em>YAML Example</em> (based on the HDP Docker Sandbox)</p>
+<pre><code>---
+# Discovery source config
+discovery-type : AMBARI
+discovery-address : http://sandbox.hortonworks.com:8080
+
+# If this is not specified, the alias ambari.discovery.user is checked for a 
username
+discovery-user : maria_dev
+
+# If this is not specified, the default alias ambari.discovery.password is used
+discovery-pwd-alias : sandbox.discovery.password
+
+# Provider config reference, the contents of which will be included in the 
resulting topology descriptor
+provider-config-ref : sandbox-providers
+
+# The cluster for which the details should be discovered
+cluster: Sandbox
+
+# The services to declare in the resulting topology descriptor, whose URLs 
will be discovered (unless a value is specified)
+services:
+    - name: NAMENODE
+    - name: JOBTRACKER
+    - name: WEBHDFS
+    - name: WEBHCAT
+    - name: OOZIE
+    - name: WEBHBASE
+    - name: HIVE
+    - name: RESOURCEMANAGER
+    - name: KNOXSSO
+      params:
+          knoxsso.cookie.secure.only: true
+          knoxsso.token.ttl: 100000
+    - name: AMBARI
+      urls:
+          - http://sandbox.hortonworks.com:8080
+    - name: AMBARIUI
+      urls:
+          - http://sandbox.hortonworks.com:8080
+</code></pre><p><em>JSON Example</em> (based on the HDP Docker Sandbox)</p>
+<pre><code>{
+  &quot;discovery-type&quot;:&quot;AMBARI&quot;,
+  
&quot;discovery-address&quot;:&quot;http://sandbox.hortonworks.com:8080&quot;,
+  &quot;discovery-user&quot;:&quot;maria_dev&quot;,
+  &quot;discovery-pwd-alias&quot;:&quot;sandbox.discovery.password&quot;,
+  &quot;provider-config-ref&quot;:&quot;sandbox-providers&quot;,
+  &quot;cluster&quot;:&quot;Sandbox&quot;,
+  &quot;services&quot;:[
+    {&quot;name&quot;:&quot;NAMENODE&quot;},
+    {&quot;name&quot;:&quot;JOBTRACKER&quot;},
+    {&quot;name&quot;:&quot;WEBHDFS&quot;},
+    {&quot;name&quot;:&quot;WEBHCAT&quot;},
+    {&quot;name&quot;:&quot;OOZIE&quot;},
+    {&quot;name&quot;:&quot;WEBHBASE&quot;},
+    {&quot;name&quot;:&quot;HIVE&quot;},
+    {&quot;name&quot;:&quot;RESOURCEMANAGER&quot;},
+    {&quot;name&quot;:&quot;KNOXSSO&quot;,
+      &quot;params&quot;:{
+      &quot;knoxsso.cookie.secure.only&quot;:&quot;true&quot;,
+      &quot;knoxsso.token.ttl&quot;:&quot;100000&quot;
+      }
+    },
+    {&quot;name&quot;:&quot;AMBARI&quot;, 
&quot;urls&quot;:[&quot;http://sandbox.hortonworks.com:8080&quot;]},
+    {&quot;name&quot;:&quot;AMBARIUI&quot;, 
&quot;urls&quot;:[&quot;http://sandbox.hortonworks.com:8080&quot;]}
+  ]
+}
+</code></pre><p>Both of these examples illustrate the specification of 
credentials for the interaction with Ambari. If no credentials are specified, 
then the default aliases are queried. Use of the default aliases is sufficient 
for scenarios where topology discovery will only interact with a single Ambari 
instance. For multiple Ambari instances however, it&rsquo;s most likely that 
each will require different sets of credentials. The discovery-user and 
discovery-pwd-alias properties exist for this purpose. Note that whether using 
the default credential aliases or specifying a custom password alias, these <a 
href="#Alias+creation">aliases must be defined</a> prior to any attempt to 
deploy a topology using a simplified descriptor.</p>
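+<p>For example, the default discovery password alias could be defined using the Knox CLI, as in this sketch (the value shown is a placeholder):</p>
+<pre><code>bin/knoxcli.sh create-alias ambari.discovery.password --value '&lt;discovery-password&gt;'
+</code></pre>
+<h5><a 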
id="Deployment+Directories">Deployment Directories</a> <a 
href="#Deployment+Directories"><img 
src="markbook-section-link.png"/></a></h5><p>Effecting topology changes is as 
simple as modifying files in two specific directories.</p><p>The <code>{GATEWAY_HOME}/conf/shared-providers/</code> directory is the location where Knox 
looks for provider configurations. This directory is monitored for changes, 
such that modifying a provider configuration file therein will trigger updates 
to any referencing simplified descriptors in the 
<code>{GATEWAY_HOME}/conf/descriptors/</code> directory. <em>Care should be 
taken when deleting these files if there are referencing descriptors; any 
subsequent modifications of referencing descriptors will fail when the deleted 
provider configuration cannot be found. The references should all be modified 
before deleting the provider configuration.</em></p><p>Likewise, the 
<code>{GATEWAY_HOME}/conf/descriptors/</code> directory is monitored for 
changes, such that adding or modifying a simplified descriptor file in this 
directory will trigger the generation and deployment of a topology descriptor. 
Deleting a descriptor from this directory will conversely result in the removal 
of the previously-generated topology descriptor, and the associated topology will be undeployed.</p><p>If the 
service details for a deployed (generated) topology are changed in the cluster, 
then the Knox topology can be updated by &lsquo;touch&rsquo;ing the simplified descriptor. This will trigger discovery and regeneration/redeployment of the topology descriptor.</p>
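+<p>For example (with a hypothetical descriptor file name):</p>
+<pre><code># Trigger re-discovery and regeneration/redeployment of the associated topology
+touch {GATEWAY_HOME}/conf/descriptors/docker-sandbox.yml
+</code></pre>
+<p>Note that deleting a generated topology descriptor 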
from <code>{GATEWAY_HOME}/conf/topologies/</code> is not sufficient for its 
removal. If the source descriptor is modified, or Knox is restarted, the 
topology descriptor will be regenerated and deployed. Removing generated 
topology descriptors should be done by removing the associated simplified 
descriptor. For the same reason, editing generated topology descriptors is 
strongly discouraged since they can be inadvertently overwritten.</p><p>Another 
means by which these topology changes can be effected is the <a 
href="#Admin+API">Admin API</a>.</p><h5><a 
id="Cluster+Configuration+Monitoring">Cluster Co
 nfiguration Monitoring</a> <a href="#Cluster+Configuration+Monitoring"><img 
src="markbook-section-link.png"/></a></h5><p>Another benefit gained through the 
use of simplified topology descriptors, and the associated service discovery, 
is the ability to monitor clusters for configuration changes. <strong>Like 
service discovery, this is currently only available for clusters managed by 
Ambari.</strong></p><p>The gateway can monitor Ambari cluster configurations, 
and respond to changes by dynamically regenerating and redeploying the affected 
topologies. The following properties in gateway-site.xml can be used to control 
this behavior.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.cluster.config.monitor.ambari.enabled&lt;/name&gt;
+    &lt;value&gt;false&lt;/value&gt;
+    &lt;description&gt;Enable/disable Ambari cluster configuration 
monitoring.&lt;/description&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;gateway.cluster.config.monitor.ambari.interval&lt;/name&gt;
+    &lt;value&gt;60&lt;/value&gt;
+    &lt;description&gt;The interval (in seconds) for polling Ambari for 
cluster configuration changes.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre><p>Since service discovery supports multiple Ambari instances as 
discovery sources, multiple Ambari instances can be monitored for cluster 
configuration changes.</p><p>For example, if the cluster monitor is enabled, 
deployment of the following simplified descriptor would trigger monitoring of the 
<em>Sandbox</em> cluster managed by Ambari @ <a 
href="http://sandbox.hortonworks.com:8080";>http://sandbox.hortonworks.com:8080</a></p>
+<pre><code>---
+discovery-address : http://sandbox.hortonworks.com:8080
+discovery-user : maria_dev
+discovery-pwd-alias : sandbox.discovery.password
+cluster: Sandbox
+provider-config-ref : sandbox-providers
+services:
+    - name: NAMENODE
+    - name: JOBTRACKER
+    - name: WEBHDFS
+    - name: WEBHCAT
+    - name: OOZIE
+    - name: WEBHBASE
+    - name: HIVE
+    - name: RESOURCEMANAGER
+</code></pre><p>Another <em>Sandbox</em> cluster, managed by a 
<strong>different</strong> Ambari instance, could simultaneously be monitored 
by the same gateway instance.</p><p>Now, topologies can be kept in sync with 
their respective target cluster configurations, without administrator 
intervention or service interruption.</p><h5><a 
id="Remote+Configuration+Monitor">Remote Configuration Monitor</a> <a 
href="#Remote+Configuration+Monitor"><img 
src="markbook-section-link.png"/></a></h5><p>In addition to monitoring local 
directories for provider configurations and simplified descriptors, the gateway 
similarly supports monitoring ZooKeeper.</p><p>This monitor depends on a <a 
href="#Remote+Configuration+Registry+Clients">remote configuration registry 
client</a>, and that client must be specified by setting the following property 
in gateway-site.xml.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.remote.config.monitor.client&lt;/name&gt;
+    &lt;value&gt;sandbox-zookeeper-client&lt;/value&gt;
+    &lt;description&gt;Remote configuration monitor client 
name.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre><p>This client identifier is a reference to a remote 
configuration registry client, as in this example (also defined in 
gateway-site.xml).</p>
+<pre><code>&lt;property&gt;
+    
&lt;name&gt;gateway.remote.config.registry.sandbox-zookeeper-client&lt;/name&gt;
+    &lt;value&gt;type=ZooKeeper;address=localhost:2181&lt;/value&gt;
+    &lt;description&gt;ZooKeeper configuration registry client 
details.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre><p><em>The actual name of the client (e.g., 
sandbox-zookeeper-client) is not important, except that the reference matches 
the name specified in the client definition.</em></p><p>With this 
configuration, the gateway will monitor the following znodes in the specified 
ZooKeeper instance.</p>
+<pre><code>/knox
+   /config
+      /shared-providers
+      /descriptors
+</code></pre><p>The creation of these znodes, and the population of their 
respective contents, is an activity <strong>not</strong> currently managed by 
the gateway. However, the <a href="#Knox+CLI">Knox CLI</a> includes commands 
for managing the contents of these znodes.</p><p>These znodes are treated 
similarly to the local <em>shared-providers</em> and <em>descriptors</em> 
directories described in <a href="#Deployment+Directories">Deployment 
Directories</a>. When the monitor notices a change to these znodes, it will 
attempt to effect the same change locally.</p><p>If a provider configuration is 
added to the <em>/knox/config/shared-providers</em> znode, the monitor will 
download the new configuration to the local shared-providers directory. 
Likewise, if a descriptor is added to the <em>/knox/config/descriptors</em> 
znode, the monitor will download the new descriptor to the local descriptors 
directory, which will trigger an attempt to generate and deploy a corresponding 
topology.</p>
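+<p>For example, a provider configuration and a descriptor might be published to these znodes using the Knox CLI, as in this sketch (the file names are hypothetical; see the <a href="#Knox+CLI">Knox CLI</a> section for the complete command syntax):</p>
+<pre><code>bin/knoxcli.sh upload-provider-config sandbox-providers.xml --registry-client sandbox-zookeeper-client
+bin/knoxcli.sh upload-descriptor docker-sandbox.json --registry-client sandbox-zookeeper-client
+</code></pre>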
+<p>Modifications to the contents of these znodes will yield the same behavior as the corresponding local modifications, as summarized in the following table.</p>
+<table>
+  <thead>
+    <tr>
+      <th>znode </th>
+      <th>action </th>
+      <th>result</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>/knox/config/shared-providers </td>
+      <td>add </td>
+      <td>Download the new file to the local shared-providers directory</td>
+    </tr>
+    <tr>
+      <td>/knox/config/shared-providers </td>
+      <td>modify </td>
+      <td>Download the new file to the local shared-providers directory; if there are any existing descriptor references, the topologies for those referencing descriptors will be regenerated and redeployed.</td>
+    </tr>
+    <tr>
+      <td>/knox/config/shared-providers </td>
+      <td>delete </td>
+      <td>Delete the corresponding file from the local shared-providers 
directory</td>
+    </tr>
+    <tr>
+      <td>/knox/config/descriptors </td>
+      <td>add </td>
+      <td>Download the new file to the local descriptors directory; A 
corresponding topology will be generated and deployed.</td>
+    </tr>
+    <tr>
+      <td>/knox/config/descriptors </td>
+      <td>modify </td>
+      <td>Download the new file to the local descriptors directory; The 
corresponding topology will be regenerated and redeployed.</td>
+    </tr>
+    <tr>
+      <td>/knox/config/descriptors </td>
+      <td>delete </td>
+      <td>Delete the corresponding file from the local descriptors 
directory</td>
+    </tr>
+  </tbody>
+</table><p>This simplifies the configuration for HA gateway deployments, in 
that the gateway instances can all be configured to monitor the same ZooKeeper 
instance, and changes to the znodes&rsquo; contents will be applied to all 
those gateway instances. With this approach, it is no longer necessary to 
manually deploy topologies to each of the gateway instances.</p><p><em>A Note 
About ACLs</em></p>
+<pre><code>While the gateway does not currently require secure interactions 
with remote registries, it is recommended
+that ACLs be applied to restrict at least write access to the entries referenced by 
this monitor. If write
+access is available to everyone, then the contents of the configuration cannot 
be known to be trustworthy,
+and there is the potential for malicious activity. Be sure to carefully 
consider who will have the ability
+to define configuration in monitored remote registries, and apply the 
necessary measures to ensure its
+trustworthiness.
+</code></pre>
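+<p>For example, such ACLs might be applied with the ZooKeeper CLI, as in this sketch (assuming a SASL-authenticated <em>knox</em> identity; the appropriate scheme and id depend on the ZooKeeper security configuration):</p>
+<pre><code>setAcl /knox/config/shared-providers world:anyone:r,sasl:knox:cdrwa
+setAcl /knox/config/descriptors world:anyone:r,sasl:knox:cdrwa
+</code></pre>
+<h4><a id="Remote+Configuration+Registry+Clients">Remote 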
Configuration Registry Clients</a> <a 
href="#Remote+Configuration+Registry+Clients"><img 
src="markbook-section-link.png"/></a></h4><p>One or more features of the 
gateway employ remote configuration registry (e.g., ZooKeeper) clients. These 
clients are configured by setting properties in the gateway configuration 
(gateway-site.xml).</p><p>Each client configuration is a single property, the 
name of which is prefixed with <strong>gateway.remote.config.registry.</strong> 
and suffixed by the client identifier. The value of such a property is a 
registry-type-specific set of semicolon-delimited properties for that client, 
including the type of registry with which it will interact.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.remote.config.registry.a-zookeeper-client&lt;/name&gt;
+    
&lt;value&gt;type=ZooKeeper;address=zkhost1:2181,zkhost2:2181,zkhost3:2181&lt;/value&gt;
+    &lt;description&gt;ZooKeeper configuration registry client 
details.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre><p>In the preceding example, the client identifier is 
<strong>a-zookeeper-client</strong>, by way of the property name 
<strong>gateway.remote.config.registry.a-zookeeper-client</strong>.</p><p>The 
property value specifies that the client is intended to interact with 
ZooKeeper. It also specifies the particular ZooKeeper ensemble with which it 
will interact; this could be a single ZooKeeper instance as well.</p><p>The 
property value may also include an optional namespace, to which the client will 
be restricted (i.e., &ldquo;chroot&rdquo; the client).</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.remote.config.registry.a-zookeeper-client&lt;/name&gt;
+    
&lt;value&gt;type=ZooKeeper;address=zkhost1:2181,zkhost2:2181,zkhost3:2181;namespace=/knox/config&lt;/value&gt;
+    &lt;description&gt;ZooKeeper configuration registry client 
details.&lt;/description&gt;
+&lt;/property&gt;

[... 5965 lines stripped ...]
