Added: knox/trunk/books/1.2.0/service_solr.md (Tue Nov 6 16:47:30 2018)
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/service_solr.md?rev=1845937&view=auto

<!---
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--->

### Solr ###

Knox provides gateway functionality to Solr with support for versions 5.5+ and 6+. The Solr REST APIs allow the user to view the status
of the collections, perform administrative actions and query collections.

See the Solr Quickstart (http://lucene.apache.org/solr/quickstart.html) section of the Solr documentation for examples of the Solr REST API.

Since Knox provides an abstraction over Solr and ZooKeeper, the use of the SolrJ CloudSolrClient is no longer supported. You should replace
instances of CloudSolrClient with HttpSolrClient.
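
Because the gateway exposes Solr at a single URL, a plain HTTP client can do what CloudSolrClient did through ZooKeeper. The following minimal Python sketch (standard library only) is an illustration, not part of Knox or SolrJ: it assumes the sandbox gateway address and the `guest`/`guest-password` credentials used in the cURL examples of this section, and the helper name `solr_query_request` is hypothetical. It builds a query request with the Basic `Authorization` header attached up front (preemptive authentication) instead of waiting for a 401 challenge.

```python
import base64
import urllib.parse
import urllib.request

# Assumed values, matching the sandbox examples in this guide; adjust for your deployment.
GATEWAY = "https://localhost:8443/gateway/sandbox"
USERNAME = "guest"
PASSWORD = "guest-password"


def solr_query_request(query: str, gateway: str = GATEWAY,
                       username: str = USERNAME,
                       password: str = PASSWORD) -> urllib.request.Request:
    """Build a Solr select request routed through the Knox gateway (hypothetical helper).

    The Basic Authorization header is attached up front (preemptive
    authentication) instead of waiting for a 401 challenge from the gateway.
    """
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{gateway}/solr/select?{params}",
        headers={"Authorization": f"Basic {credentials}"},
    )


request = solr_query_request("*:*")
print(request.full_url)  # the query string is percent-encoded by urlencode
```

To actually send it, pass the request to `urllib.request.urlopen` together with an SSL context that trusts the gateway certificate; the same preemptive header applies to the POST requests used for updates.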

<p>Note: Updates to Solr via Knox require a POST operation, which requires preemptive authentication. Preemptive authentication is not directly supported by the
SolrJ API at this time.</p>

To enable this functionality, a topology file needs to have the following configuration:

    <service>
        <role>SOLR</role>
        <version>6.0.0</version>
        <url>http://<solr-host>:<solr-port></url>
    </service>

The default Solr port is 8983. Adjust the version specified to either '5.5.0' or '6.0.0'.

#### Solr URL Mapping ####

For Solr URLs, the mapping of Knox Gateway accessible URLs to direct Solr URLs is the following.

| ------- | ------------------------------------------------------------------------------------- |
| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/solr`            |
| Cluster | `http://{solr-host}:{solr-port}/solr`                                                 |


#### Solr Examples via cURL

Examples of some of the calls that can be made, using curl, are listed below.

    # 0. Query collection

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/solr/select?q=*:*&wt=json'

    # 1. Query cluster status

    curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/solr/admin/collections?action=CLUSTERSTATUS'

### Solr HA ###

Knox provides basic failover functionality for calls made to Solr Cloud when more than one Solr instance is
installed in the cluster and registered with the same ZooKeeper ensemble. The HA functionality in this case fetches the
Solr URL information from a ZooKeeper ensemble, so the user need only supply the necessary ZooKeeper
configuration and not the Solr connection URLs.

To enable HA functionality for Solr Cloud in Knox, the following configuration has to be added to the topology file:

    <provider>
        <role>ha</role>
        <name>HaProvider</name>
        <enabled>true</enabled>
        <param>
            <name>SOLR</name>
            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181</value>
        </param>
    </provider>

The role and name of the provider above must be as shown. The name in the 'param' section must match the role name of the
service being configured for HA, and the value in the 'param' section is the configuration for that particular
service in HA mode. In this case the name is 'SOLR'.

The various configuration parameters are described below:

* maxFailoverAttempts -
This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom
of the list. If the list is exhausted and the maximum number of attempts is not reached, then the first URL will be tried
again after the list is fetched again from ZooKeeper (a refresh of the list is done at this point).

* failoverSleep -
The amount of time in milliseconds that the process will wait or sleep before attempting to failover.

* enabled -
Flag to turn the particular service on or off for HA.

* zookeeperEnsemble -
A comma-separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble that the Solr
servers register their information with.

For the service configuration itself, the URLs need NOT be added to the list. For example:

    <service>
        <role>SOLR</role>
        <version>6.0.0</version>
    </service>

Please note that there is no `<url>` tag specified here, as the URLs for the Solr servers are obtained from ZooKeeper.
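
The `maxFailoverAttempts` behaviour described above can be modelled in a few lines of Python. This is a simplified sketch of the strategy as documented, not Knox's actual implementation: `call_with_failover` is a hypothetical helper, `fetch_urls` stands in for the ZooKeeper lookup, and `send` stands in for the proxied HTTP call, with a failure represented as an `OSError`.

```python
import time
from typing import Callable, List


def call_with_failover(fetch_urls: Callable[[], List[str]],
                       send: Callable[[str], str],
                       max_failover_attempts: int = 3,
                       failover_sleep_ms: int = 1000) -> str:
    """Simplified model of the documented Solr HA failover strategy (hypothetical helper).

    fetch_urls: returns the current Solr URLs (stands in for the ZooKeeper lookup).
    send: performs the request against one URL and raises OSError on failure.
    """
    urls = list(fetch_urls())
    failovers = 0        # failovers performed so far (bounded by maxFailoverAttempts)
    failed_in_pass = 0   # failures since the URL list was last fetched
    while True:
        url = urls[0]
        try:
            return send(url)
        except OSError:
            if failovers >= max_failover_attempts:
                raise  # give up: the maximum number of failovers has been reached
            failovers += 1
            failed_in_pass += 1
            # Move the failed URL to the bottom of the list; the next one is tried.
            urls.append(urls.pop(0))
            if failed_in_pass >= len(urls):
                # List exhausted: refresh it (from ZooKeeper, in Knox's case) and start over.
                urls = list(fetch_urls())
                failed_in_pass = 0
            time.sleep(failover_sleep_ms / 1000.0)  # corresponds to failoverSleep


servers = ["http://solr1:8983/solr", "http://solr2:8983/solr"]


def flaky_send(url: str) -> str:
    if url.startswith("http://solr1"):
        raise OSError("connection refused")
    return "response from " + url


print(call_with_failover(lambda: list(servers), flaky_send, failover_sleep_ms=0))
# prints: response from http://solr2:8983/solr
```

Rotating the failed URL to the end keeps the next candidate at the head of the list, which matches the description of `maxFailoverAttempts`; the refresh step mirrors the re-fetch from ZooKeeper once every known URL has failed.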

Added: knox/trunk/books/1.2.0/service_storm.md (Tue Nov 6 16:47:30 2018)
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/service_storm.md?rev=1845937&view=auto

### Storm ###

Storm is a distributed realtime computation system. Storm exposes REST APIs for UI functionality that can be used for
retrieving metrics data and configuration information as well as management operations such as starting or stopping topologies.

The documentation for this API can be found here:

https://github.com/apache/storm/blob/master/docs/STORM-UI-REST-API.md

To enable this functionality, a topology file needs to have the following configuration:

    <service>
        <role>STORM</role>
        <url>http://<hostname>:<port></url>
    </service>

The default UI daemon port is 8744. If it is configured to some other port, that configuration can be
found in `storm.yaml` as the value for the property `ui.port`.
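
If `storm.yaml` is at hand, looking up the port and producing the matching `<service>` element can be scripted. The Python sketch below is an illustration only: the helper names are hypothetical, and its line-based parsing handles flat `key: value` entries, so it is no substitute for a real YAML parser. It falls back to the default UI port of 8744 when `ui.port` is not set.

```python
import xml.etree.ElementTree as ET

DEFAULT_UI_PORT = 8744  # default Storm UI daemon port


def port_from_storm_yaml(text: str, key: str = "ui.port",
                         default: int = DEFAULT_UI_PORT) -> int:
    """Return the integer value of a flat `key: value` entry in storm.yaml text.

    Naive parsing (hypothetical helper): comments are stripped and only
    top-level scalar entries are recognised.
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return int(value.strip().strip("\"'"))
    return default


def storm_service_xml(host: str, port: int, role: str = "STORM") -> str:
    """Build the <service> element for a Knox topology file."""
    service = ET.Element("service")
    ET.SubElement(service, "role").text = role
    ET.SubElement(service, "url").text = f"http://{host}:{port}"
    return ET.tostring(service, encoding="unicode")


storm_yaml = 'nimbus.seeds: ["nimbus1"]\nui.port: 8080  # non-default port\n'
print(storm_service_xml("storm-host", port_from_storm_yaml(storm_yaml)))
# prints: <service><role>STORM</role><url>http://storm-host:8080</url></service>
```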

In addition to the Storm service configuration above, a STORM-LOGVIEWER service must be configured if the
log files are to be retrieved through Knox. The logviewer port is set by the property
`logviewer.port`, also in the file `storm.yaml`.

    <service>
        <role>STORM-LOGVIEWER</role>
        <url>http://<hostname>:<port></url>
    </service>


#### Storm URL Mapping ####

For Storm URLs, the mapping of Knox Gateway accessible URLs to direct Storm URLs is the following.

| ------- | ------------------------------------------------------------------------------------- |
| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/storm`           |
| Cluster | `http://{storm-host}:{storm-port}`                                                    |

For the log viewer the mapping is as follows:

| ------- | ------------------------------------------------------------------------------------- |
| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/storm/logviewer` |
| Cluster | `http://{storm-logviewer-host}:{storm-logviewer-port}`                                |


#### Storm Examples

Examples of some of the calls that can be made, using curl, are listed below.

    # 0. Getting cluster configuration

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/cluster/configuration'

    # 1. Getting cluster summary information

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/cluster/summary'

    # 2. Getting supervisor summary information

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/supervisor/summary'

    # 3. Getting topology summary information

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/summary'

    # 4. Getting specific topology information. Substitute {id} with the topology id.

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}'

    # 5. Getting component level information. Substitute {id} with the topology id and {component} with the component id, e.g. 'spout'.

    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/component/{component}'


The following POST operations all require an 'x-csrf-token' header along with other information that can be stored in a cookie file,
in particular the 'ring-session' and 'JSESSIONID' cookies.

    # 6. To activate a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.

    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
        https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/activate

    # 7. To deactivate a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.

    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
        https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/deactivate

    # 8. To rebalance a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.

    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
        https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/rebalance/0

    # 9. To kill a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.

    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
        https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/kill/0

Added: knox/trunk/books/1.2.0/service_webhcat.md (Tue Nov 6 16:47:30 2018)
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/service_webhcat.md?rev=1845937&view=auto

### WebHCat ###

WebHCat (also called _Templeton_) is a related but separate service from HiveServer2.
As such it is installed and configured independently.
The [WebHCat wiki pages](https://cwiki.apache.org/confluence/display/Hive/WebHCat) describe this process.
In the Sandbox, the configuration file for WebHCat is located at `/etc/hadoop/hcatalog/webhcat-site.xml`.
Note the properties shown below, as they are related to configuration required by the gateway.

    <property>
        <name>templeton.port</name>
        <value>50111</value>
    </property>

Also important is the configuration of the JOBTRACKER RPC endpoint.
For Hadoop 2 this can be found in the `yarn-site.xml` file.
In the Sandbox, this file can be found at `/etc/hadoop/conf/yarn-site.xml`.
The property `yarn.resourcemanager.address` within that file is relevant for the gateway's configuration.

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>sandbox.hortonworks.com:8050</value>
    </property>

See #[WebHDFS] for details about locating the Hadoop configuration for the NAMENODE endpoint.

The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.
The values in this sample are configured to work with an installed Sandbox VM.

    <service>
        <role>NAMENODE</role>
        <url>hdfs://localhost:8020</url>
    </service>
    <service>
        <role>JOBTRACKER</role>
        <url>rpc://localhost:8050</url>
    </service>
    <service>
        <role>WEBHCAT</role>
        <url>http://localhost:50111/templeton</url>
    </service>

The URLs provided for the roles NAMENODE and JOBTRACKER do not result in an endpoint being exposed by the gateway.
This information is only required so that the gateway can rewrite other URLs that reference the appropriate RPC address for Hadoop services.
This prevents clients from needing to be aware of the internal cluster details.
Note that for Hadoop 2 the JOBTRACKER RPC endpoint is provided by the Resource Manager component.

By default the gateway is configured to use the HTTP endpoint for WebHCat in the Sandbox.
This could alternatively be configured to use the HTTPS endpoint by providing the correct address.

#### WebHCat URL Mapping ####

For WebHCat URLs, the mapping of Knox Gateway accessible URLs to direct WebHCat URLs is simple.

| ------- | ------------------------------------------------------------------------------- |
| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton` |
| Cluster | `http://{webhcat-host}:{webhcat-port}/templeton`                                |


#### WebHCat via cURL

Users can use cURL to directly invoke the REST APIs via the gateway. For the full list of available REST calls, see the WebHCat documentation. This is a simple curl command to test the connection:

    curl -i -k -u guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/status'


#### WebHCat Example ####

This example will submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL.
There are several ways to do this depending upon your preference.

You can use the "embedded" Groovy interpreter provided with the distribution.

    java -jar bin/shell.jar samples/ExampleWebHCatJob.groovy

You can manually type the KnoxShell DSL script into the "embedded" Groovy interpreter provided with the distribution.

    java -jar bin/shell.jar

Each line from the file `samples/ExampleWebHCatJob.groovy` would then need to be typed or copied into the interactive shell.

#### WebHCat Client DSL ####

##### submitJava() - Submit a Java MapReduce job.

* Request
    * jar (String) - The remote file name of the JAR containing the app to execute.
    * app (String) - The app name to execute. For example, this is _wordcount_, not the class name.
    * input (String) - The remote directory name to use as input for the job.
    * output (String) - The remote directory name to store output from the job.
* Response
    * jobId : String - The job ID of the submitted job. Consumes body.
* Example

        Job.submitJava(session)
            .jar(remoteJarName)
            .app(appName)
            .input(remoteInputDir)
            .output(remoteOutputDir)
            .now()
            .jobId

##### submitPig() - Submit a Pig job.

* Request
    * file (String) - The remote file name of the Pig script.
    * arg (String) - An argument to pass to the script.
    * statusDir (String) - The remote directory to store status output.
* Response
    * jobId : String - The job ID of the submitted job. Consumes body.
* Example
    * `Job.submitPig(session).file(remotePigFileName).arg("-v").statusDir(remoteStatusDir).now()`

##### submitHive() - Submit a Hive job.

* Request
    * file (String) - The remote file name of the Hive script.
    * arg (String) - An argument to pass to the script.
    * statusDir (String) - The remote directory to store status output.
* Response
    * jobId : String - The job ID of the submitted job. Consumes body.
* Example
    * `Job.submitHive(session).file(remoteHiveFileName).arg("-v").statusDir(remoteStatusDir).now()`

##### submitSqoop Job API

Using the Knox DSL, you can now easily submit and monitor [Apache Sqoop](https://sqoop.apache.org) jobs. The WebHCat Job class now supports the `submitSqoop` command.

    Job.submitSqoop(session)
        .command("import --connect jdbc:mysql://hostname:3306/dbname ... ")
        .statusDir(remoteStatusDir)
        .now().jobId

The `submitSqoop` command supports the following arguments:

* command (String) - The Sqoop command string to execute.
* files (String) - Comma-separated files to be copied to the Templeton controller job.
* optionsfile (String) - The remote file containing the Sqoop command to run.
* libdir (String) - The remote directory containing the JDBC JAR to include with the Sqoop lib.
* statusDir (String) - The remote directory to store status output.

A complete example is available here: https://cwiki.apache.org/confluence/display/KNOX/2016/11/08/Running+SQOOP+job+via+KNOX+Shell+DSL


##### queryQueue() - Return a list of all job IDs registered to the user.

* Request
    * No request parameters.
* Response
    * BasicResponse
* Example
    * `Job.queryQueue(session).now().string`

##### queryStatus() - Check the status of a job and get related job information given its job ID.

* Request
    * jobId (String) - The job ID to check. This is the ID received when the job was created.
* Response
    * BasicResponse
* Example
    * `Job.queryStatus(session).jobId(jobId).now().string`

### WebHCat HA ###

Please see #[Default Service HA support].

Added: knox/trunk/books/1.2.0/service_webhdfs.md (Tue Nov 6 16:47:30 2018)
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/service_webhdfs.md?rev=1845937&view=auto

### WebHDFS ###

REST API access to HDFS in a Hadoop cluster is provided by WebHDFS or HttpFS.
Both services provide the same API.
The [WebHDFS REST API](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html) documentation is available online.
WebHDFS must be enabled in the `hdfs-site.xml` configuration file and exposes the API on each NameNode and DataNode.
HttpFS, however, is a separate server that must be configured and started independently.
In the Sandbox, this configuration file is located at `/etc/hadoop/conf/hdfs-site.xml`.
Note the properties shown below, as they are related to configuration required by the gateway.
Some of these represent the default values and may not actually be present in `hdfs-site.xml`.

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address</name>
        <value>sandbox.hortonworks.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>sandbox.hortonworks.com:50070</value>
    </property>
    <property>
        <name>dfs.namenode.https-address</name>
        <value>sandbox.hortonworks.com:50470</value>
    </property>

The values above need to be reflected in each topology descriptor file deployed to the gateway.
The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.
The values in this sample are configured to work with an installed Sandbox VM.

Please also note that the default NameNode HTTP port changed from 50070 to 9870 in Hadoop 3.0.

    <service>
        <role>NAMENODE</role>
        <url>hdfs://localhost:8020</url>
    </service>
    <service>
        <role>WEBHDFS</role>
        <url>http://localhost:50070/webhdfs</url>
    </service>

The URL provided for the role NAMENODE does not result in an endpoint being exposed by the gateway.
This information is only required so that the gateway can rewrite other URLs that reference the NameNode's RPC address.
This prevents clients from needing to be aware of the internal cluster details.

By default the gateway is configured to use the HTTP endpoint for WebHDFS in the Sandbox.
This could alternatively be configured to use the HTTPS endpoint by providing the correct address.
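
Carrying the `hdfs-site.xml` values into a topology descriptor can also be scripted. The Python sketch below is illustrative only (the function names are hypothetical): it reads `<property>` name/value pairs with the standard `xml.etree.ElementTree` module and derives the NAMENODE and WEBHDFS URLs for the simple, non-federated case, assuming plain HTTP as in the Sandbox sample.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def topology_service_urls(props: dict) -> dict:
    """Derive NAMENODE and WEBHDFS URLs from the NameNode address properties.

    Assumes a single NameNode (no federation) and the HTTP endpoint.
    """
    return {
        "NAMENODE": "hdfs://" + props["dfs.namenode.rpc-address"],
        "WEBHDFS": "http://" + props["dfs.namenode.http-address"] + "/webhdfs",
    }


hdfs_site = """<configuration>
  <property><name>dfs.namenode.rpc-address</name><value>sandbox.hortonworks.com:8020</value></property>
  <property><name>dfs.namenode.http-address</name><value>sandbox.hortonworks.com:50070</value></property>
</configuration>"""

print(topology_service_urls(read_hadoop_properties(hdfs_site)))
```

To target HTTPS instead, the same approach would read `dfs.namenode.https-address` and build an `https://` URL.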

##### HDFS NameNode Federation

NameNode federation introduces some additional complexity when determining to which URL(s) Knox should proxy HDFS-related requests.

The HDFS `hdfs-site.xml` configuration includes additional properties that identify the available NameNode endpoints.

| Property Name             | Description                    | Example Value |
| ------------------------- | ------------------------------ | ------------- |
| dfs.internal.nameservices | The list of defined namespaces | ns1,ns2       |

For each value enumerated by *dfs.internal.nameservices*, there is another property that specifies the associated NameNode identifiers.

| Property Name        | Description                                                | Example Value |
| -------------------- | ---------------------------------------------------------- | ------------- |
| dfs.ha.namenodes.ns1 | The NameNode identifiers associated with the ns1 namespace | nn1,nn2       |
| dfs.ha.namenodes.ns2 | The NameNode identifiers associated with the ns2 namespace | nn3,nn4       |

For each NameNode identifier enumerated by these properties, there are further properties that specify the associated host addresses.

| Property Name                      | Description                                                     | Example Value |
| ---------------------------------- | --------------------------------------------------------------- | ------------- |
| dfs.namenode.http-address.ns1.nn1  | The HTTP host address of the nn1 NameNode in the ns1 namespace  | host1:50070   |
| dfs.namenode.https-address.ns1.nn1 | The HTTPS host address of the nn1 NameNode in the ns1 namespace | host1:50470   |
| dfs.namenode.http-address.ns1.nn2  | The HTTP host address of the nn2 NameNode in the ns1 namespace  | host2:50070   |
| dfs.namenode.https-address.ns1.nn2 | The HTTPS host address of the nn2 NameNode in the ns1 namespace | host2:50470   |
| dfs.namenode.http-address.ns2.nn3  | The HTTP host address of the nn3 NameNode in the ns2 namespace  | host3:50070   |
| dfs.namenode.https-address.ns2.nn3 | The HTTPS host address of the nn3 NameNode in the ns2 namespace | host3:50470   |
| dfs.namenode.http-address.ns2.nn4  | The HTTP host address of the nn4 NameNode in the ns2 namespace  | host4:50070   |
| dfs.namenode.https-address.ns2.nn4 | The HTTPS host address of the nn4 NameNode in the ns2 namespace | host4:50470   |

So, if Knox should proxy the NameNodes associated with *ns1*, and the configuration does not dictate HTTPS, then the WEBHDFS service must
contain URLs based on the values of *dfs.namenode.http-address.ns1.nn1* and *dfs.namenode.http-address.ns1.nn2*. Likewise, if Knox should
proxy the NameNodes associated with *ns2*, the WEBHDFS service must contain URLs based on the values of *dfs.namenode.http-address.ns2.nn3*
and *dfs.namenode.http-address.ns2.nn4*.

Fortunately, for Ambari-managed clusters, [descriptors](#Simplified+Descriptor+Files) and service discovery can handle this complexity for administrators.
In the descriptor, the service can be declared without any endpoints, and the desired namespace can be specified to disambiguate which endpoint(s)
should be proxied by way of a parameter named *discovery-nameservice*.

    "services": [
        {
            "name": "WEBHDFS",
            "params": {
                "discovery-nameservice": "ns2"
            }
        },

If no namespace is specified, then the default namespace will be applied. This default namespace is derived from the value of the
property named *fs.defaultFS* defined in the HDFS *core-site.xml* configuration.

<br>

#### WebHDFS URL Mapping ####

For NameNode URLs, the mapping of Knox Gateway accessible WebHDFS URLs to direct WebHDFS URLs is simple.

| ------- | ----------------------------------------------------------------------------- |
| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs` |
| Cluster | `http://{webhdfs-host}:50070/webhdfs`                                         |

However, there is a subtle difference in URLs that are returned by WebHDFS in the Location header of many requests.
Direct WebHDFS requests may return Location headers that contain the address of a particular DataNode.
The gateway will rewrite these URLs to ensure subsequent requests come back through the gateway and internal cluster details are protected.

A WebHDFS request to the NameNode to retrieve a file will return a URL of the form below in the Location header.

    http://{datanode-host}:{datanode-port}/webhdfs/v1/{path}?...

Note that this URL contains the network location of a DataNode.
The gateway will rewrite this URL to look like the URL below.

    https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}

The `{encrypted-query-parameters}` will contain the `{datanode-host}` and `{datanode-port}` information.
This information, along with the original query parameters, is encrypted so that the internal Hadoop details are protected.

#### WebHDFS Examples ####

The examples below upload a file, download the file and list the contents of the directory.

##### WebHDFS via client DSL

You can use the Groovy example scripts and interpreter provided with the distribution.

    java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
    java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy

You can manually type the client DSL script into the KnoxShell interactive Groovy interpreter provided with the distribution.
The command below starts the KnoxShell in interactive mode.

    java -jar bin/shell.jar

Each line below could be typed or copied into the interactive shell and executed.
This is provided as an example to illustrate the use of the client DSL.

    // Import the client DSL and a useful utility for working with JSON.
    import org.apache.knox.gateway.shell.Hadoop
    import org.apache.knox.gateway.shell.hdfs.Hdfs
    import groovy.json.JsonSlurper

    // Set up some basic config.
    gateway = "https://localhost:8443/gateway/sandbox"
    username = "guest"
    password = "guest-password"

    // Start the session.
    session = Hadoop.login( gateway, username, password )

    // Clean up anything left over from a previous run.
    Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()

    // Upload the README to HDFS.
    Hdfs.put( session ).file( "README" ).to( "/user/guest/example/README" ).now()

    // Download the README from HDFS.
    text = Hdfs.get( session ).from( "/user/guest/example/README" ).now().string
    println text

    // List the contents of the directory.
    text = Hdfs.ls( session ).dir( "/user/guest/example" ).now().string
    json = (new JsonSlurper()).parseText( text )
    println json.FileStatuses.FileStatus.pathSuffix

    // Clean up the directory.
    Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()

    // Close the session.
    session.shutdown()


##### WebHDFS via cURL

Users can use cURL to directly invoke the REST APIs via the gateway.

###### Optionally clean up the sample directory in case a previous example was run without cleaning up.

    curl -i -k -u guest:guest-password -X DELETE \
        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'

###### Register the name for a sample file README in /user/guest/example.

    curl -i -k -u guest:guest-password -X PUT \
        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/README?op=CREATE'

###### Upload README to /user/guest/example. Use the README in {GATEWAY_HOME}.

    curl -i -k -u guest:guest-password -T README -X PUT \
        '{Value of Location header from command above}'

###### List the contents of the directory /user/guest/example.

    curl -i -k -u guest:guest-password -X GET \
        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=LISTSTATUS'

###### Request the content of the README file in /user/guest/example.

    curl -i -k -u guest:guest-password -X GET \
        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/README?op=OPEN'

###### Read the content of the file.

    curl -i -k -u guest:guest-password -X GET \
        '{Value of Location header from command above}'

###### Optionally clean up the example directory.

    curl -i -k -u guest:guest-password -X DELETE \
        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'


##### WebHDFS client DSL

###### get() - Get a file from HDFS (OPEN).

* Request
    * from( String name ) - The full name of the file in HDFS.
    * file( String name ) - The name of a local file to create with the content.
      If this isn't specified, the file content must be read from the response.
* Response
    * BasicResponse
    * If the file parameter is specified, the content will be streamed to that file.
* Example
    * `Hdfs.get( session ).from( "/user/guest/example/README" ).now().string`

###### ls() - Query the contents of a directory (LISTSTATUS)

* Request
    * dir( String name ) - The full name of the directory in HDFS.
* Response
    * BasicResponse
* Example
    * `Hdfs.ls( session ).dir( "/user/guest/example" ).now().string`

###### mkdir() - Create a directory in HDFS (MKDIRS)

* Request
    * dir( String name ) - The full name of the directory to create in HDFS.
    * perm( String perm ) - The permissions for the directory (e.g. 644). Optional: default="777"
* Response
    * EmptyResponse - Implicit close().
* Example
    * `Hdfs.mkdir( session ).dir( "/user/guest/example" ).now()`

###### put() - Write a file into HDFS (CREATE)

* Request
    * text( String text ) - Text to upload to HDFS. Takes precedence over file if both are present.
    * file( String name ) - The name of a local file to upload to HDFS.
    * to( String name ) - The fully qualified name to create in HDFS.
* Response
    * EmptyResponse - Implicit close().
* Example
    * `Hdfs.put( session ).file( "README" ).to( "/user/guest/example/README" ).now()`

###### rm() - Delete a file or directory (DELETE)

* Request
    * file( String name ) - The fully qualified file or directory name in HDFS.
    * recursive( Boolean recursive ) - Delete the directory and all of its contents if True. Optional: default=False
* Response
    * BasicResponse - Implicit close().
* Example
    * `Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()`


### WebHDFS HA ###

Knox provides basic failover and retry functionality for REST API calls made to WebHDFS when HDFS HA has been
configured and enabled.

To enable HA functionality for WebHDFS in Knox, the following configuration has to be added to the topology file:

    <provider>
        <role>ha</role>
        <name>HaProvider</name>
        <enabled>true</enabled>
        <param>
            <name>WEBHDFS</name>
            <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
        </param>
    </provider>

The role and name of the provider above must be as shown.
The name in the 'param' section must match that of the service +role name that is being configured for HA, and the value in the 'param' section is the configuration for that particular +service in HA mode. In this case the name is 'WEBHDFS'. + +The various configuration parameters are described below: + +* maxFailoverAttempts - +This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic +in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom +of the list. If the list is exhausted and the maximum number of attempts has not been reached, then the first URL that failed +will be tried again (the list will start again from the original top entry). + +* failoverSleep - +The amount of time in milliseconds that the process will wait or sleep before attempting to fail over. + +* maxRetryAttempts - +This is the maximum number of times that a retry request will be attempted. Unlike failover, the retry is done on the +same URL that failed. This covers the special case in HDFS where the node is in safe mode. The expectation is that the node will +come out of safe mode, so a retry is desirable here as opposed to a failover. + +* retrySleep - +The amount of time in milliseconds that the process will wait or sleep before a retry is issued. + +* enabled - +Flag to turn the particular service on or off for HA. + +For the service configuration itself, the additional URLs should be added to the list. The active +URL (at the time of configuration) should ideally be added to the top of the list.
+ + + <service> + <role>WEBHDFS</role> + <url>http://{host1}:50070/webhdfs</url> + <url>http://{host2}:50070/webhdfs</url> + </service> + + Added: knox/trunk/books/1.2.0/service_yarn.md URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/service_yarn.md?rev=1845937&view=auto ============================================================================== --- knox/trunk/books/1.2.0/service_yarn.md (added) +++ knox/trunk/books/1.2.0/service_yarn.md Tue Nov 6 16:47:30 2018 @@ -0,0 +1,124 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +---> + +### Yarn ### + +Knox provides gateway functionality for the REST APIs of the ResourceManager. The ResourceManager REST APIs allow the +user to get information about the cluster - status on the cluster, metrics on the cluster, scheduler information, +information about nodes in the cluster, and information about applications on the cluster. Also as of Hadoop version +2.5.0, the user can submit a new application as well as kill it (or get state) using the 'Writable' APIs. 
+ +The documentation for these APIs can be found here: + +http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html + +To enable this functionality, a topology file needs to have the following configuration: + + <service> + <role>RESOURCEMANAGER</role> + <url>http://<hostname>:<port>/ws</url> + </service> + +The default ResourceManager HTTP port is 8088. If it is configured to some other port, that configuration can be +found in `yarn-site.xml` under the property `yarn.resourcemanager.webapp.address`. + +#### Yarn URL Mapping #### + +For Yarn URLs, the mapping of Knox Gateway accessible URLs to direct Yarn URLs is the following. + +| ------- | ------------------------------------------------------------------------------------- | +| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/resourcemanager` | +| Cluster | `http://{yarn-host}:{yarn-port}/ws` | + + +#### Yarn Examples via cURL + +Some of the calls that can be made, with examples using curl, are listed below. + + # 0. Getting cluster info + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster' + + # 1. Getting cluster metrics + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/metrics' + + To get the same information in XML format: + + curl -ikv -u guest:guest-password -H Accept:application/xml -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/metrics' + + # 2. Getting scheduler information + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/scheduler' + + # 3. Getting all the applications listed and their information + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps' + + # 4.
Getting application statistics + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/appstatistics' + + Query parameters can also be used to filter the results, as below: + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/appstatistics?states=accepted,running,finished&applicationTypes=mapreduce' + + # 5. To get a specific application (replace {application_id} with a real value) + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}' + + # 6. To get the attempts made for a particular application + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}/appattempts' + + # 7. To get information about the various nodes + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/nodes' + + To get a specific node, use a node id obtained from the response above (the node id is scrambled) and issue the following: + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/nodes/{node_id}' + + # 8. To create a new application + + curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/new-application' + + An application id is returned from the request above, and it can be used to submit an application. + + # 9. To submit an application, put together a request containing the application id received in the above response (please refer to the Yarn REST + API documentation). + + curl -ikv -u guest:guest-password -T request.json -H Content-Type:application/json -X POST 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps' + + Here the request is saved in a file called request.json. + + # 10.
To get application state + + curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}/state' + + # 11. To kill a running application, issue the command below with the id of the application that is to be killed. + The contents of the state-killed.json file are: + + { + "state":"KILLED" + } + + + curl -ikv -u guest:guest-password -H Content-Type:application/json -X PUT -T state-killed.json 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}/state' + Added: knox/trunk/books/1.2.0/websocket-support.md URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/websocket-support.md?rev=1845937&view=auto ============================================================================== --- knox/trunk/books/1.2.0/websocket-support.md (added) +++ knox/trunk/books/1.2.0/websocket-support.md Tue Nov 6 16:47:30 2018 @@ -0,0 +1,76 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License.
+--> + +## WebSocket Support ## + +### Introduction + +WebSocket is a communication protocol that allows full-duplex communication over a single TCP connection. +Knox provides out-of-the-box support for the WebSocket protocol; currently only text messages are supported. + +### Configuration ### + +By default WebSocket functionality is disabled. It can be enabled by setting the `gateway.websocket.feature.enabled` property to `true` in the `<KNOX-HOME>/conf/gateway-site.xml` file. + + <property> + <name>gateway.websocket.feature.enabled</name> + <value>true</value> + <description>Enable/Disable websocket feature.</description> + </property> + +Service and rewrite rules need to be changed accordingly to match the appropriate WebSocket context. + +### Example ### + +In the following sample configuration we assume that the backend WebSocket URL is ws://myhost:9999/ws and that the 'gateway.websocket.feature.enabled' property is set to 'true' as shown above. + +#### rewrite #### + +Example code snippet from `<KNOX-HOME>/data/services/{myservice}/{version}/rewrite.xml`, where myservice = websocket and version = 0.6.0: + + <rules> + <rule dir="IN" name="WEBSOCKET/ws/inbound" pattern="*://*:*/**/ws"> + <rewrite template="{$serviceUrl[WEBSOCKET]}/ws"/> + </rule> + </rules> + +#### service #### + +Example code snippet from `<KNOX-HOME>/data/services/{myservice}/{version}/service.xml`, where myservice = websocket and version = 0.6.0: + + <service role="WEBSOCKET" name="websocket" version="0.6.0"> + <policies> + <policy role="webappsec"/> + <policy role="authentication" name="Anonymous"/> + <policy role="rewrite"/> + <policy role="authorization"/> + </policies> + <routes> + <route path="/ws"> + <rewrite apply="WEBSOCKET/ws/inbound" to="request.url"/> + </route> + </routes> + </service> + +#### topology #### + +Finally, update the topology file at `<KNOX-HOME>/conf/{topology}.xml` with the backend service URL: + + <service> + <role>WEBSOCKET</role> + <url>ws://myhost:9999/ws</url> +
</service> Added: knox/trunk/books/1.2.0/x-forwarded-headers.md URL: http://svn.apache.org/viewvc/knox/trunk/books/1.2.0/x-forwarded-headers.md?rev=1845937&view=auto ============================================================================== --- knox/trunk/books/1.2.0/x-forwarded-headers.md (added) +++ knox/trunk/books/1.2.0/x-forwarded-headers.md Tue Nov 6 16:47:30 2018 @@ -0,0 +1,76 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +---> + +### X-Forwarded-* Headers Support ### +Out of the box, Knox provides support for some `X-Forwarded-*` headers through the use of a Servlet Filter. Specifically, the +headers handled/populated by Knox are: + +* X-Forwarded-For +* X-Forwarded-Proto +* X-Forwarded-Port +* X-Forwarded-Host +* X-Forwarded-Server +* X-Forwarded-Context + +This functionality can be turned off by changing a configuration setting in gateway-site.xml and redeploying the +necessary topology/topologies. + +The setting is (under the 'configuration' tag): + + <property> + <name>gateway.xforwarded.enabled</name> + <value>false</value> + </property> + +If this setting is absent, `X-Forwarded-*` header support is on; in other words, +`gateway.xforwarded.enabled` defaults to `true`.
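To make the default concrete, here is a minimal sketch of how the effective value of `gateway.xforwarded.enabled` resolves from the text of a gateway-site.xml file. The helper name `xforwarded_enabled` is hypothetical, not part of Knox. It assumes the usual `<configuration><property><name/><value/></property></configuration>` layout and assumes Hadoop `Configuration.getBoolean`-style semantics (only a case-insensitive "false" disables; an absent or unrecognised value falls back to the default of `true`).

```python
import xml.etree.ElementTree as ET


def xforwarded_enabled(gateway_site_xml: str) -> bool:
    """Hypothetical helper: resolve gateway.xforwarded.enabled from gateway-site.xml text.

    Assumption: if the property appears more than once, the last occurrence is used,
    and any value other than "false" (case-insensitive) leaves the feature enabled.
    """
    value = None
    root = ET.fromstring(gateway_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "gateway.xforwarded.enabled":
            value = (prop.findtext("value") or "").strip().lower()
    if value == "false":
        return False
    # Absent, "true", or an unrecognised value: the documented default (on) applies.
    return True
```

For example, an empty `<configuration/>` resolves to `True`, while the snippet above with `<value>false</value>` resolves to `False`.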
+ + +#### Header population #### + +The rules for populating these headers are as follows: + +##### X-Forwarded-For ##### + +This header represents a list of client IP addresses. If the header is already present, Knox adds a comma-separated value +to the list. The value added is the client's IP address as Knox sees it, and it is appended to the end of the list. + +##### X-Forwarded-Proto ##### + +The protocol used in the client request. If this header is passed into Knox, its value is maintained; otherwise, Knox will +populate the header with the value 'https' if the request is a secure one or 'http' otherwise. + +##### X-Forwarded-Port ##### + +The port used in the client request. If this header is passed into Knox, its value is maintained; otherwise, Knox will +populate the header with the port on which the request came into Knox. + +##### X-Forwarded-Host ##### + +Represents the original host requested by the client in the Host HTTP request header. A value passed into Knox is maintained +by Knox. If no value is present, Knox populates the header with the value of the HTTP Host header. + +##### X-Forwarded-Server ##### + +The hostname of the server Knox is running on. + +##### X-Forwarded-Context ##### + +This header value contains the context path of the request to Knox.
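The population rules above can be summarised in a short sketch. This is not Knox's Servlet Filter; `populate_x_forwarded` and its parameters are hypothetical names for illustration. It assumes header names appear with the exact casing shown (real HTTP header names are case-insensitive), that the X-Forwarded-For separator is `", "`, and, since the text does not say, that any incoming X-Forwarded-Server or X-Forwarded-Context value is overwritten.

```python
from typing import Dict


def populate_x_forwarded(
    headers: Dict[str, str],
    client_ip: str,
    secure: bool,
    local_port: int,
    server_hostname: str,
    context_path: str,
) -> Dict[str, str]:
    """Hypothetical helper applying the X-Forwarded-* population rules described above."""
    out = dict(headers)  # leave the caller's dict untouched

    # X-Forwarded-For: append the client IP (as Knox sees it) to the end of any existing list.
    existing = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip

    # X-Forwarded-Proto / X-Forwarded-Port: keep an incoming value, otherwise derive one.
    out.setdefault("X-Forwarded-Proto", "https" if secure else "http")
    out.setdefault("X-Forwarded-Port", str(local_port))

    # X-Forwarded-Host: keep an incoming value, otherwise copy the HTTP Host header.
    if "X-Forwarded-Host" not in out and "Host" in out:
        out["X-Forwarded-Host"] = out["Host"]

    # X-Forwarded-Server / X-Forwarded-Context: Knox's own hostname and request context path
    # (assumption: any incoming value is overwritten).
    out["X-Forwarded-Server"] = server_hostname
    out["X-Forwarded-Context"] = context_path
    return out
```

For instance, a secure request arriving on port 8443 from 10.0.0.5 that already carries `X-Forwarded-For: 203.0.113.7` would leave Knox with `X-Forwarded-For: 203.0.113.7, 10.0.0.5` and `X-Forwarded-Proto: https`.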
+ + + Modified: knox/trunk/build.xml URL: http://svn.apache.org/viewvc/knox/trunk/build.xml?rev=1845937&r1=1845936&r2=1845937&view=diff ============================================================================== --- knox/trunk/build.xml (original) +++ knox/trunk/build.xml Tue Nov 6 16:47:30 2018 @@ -44,6 +44,7 @@ <property name="book-0-14-0-dir" value="${book-target}/${gateway-artifact}-0-14-0"/> <property name="book-1-0-0-dir" value="${book-target}/${gateway-artifact}-1-0-0"/> <property name="book-1-1-0-dir" value="${book-target}/${gateway-artifact}-1-1-0"/> + <property name="book-1-2-0-dir" value="${book-target}/${gateway-artifact}-1-2-0"/> <property name="svn.release.path" value="https://dist.apache.org/repos/dist/release/incubator/${gateway-project}" /> <property name="svn.staging.path" value="https://dist.apache.org/repos/dist/dev/incubator/${gateway-project}" /> @@ -92,7 +93,7 @@ </target> <target name="books" depends="markbook,_books"/> - <target name="_books" depends="_book-0-3-0,_book-0-4-0,_book-0-5-0,_book-0-6-0,_book-0-7-0,_book-0-8-0,_book-0-9-0,_book-0-9-1,_book-0-10-0,_book-0-11-0,_book-0-12-0,_book-0-13-0,_book-0-14-0,_book-1-0-0,_book-1-1-0"/> + <target name="_books" depends="_book-0-3-0,_book-0-4-0,_book-0-5-0,_book-0-6-0,_book-0-7-0,_book-0-8-0,_book-0-9-0,_book-0-9-1,_book-0-10-0,_book-0-11-0,_book-0-12-0,_book-0-13-0,_book-0-14-0,_book-1-0-0,_book-1-1-0,_book-1-2-0"/> <target name="_book-0-3-0" depends="init"> <delete dir="${book-target}/${gateway-artifact}-0-3-0" includes="**/*.html,**/*.css,**/*.png"/> <java jar="markbook/target/markbook.jar" fork="true" failonerror="true"> @@ -346,6 +347,27 @@ <fileset dir="books/1.1.0/img/adminui"/> </copy> </target> + <target name="_book-1-2-0" depends="init"> + <delete dir="${book-target}/${gateway-artifact}-1-2-0" includes="**/*.html,**/*.css,**/*.png"/> + <java jar="markbook/target/markbook.jar" fork="true" failonerror="true"> + <arg value="-i"/><arg value="books/1.2.0/book.md"/> + <arg 
value="-o"/><arg value="${book-1-2-0-dir}/user-guide.html"/> + </java> + <java jar="markbook/target/markbook.jar" fork="true" failonerror="true"> + <arg value="-i"/><arg value="books/1.2.0/dev-guide/book.md"/> + <arg value="-o"/><arg value="${book-1-2-0-dir}/dev-guide.html"/> + </java> + <java jar="markbook/target/markbook.jar" fork="true" failonerror="true"> + <arg value="-i"/><arg value="books/1.2.0/dev-guide/knoxsso_integration.md"/> + <arg value="-o"/><arg value="${book-1-2-0-dir}/knoxsso_integration.html"/> + </java> + <copy todir="${book-target}/${gateway-artifact}-1-2-0"> + <fileset dir="books/static"/> + </copy> + <copy todir="${book-target}/${gateway-artifact}-1-2-0/adminui"> + <fileset dir="books/1.2.0/img/adminui"/> + </copy> + </target> <target name="markbook" depends="init" description="Build and package markbook tool."> <exec executable="${mvn.cmd}"> @@ -358,10 +380,10 @@ <target name="review-book" depends="init" description="Open the default book in the default browser."> <exec executable="${browser.cmd}"> - <arg line="${book-1-1-0-dir}/user-guide.html" /> + <arg line="${book-1-2-0-dir}/user-guide.html" /> </exec> <exec executable="${browser.cmd}"> - <arg line="${book-1-1-0-dir}/dev-guide.html" /> + <arg line="${book-1-2-0-dir}/dev-guide.html" /> </exec> </target>