make references DataNode consistent
Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/00a2a368
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/00a2a368
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/00a2a368

Branch: refs/heads/tutorial-proto
Commit: 00a2a3684b9074a11f720c72be61fd1672d5aa1f
Parents: 86ef700
Author: Lisa Owen <[email protected]>
Authored: Thu Oct 20 10:59:58 2016 -0700
Committer: Lisa Owen <[email protected]>
Committed: Thu Oct 20 10:59:58 2016 -0700

----------------------------------------------------------------------
 ddl/ddl-table.html.md.erb                                 | 2 +-
 install/aws-config.html.md.erb                            | 2 +-
 install/select-hosts.html.md.erb                          | 4 ++--
 overview/TableDistributionStorage.html.md.erb             | 2 +-
 pxf/TroubleshootingPXF.html.md.erb                        | 4 ++--
 query/query-performance.html.md.erb                       | 2 +-
 reference/HDFSConfigurationParameterReference.html.md.erb | 6 +++---
 7 files changed, 11 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/ddl/ddl-table.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-table.html.md.erb b/ddl/ddl-table.html.md.erb
index 62ece36..d0220d7 100644
--- a/ddl/ddl-table.html.md.erb
+++ b/ddl/ddl-table.html.md.erb
@@ -66,7 +66,7 @@ Foreign key constraints specify that the values in a column or a group of column
 All HAWQ tables are distributed. The default is `DISTRIBUTED RANDOMLY` \(round-robin distribution\) to determine the table row distribution. However, when you create or alter a table, you can optionally specify `DISTRIBUTED BY` to distribute data according to a hash-based policy. In this case, the `bucketnum` attribute sets the number of hash buckets used by a hash-distributed table.
 Columns of geometric or user-defined data types are not eligible as HAWQ distribution key columns.
-Randomly distributed tables have benefits over hash distributed tables. For example, after expansion, HAWQ's elasticity feature lets it automatically use more resources without needing to redistribute the data. For extremely large tables, redistribution is very expensive. Also, data locality for randomly distributed tables is better, especially after the underlying HDFS redistributes its data during rebalancing or because of data node failures. This is quite common when the cluster is large.
+Randomly distributed tables have benefits over hash distributed tables. For example, after expansion, HAWQ's elasticity feature lets it automatically use more resources without needing to redistribute the data. For extremely large tables, redistribution is very expensive. Also, data locality for randomly distributed tables is better, especially after the underlying HDFS redistributes its data during rebalancing or because of DataNode failures. This is quite common when the cluster is large.
 However, hash distributed tables can be faster than randomly distributed tables. For example, for TPCH queries, where there are several queries, HASH distributed tables can have performance benefits. Choose a distribution policy that best suits your application scenario. When you `CREATE TABLE`, you can also specify the `bucketnum` option. The `bucketnum` determines the number of hash buckets used in creating a hash-distributed table or for PXF external table intermediate processing. The number of buckets also affects how many virtual segments will be created when processing this data. The bucketnumber of a gpfdist external table is the number of gpfdist location, and the bucketnumber of a command external table is `ON #num`. PXF external tables use the `default_hash_table_bucket_number` parameter to control virtual segments.
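
As a quick illustration of the DDL that the ddl-table page above documents, the two distribution policies might be declared as follows. This is a sketch only; the table and column names are invented, and `16` is an arbitrary bucket count chosen for the example:

```sql
-- Default policy: round-robin (random) distribution.
CREATE TABLE sales_random (id int, amount decimal)
DISTRIBUTED RANDOMLY;

-- Hash-based policy: rows are assigned to buckets by hashing the
-- distribution key; bucketnum sets the number of hash buckets.
CREATE TABLE sales_hash (id int, amount decimal)
WITH (bucketnum=16)
DISTRIBUTED BY (id);
```

Per the text above, the hash-distributed variant can help join-heavy workloads (e.g. TPC-H-style queries), while the random variant avoids expensive redistribution after cluster expansion.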

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/install/aws-config.html.md.erb
----------------------------------------------------------------------
diff --git a/install/aws-config.html.md.erb b/install/aws-config.html.md.erb
index e4106b1..21cadf5 100644
--- a/install/aws-config.html.md.erb
+++ b/install/aws-config.html.md.erb
@@ -34,7 +34,7 @@ Virtual devices for instance store volumes for HAWQ EC2 instance store instances
 A placement group is a logical grouping of instances within a single availability zone that together participate in a low-latency, 10 Gbps network. Your HAWQ master and segment cluster instances should support enhanced networking and reside in a single placement group (and subnet) for optimal network performance.
-If your Ambari node is not a data node, locating the Ambari node instance in a subnet separate from the HAWQ master/segment placement group enables you to manage multiple HAWQ clusters from the single Ambari instance.
+If your Ambari node is not a DataNode, locating the Ambari node instance in a subnet separate from the HAWQ master/segment placement group enables you to manage multiple HAWQ clusters from the single Ambari instance.
 Amazon recommends that you use the same instance type for all instances in the placement group and that you launch all instances within the placement group at the same time.


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/install/select-hosts.html.md.erb
----------------------------------------------------------------------
diff --git a/install/select-hosts.html.md.erb b/install/select-hosts.html.md.erb
index c49f184..c2fbdff 100644
--- a/install/select-hosts.html.md.erb
+++ b/install/select-hosts.html.md.erb
@@ -8,10 +8,10 @@ Complete this procedure for all HAWQ deployments:
 1.  **Choose the host machines that will host a HAWQ segment.** Keep in mind these restrictions and requirements:
     - Each host must meet the system requirements for the version of HAWQ you are installing.
-    - Each HAWQ segment must be co-located on a host that runs an HDFS data node.
+    - Each HAWQ segment must be co-located on a host that runs an HDFS DataNode.
     - The HAWQ master segment and standby master segment must be hosted on separate machines.
 2.  **Choose the host machines that will run PXF.** Keep in mind these restrictions and requirements:
-    - PXF must be installed on the HDFS NameNode *and* on all HDFS data nodes.
+    - PXF must be installed on the HDFS NameNode *and* on all HDFS DataNodes.
     - If you have configured Hadoop with high availability, PXF must also be installed on all HDFS nodes including all NameNode services.
     - If you want to use PXF with HBase or Hive, you must first install the HBase client \(hbase-client\) and/or Hive client \(hive-client\) on each machine where you intend to install PXF. See the [HDP installation documentation](http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/index.html) for more information.
 3.  **Verify that required ports on all machines are unused.** By default, a HAWQ master or standby master service configuration uses port 5432. Hosts that run other PostgreSQL instances cannot be used to run a default HAWQ master or standby service configuration because the default PostgreSQL port \(5432\) conflicts with the default HAWQ port. You must either change the default port configuration of the running PostgreSQL instance or change the HAWQ master port setting during the HAWQ service installation to avoid port conflicts.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/overview/TableDistributionStorage.html.md.erb
----------------------------------------------------------------------
diff --git a/overview/TableDistributionStorage.html.md.erb b/overview/TableDistributionStorage.html.md.erb
index aa03b59..58f20f2 100755
--- a/overview/TableDistributionStorage.html.md.erb
+++ b/overview/TableDistributionStorage.html.md.erb
@@ -12,7 +12,7 @@ For all HAWQ table storage formats, AO \(Append-Only\) and Parquet, the data fil
 The default table distribution policy in HAWQ is random.
-Randomly distributed tables have some benefits over hash distributed tables. For example, after cluster expansion, HAWQ can use more resources automatically without redistributing the data. For huge tables, redistribution is very expensive, and data locality for randomly distributed tables is better after the underlying HDFS redistributes its data during rebalance or data node failures. This is quite common when the cluster is large.
+Randomly distributed tables have some benefits over hash distributed tables. For example, after cluster expansion, HAWQ can use more resources automatically without redistributing the data. For huge tables, redistribution is very expensive, and data locality for randomly distributed tables is better after the underlying HDFS redistributes its data during rebalance or DataNode failures. This is quite common when the cluster is large.
 On the other hand, for some queries, hash distributed tables are faster than randomly distributed tables. For example, hash distributed tables have some performance benefits for some TPC-H queries. You should choose the distribution policy that is best suited for your application's scenario.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/pxf/TroubleshootingPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/TroubleshootingPXF.html.md.erb b/pxf/TroubleshootingPXF.html.md.erb
index 7b53065..d59e361 100644
--- a/pxf/TroubleshootingPXF.html.md.erb
+++ b/pxf/TroubleshootingPXF.html.md.erb
@@ -49,8 +49,8 @@ The following table lists some common errors encountered while using PXF:
 <td>Cannot find PXF Jar</td>
 </tr>
 <tr class="even">
-<td>ERROR: PXF API encountered a HTTP 404 error. Either the PXF service (tomcat) on data node was not started or PXF webapp was not started.</td>
-<td>Either the required data node does not exist or PXF service (tcServer) on data node is not started or PXF webapp was not started</td>
+<td>ERROR: PXF API encountered a HTTP 404 error. Either the PXF service (tomcat) on the DataNode was not started or the PXF webapp was not started.</td>
+<td>Either the required DataNode does not exist or PXF service (tcServer) on the DataNode is not started or PXF webapp was not started</td>
 </tr>
 <tr class="odd">
 <td>ERROR:  remote component error (500) from '<x>':  type  Exception report  message  java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface</td>


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/query/query-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/query/query-performance.html.md.erb b/query/query-performance.html.md.erb
index 4515575..b4f88fe 100644
--- a/query/query-performance.html.md.erb
+++ b/query/query-performance.html.md.erb
@@ -99,7 +99,7 @@ The following table describes the metrics related to data locality. Use these me
 </tr>
 <tr class="odd">
 <td>continuity</td>
-<td>reading a HDFS file discontinuously will introduce additional seek, which will slow the table scan of a query. A low value of continuity indicates that the blocks of a file are not continuously distributed on a datanode.</td>
+<td>reading a HDFS file discontinuously will introduce additional seek, which will slow the table scan of a query. A low value of continuity indicates that the blocks of a file are not continuously distributed on a DataNode.</td>
 </tr>
 <tr class="even">
 <td>DFS metadatacache</td>


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/reference/HDFSConfigurationParameterReference.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/HDFSConfigurationParameterReference.html.md.erb b/reference/HDFSConfigurationParameterReference.html.md.erb
index 8199de2..aef4ed2 100644
--- a/reference/HDFSConfigurationParameterReference.html.md.erb
+++ b/reference/HDFSConfigurationParameterReference.html.md.erb
@@ -13,13 +13,13 @@ This table describes the configuration parameters and values that are recommende
 | Parameter | Description | Recommended Value for HAWQ Installs | Comments |
 |-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `dfs.allow.truncate` | Allows truncate. | true | HAWQ requires that you enable `dfs.allow.truncate`. The HAWQ service will fail to start if `dfs.allow.truncate` is not set to `true`. |
-| `dfs.block.access.token.enable` | If `true`, access tokens are used as capabilities for accessing datanodes. If `false`, no access tokens are checked on accessing datanodes. | *false* for an unsecured HDFS cluster, or *true* for a secure cluster |  |
+| `dfs.block.access.token.enable` | If `true`, access tokens are used as capabilities for accessing DataNodes. If `false`, no access tokens are checked on accessing DataNodes. | *false* for an unsecured HDFS cluster, or *true* for a secure cluster |  |
 | `dfs.block.local-path-access.user` | Comma separated list of the users allowed to open block files on legacy short-circuit local read. | gpadmin |  |
 | `dfs.client.read.shortcircuit` | This configuration parameter turns on short-circuit local reads. | true | In Ambari, this parameter corresponds to **HDFS Short-circuit read**. The value for this parameter should be the same in `hdfs-site.xml` and HAWQ's `hdfs-client.xml`. |
 | `dfs.client.socket-timeout` | The amount of time before a client connection times out when establishing a connection or reading. The value is expressed in milliseconds. | 300000000 |  |
 | `dfs.client.use.legacy.blockreader.local` | Setting this value to false specifies that the new version of the short-circuit reader is used. Setting this value to true means that the legacy short-circuit reader would be used. | false |  |
-| `dfs.datanode.data.dir.perm` | Permissions for the directories on on the local filesystem where the DFS data node store its blocks. The permissions can either be octal or symbolic. | 750 | In Ambari, this parameter corresponds to **DataNode directories permission** |
-| `dfs.datanode.handler.count` | The number of server threads for the datanode. | 60 |  |
+| `dfs.datanode.data.dir.perm` | Permissions for the directories on on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic. | 750 | In Ambari, this parameter corresponds to **DataNode directories permission** |
+| `dfs.datanode.handler.count` | The number of server threads for the DataNode. | 60 |  |
 | `dfs.datanode.max.transfer.threads` | Specifies the maximum number of threads to use for transferring data in and out of the DataNode. | 40960 | In Ambari, this parameter corresponds to **DataNode max data transfer threads** |
 | `dfs.datanode.socket.write.timeout` | The amount of time before a write operation times out, expressed in milliseconds. | 7200000 |  |
 | `dfs.domain.socket.path` | (Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string "\_PORT" is present in this path, it is replaced by the TCP port of the DataNode. |  | If set, the value for this parameter should be the same in `hdfs-site.xml` and HAWQ's `hdfs-client.xml`. |
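
The parameters in the hunk above live in `hdfs-site.xml` (and, where noted, must match HAWQ's `hdfs-client.xml`). As a minimal sketch, a fragment carrying a few of the recommended values from the table might look like the following; this is illustrative only, not a complete HDFS configuration:

```xml
<!-- hdfs-site.xml fragment: selected values recommended for HAWQ -->
<property>
  <name>dfs.allow.truncate</name>
  <!-- Required: HAWQ fails to start if this is not true -->
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>40960</value>
</property>
```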
