Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 25242858c -> de1e2e07e
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/reference/HDFSConfigurationParameterReference.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/HDFSConfigurationParameterReference.html.md.erb b/reference/HDFSConfigurationParameterReference.html.md.erb
deleted file mode 100644
index aef4ed2..0000000
--- a/reference/HDFSConfigurationParameterReference.html.md.erb
+++ /dev/null
@@ -1,257 +0,0 @@
----
-title: HDFS Configuration Reference
----
-
-This reference page describes HDFS configuration values that are configured for HAWQ in `hdfs-site.xml`, `core-site.xml`, or `hdfs-client.xml`.
-
-## <a id="topic_ixj_xw1_1w"></a>HDFS Site Configuration (hdfs-site.xml and core-site.xml)
-
-This topic lists the HDFS site configuration values recommended for HAWQ installations. These parameters are located in either `hdfs-site.xml` or `core-site.xml` of your HDFS deployment.
-
-This table describes the configuration parameters and values that are recommended for HAWQ installations. Only HDFS parameters that need to be modified or customized for HAWQ are listed.
-
-| Parameter | Description | Recommended Value for HAWQ Installs | Comments |
-|-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `dfs.allow.truncate` | Allows truncate. | true | HAWQ requires that you enable `dfs.allow.truncate`. The HAWQ service will fail to start if `dfs.allow.truncate` is not set to `true`. |
-| `dfs.block.access.token.enable` | If `true`, access tokens are used as capabilities for accessing DataNodes. If `false`, no access tokens are checked on accessing DataNodes. | *false* for an unsecured HDFS cluster, or *true* for a secure cluster |  |
-| `dfs.block.local-path-access.user` | Comma-separated list of the users allowed to open block files on legacy short-circuit local reads. | gpadmin |  |
-| `dfs.client.read.shortcircuit` | This configuration parameter turns on short-circuit local reads. | true | In Ambari, this parameter corresponds to **HDFS Short-circuit read**. The value for this parameter should be the same in `hdfs-site.xml` and HAWQ's `hdfs-client.xml`. |
-| `dfs.client.socket-timeout` | The amount of time before a client connection times out when establishing a connection or reading. The value is expressed in milliseconds. | 300000000 |  |
-| `dfs.client.use.legacy.blockreader.local` | Setting this value to false specifies that the new version of the short-circuit reader is used. Setting this value to true specifies that the legacy short-circuit reader is used. | false |  |
-| `dfs.datanode.data.dir.perm` | Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can be either octal or symbolic. | 750 | In Ambari, this parameter corresponds to **DataNode directories permission**. |
-| `dfs.datanode.handler.count` | The number of server threads for the DataNode. | 60 |  |
-| `dfs.datanode.max.transfer.threads` | Specifies the maximum number of threads to use for transferring data in and out of the DataNode. | 40960 | In Ambari, this parameter corresponds to **DataNode max data transfer threads**. |
-| `dfs.datanode.socket.write.timeout` | The amount of time before a write operation times out, expressed in milliseconds. | 7200000 |  |
-| `dfs.domain.socket.path` | (Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string "\_PORT" is present in this path, it is replaced by the TCP port of the DataNode. |  | If set, the value for this parameter should be the same in `hdfs-site.xml` and HAWQ's `hdfs-client.xml`. |
-| `dfs.namenode.accesstime.precision` | The access time for an HDFS file is precise up to this value, in milliseconds. Setting a value of 0 disables access times for HDFS. | 0 | In Ambari, this parameter corresponds to **Access time precision**. |
-| `dfs.namenode.handler.count` | The number of server threads for the NameNode. | 600 |  |
-| `dfs.support.append` | Determines whether HDFS allows appending to files. | true |  |
-| `ipc.client.connection.maxidletime` | The maximum idle time, in milliseconds, after which a client closes its connection to the server. | 3600000 | In `core-site.xml`. |
-| `ipc.client.connect.timeout` | Indicates the number of milliseconds a client will wait for the socket to establish a server connection. | 300000 | In `core-site.xml`. |
-| `ipc.server.listen.queue.size` | Indicates the length of the listen queue for servers accepting client connections. | 3300 | In `core-site.xml`. |
-
-## <a id="topic_l1c_zw1_1w"></a>HDFS Client Configuration (hdfs-client.xml)
-
-This topic lists the HAWQ configuration values located in `$GPHOME/etc/hdfs-client.xml`.
-
-This table describes the configuration parameters and their default values:
-
-<table>
-<colgroup>
-<col width="25%" />
-<col width="25%" />
-<col width="25%" />
-<col width="25%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Parameter</th>
-<th>Description</th>
-<th>Default Value</th>
-<th>Comments</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td><code class="ph codeph">dfs.client.failover.max.attempts</code></td>
-<td>The maximum number of times that the DFS client retries an RPC call when multiple NameNodes are configured.</td>
-<td>15</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">dfs.client.log.severity</code></td>
-<td>The minimum log severity level. Valid values include: FATAL, ERROR, INFO, DEBUG1, DEBUG2, and DEBUG3.</td>
-<td>INFO</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">dfs.client.read.shortcircuit</code></td>
-<td>Determines whether the DataNode is bypassed when reading file blocks if the block and client are on the same node. The default value, true, bypasses the DataNode.</td>
-<td>true</td>
-<td>The value for this parameter should be the same in <code class="ph codeph">hdfs-site.xml</code> and HAWQ's <code class="ph codeph">hdfs-client.xml</code>.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">dfs.client.use.legacy.blockreader.local</code></td>
-<td>Determines whether the legacy short-circuit reader implementation, based on HDFS-2246, is used. Set this property to true on non-Linux platforms that do not have the new implementation based on HDFS-347.</td>
-<td>false</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">dfs.default.blocksize</code></td>
-<td>Default block size, in bytes.</td>
-<td>134217728</td>
-<td>Default is equivalent to 128 MB.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">dfs.default.replica</code></td>
-<td>The default number of replicas.</td>
-<td>3</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">dfs.domain.socket.path</code></td>
-<td>(Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string "_PORT" is present in this path, it is replaced by the TCP port of the DataNode.</td>
-<td> </td>
-<td>If set, the value for this parameter should be the same in <code class="ph codeph">hdfs-site.xml</code> and HAWQ's <code class="ph codeph">hdfs-client.xml</code>.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">dfs.prefetchsize</code></td>
-<td>The number of blocks for which information is pre-fetched.</td>
-<td>10</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">hadoop.security.authentication</code></td>
-<td>Specifies the type of RPC authentication to use. A value of <code class="ph codeph">simple</code> indicates no authentication. A value of <code class="ph codeph">kerberos</code> enables authentication by Kerberos.</td>
-<td>simple</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">input.connect.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the input stream is setting up a connection to a DataNode.</td>
-<td>600000</td>
-<td>Default is equal to 10 minutes.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">input.localread.blockinfo.cachesize</code></td>
-<td>The size of the file block path information cache, in bytes.</td>
-<td>1000</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">input.localread.default.buffersize</code></td>
-<td>The size of the buffer, in bytes, used to hold data from the file block and verify the checksum. This value is used only when <code class="ph codeph">dfs.client.read.shortcircuit</code> is set to true.</td>
-<td>1048576</td>
-<td>Default is equal to 1 MB. Only used when <code class="ph codeph">dfs.client.read.shortcircuit</code> is set to true.
-<p>If an older version of <code class="ph codeph">hdfs-client.xml</code> is retained during an upgrade, set <code class="ph codeph">input.localread.default.buffersize</code> to 2097152 (2 MB) to avoid performance degradation.</p></td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">input.read.getblockinfo.retry</code></td>
-<td>The maximum number of times the client should retry getting block information from the NameNode.</td>
-<td>3</td>
-<td></td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">input.read.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the input stream is reading from a DataNode.</td>
-<td>3600000</td>
-<td>Default is equal to 1 hour.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">input.write.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the input stream is writing to a DataNode.</td>
-<td>3600000</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">output.close.timeout</code></td>
-<td>The timeout interval for closing an output stream, in milliseconds.</td>
-<td>900000</td>
-<td>Default is equal to 15 minutes.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">output.connect.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the output stream is setting up a connection to a DataNode.</td>
-<td>600000</td>
-<td>Default is equal to 10 minutes.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">output.default.chunksize</code></td>
-<td>The chunk size of the pipeline, in bytes.</td>
-<td>512</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">output.default.packetsize</code></td>
-<td>The packet size of the pipeline, in bytes.</td>
-<td>65536</td>
-<td>Default is equal to 64 KB.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">output.default.write.retry</code></td>
-<td>The maximum number of times that the client should reattempt to set up a failed pipeline.</td>
-<td>10</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">output.packetpool.size</code></td>
-<td>The maximum number of packets in a file's packet pool.</td>
-<td>1024</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">output.read.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the output stream is reading from a DataNode.</td>
-<td>3600000</td>
-<td>Default is equal to 1 hour.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">output.replace-datanode-on-failure</code></td>
-<td>Determines whether the client adds a new DataNode to the pipeline if the number of nodes in the pipeline is less than the specified number of replicas.</td>
-<td>false (if the number of nodes is less than or equal to 4), otherwise true</td>
-<td>When you deploy a HAWQ cluster, the <code class="ph codeph">hawq init</code> utility detects the number of nodes in the cluster and updates this configuration parameter accordingly. However, when expanding an existing cluster to 4 or more nodes, you must manually set this value to true. Set it to false if you remove existing nodes and the cluster drops below 4 nodes.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">output.write.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the output stream is writing to a DataNode.</td>
-<td>3600000</td>
-<td>Default is equal to 1 hour.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">rpc.client.connect.retry</code></td>
-<td>The maximum number of times to retry a connection if the RPC client fails to connect to the server.</td>
-<td>10</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">rpc.client.connect.tcpnodelay</code></td>
-<td>Determines whether TCP_NODELAY is used when connecting to the RPC server.</td>
-<td>true</td>
-<td> </td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">rpc.client.connect.timeout</code></td>
-<td>The timeout interval for establishing the RPC client connection, in milliseconds.</td>
-<td>600000</td>
-<td>Default equals 10 minutes.</td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">rpc.client.max.idle</code></td>
-<td>The maximum idle time for an RPC connection, in milliseconds.</td>
-<td>10000</td>
-<td>Default equals 10 seconds.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">rpc.client.ping.interval</code></td>
-<td>The interval, in milliseconds, at which the RPC client sends a heartbeat to the server. A value of 0 disables the heartbeat.</td>
-<td>10000</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">rpc.client.read.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the RPC client is reading from the server.</td>
-<td>3600000</td>
-<td>Default equals 1 hour.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">rpc.client.socket.linger.timeout</code></td>
-<td>The value to set for the SO_LINGER socket option when connecting to the RPC server.</td>
-<td>-1</td>
-<td> </td>
-</tr>
-<tr class="even">
-<td><code class="ph codeph">rpc.client.timeout</code></td>
-<td>The timeout interval of an RPC invocation, in milliseconds.</td>
-<td>3600000</td>
-<td>Default equals 1 hour.</td>
-</tr>
-<tr class="odd">
-<td><code class="ph codeph">rpc.client.write.timeout</code></td>
-<td>The timeout interval, in milliseconds, for when the RPC client is writing to the server.</td>
-<td>3600000</td>
-<td>Default equals 1 hour.</td>
-</tr>
-</tbody>
-</table>
-
-
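Several rows of the deleted `hdfs-client.xml` table give a default in milliseconds and then restate it in hours or minutes in the Comments column. Below is a minimal Python sketch for checking that arithmetic. The helper `ms_to_human` and the `DEFAULTS_MS` dictionary are hypothetical illustrations, not part of HAWQ or libhdfs3. The millisecond values are copied from the table.

```python
def ms_to_human(ms: int) -> str:
    """Hypothetical helper: render a millisecond count as 'X h Y min Z s'."""
    if ms < 0:
        raise ValueError("durations must be non-negative")
    seconds = ms // 1000                      # drop any sub-second remainder
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours} h")
    if minutes:
        parts.append(f"{minutes} min")
    if secs or not parts:                     # always emit something, e.g. "0 s"
        parts.append(f"{secs} s")
    return " ".join(parts)


# Default values in milliseconds, copied from the hdfs-client.xml table.
DEFAULTS_MS = {
    "input.connect.timeout": 600000,
    "input.read.timeout": 3600000,
    "output.close.timeout": 900000,
    "output.connect.timeout": 600000,
    "rpc.client.max.idle": 10000,
    "rpc.client.timeout": 3600000,
}

if __name__ == "__main__":
    for name, ms in DEFAULTS_MS.items():
        print(f"{name:<25} {ms:>8} ms = {ms_to_human(ms)}")
```

For example, 600000 ms works out to 10 minutes and 900000 ms to 15 minutes. Those are the figures the Comments column should state for `input.connect.timeout` and `output.close.timeout`.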

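The deleted reference page requires `dfs.allow.truncate` to be `true` in `hdfs-site.xml`. It also asks that `dfs.client.read.shortcircuit` and `dfs.domain.socket.path` hold the same value in `hdfs-site.xml` and HAWQ's `hdfs-client.xml`. Below is a minimal Python sketch of those two checks. It assumes both files use the standard Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` layout. The functions `parse_hadoop_xml` and `check_hawq_hdfs` are hypothetical, not a HAWQ utility. The sketch compares values as plain strings and ignores XInclude and `<final>` elements.

```python
import xml.etree.ElementTree as ET

# Parameters the reference says must match between hdfs-site.xml and
# HAWQ's hdfs-client.xml (when set).
MUST_MATCH = ("dfs.client.read.shortcircuit", "dfs.domain.socket.path")


def parse_hadoop_xml(text: str) -> dict:
    """Hypothetical helper: return {name: value} for each <property>."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_hawq_hdfs(site: dict, client: dict) -> list:
    """Hypothetical check: return human-readable problems (empty if none)."""
    problems = []
    # The HAWQ service fails to start unless truncate is allowed.
    if site.get("dfs.allow.truncate", "").lower() != "true":
        problems.append("dfs.allow.truncate must be true in hdfs-site.xml")
    for name in MUST_MATCH:
        # Unset in both files compares equal (None == None), so it passes.
        if site.get(name) != client.get(name):
            problems.append(
                f"{name} differs: hdfs-site.xml={site.get(name)!r}, "
                f"hdfs-client.xml={client.get(name)!r}"
            )
    return problems


# Inline samples standing in for real files.
SITE = """<configuration>
  <property><name>dfs.allow.truncate</name><value>true</value></property>
  <property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
</configuration>"""

CLIENT = """<configuration>
  <property><name>dfs.client.read.shortcircuit</name><value>false</value></property>
</configuration>"""

if __name__ == "__main__":
    for problem in check_hawq_hdfs(parse_hadoop_xml(SITE), parse_hadoop_xml(CLIENT)):
        print(problem)
```

With the inline samples, the only reported problem is the mismatched `dfs.client.read.shortcircuit` value. To check a real deployment, read `$GPHOME/etc/hdfs-client.xml` and your cluster's `hdfs-site.xml` into strings first. The location of `hdfs-site.xml` depends on the HDFS distribution.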