incubator-hawq-docs git commit: This closes #59 - Revisions to HAWQ Best Practices topics.

yozie Tue, 15 Nov 2016 16:00:13 -0800

Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 740b6ee69 -> 9f4293ba4



This closes #59 - Revisions to HAWQ Best Practices topics.


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/9f4293ba
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/9f4293ba
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/9f4293ba

Branch: refs/heads/develop
Commit: 9f4293ba40edad95b1eca1d9dfe04f22d3208afa
Parents: 740b6ee
Author: David Yozie <[email protected]>
Authored: Tue Nov 15 15:59:09 2016 -0800
Committer: David Yozie <[email protected]>
Committed: Tue Nov 15 15:59:09 2016 -0800

----------------------------------------------------------------------
 .../HAWQBestPracticesOverview.html.md.erb       |  3 ---
 .../operating_hawq_bestpractices.html.md.erb    | 13 ++++++++--
 .../querying_data_bestpractices.html.md.erb     | 24 +++++++++++++++---
 query/query-performance.html.md.erb             | 26 ++++++++++++++------
 4 files changed, 50 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/HAWQBestPracticesOverview.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/HAWQBestPracticesOverview.html.md.erb b/bestpractices/HAWQBestPracticesOverview.html.md.erb
index 6277727..13b4dca 100644
--- a/bestpractices/HAWQBestPracticesOverview.html.md.erb
+++ b/bestpractices/HAWQBestPracticesOverview.html.md.erb
@@ -4,9 +4,6 @@ title: Best Practices
 
 This chapter provides best practices on using the components and features that are part of a HAWQ system.
 
--   **[HAWQ Best Practices](../bestpractices/general_bestpractices.html)**
-
-    This topic addresses general best practices for using HAWQ.
 
 -   **[Best Practices for Operating HAWQ](../bestpractices/operating_hawq_bestpractices.html)**
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/operating_hawq_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/operating_hawq_bestpractices.html.md.erb b/bestpractices/operating_hawq_bestpractices.html.md.erb
index d48cf82..9dc56e9 100644
--- a/bestpractices/operating_hawq_bestpractices.html.md.erb
+++ b/bestpractices/operating_hawq_bestpractices.html.md.erb
@@ -4,6 +4,16 @@ title: Best Practices for Operating HAWQ
 
 This topic provides best practices for operating HAWQ, including recommendations for stopping, starting and monitoring HAWQ.
 
+## <a id="best_practice_config"></a>Best Practices for Configuring HAWQ Parameters
+
+The HAWQ configuration guc/parameters are located in `$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ instances and can be modified either by the Ambari interface or the command line.
+
+If you install and manage HAWQ using Ambari, use the Ambari interface for all configuration changes. Do not use command line utilities such as `hawq config` to set or change HAWQ configuration properties for Ambari-managed clusters. Configuration changes to `hawq-site.xml` made outside the Ambari interface will be overwritten when you restart or reconfigure HAWQ using Ambari.
+
+If you manage your cluster using command line tools instead of Ambari, use a consistent `hawq-site.xml` file to configure your entire cluster.
+
+**Note:** While `postgresql.conf` still exists in HAWQ, any parameters defined in `hawq-site.xml` will overwrite configurations in `postgresql.conf`. For this reason, we recommend that you only use `hawq-site.xml` to configure your HAWQ cluster. For Ambari clusters, always use Ambari for configuring `hawq-site.xml` parameters.
+
 ## <a id="task_qgk_bz3_1v"></a>Best Practices to Start/Stop HAWQ Cluster Members
 
 For best results in using `hawq start` and `hawq stop` to manage your HAWQ system, the following best practices are recommended.
@@ -85,7 +95,6 @@ WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
 <ol>
 <li>Verify that the hosts with down segments are responsive.</li>
 <li>If hosts are OK, check the <span class="ph filepath">pg_log</span> files for the down segments to discover the root cause of the segments going down.</li>
-<li>If no unexpected errors are found, run the <code class="ph codeph">gprecoverseg</code> utility to bring the segments back online.</li>
 </ol></td>
 </tr>
 </tbody>
@@ -116,7 +125,7 @@ WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
 <p>Recommended frequency: real-time, if possible, or every 15 minutes</p>
 <p>Severity: CRITICAL</p></td>
 <td>Set up system check for hardware and OS errors.</td>
-<td>If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then, after add it back to the cluster and run <code class="ph codeph">gprecoverseg</code>.</td>
+<td>If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then add it back to the cluster after the issues are resolved.</td>
 </tr>
 <tr class="even">
 <td>Check disk space usage on volumes used for HAWQ data storage and the OS.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/querying_data_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/querying_data_bestpractices.html.md.erb b/bestpractices/querying_data_bestpractices.html.md.erb
index e2fb983..3efe569 100644
--- a/bestpractices/querying_data_bestpractices.html.md.erb
+++ b/bestpractices/querying_data_bestpractices.html.md.erb
@@ -4,6 +4,25 @@ title: Best Practices for Querying Data
 
 To obtain the best results when querying data in HAWQ, review the best practices described in this topic.
 
+## <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
+
+The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries. For more information, see [Best Practices for Using Resource Queues](managing_resources_bestpractices.html#topic_hvd_pls_wv).
+-   **Available resources at query time**. If more resources are available in the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions: 
+ 
+       - The bucket number (bucketnum) configured for all the hash tables is the same for all tables 
+   - The table size for random tables is no more than 1.5 times the size allotted for the hash tables. 
+
+  Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables.
+  
+-   **Query Type**: It can be difficult to calculate  resource costs for queries with some user-defined functions or for queries to external tables. With these queries,  the number of virtual segments is controlled by the  `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+  ***Note:*** PXF external tables use the `default_hash_table_bucket_number` parameter, not the `hawq_rm_nvseg_perquery_perseg_limit` parameter, to control the number of virtual segments.
+
+See [Query Performance](../query/query-performance.html#topic38) for more details.
+
 ## <a id="id_xtk_jmq_1v"></a>Examining Query Plans to Solve Problems
 
 If a query performs poorly, examine its query plan and ask the following questions:
@@ -20,8 +39,5 @@ If a query performs poorly, examine its query plan and ask the following questio
 
     `Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2               workers.`
 
-The "bytes wanted" (Work_mem) message from `EXPLAIN ANALYZE` is based on the amount of data written to work files and is not exact.
-
-**Note**
-The *work\_mem* property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
+  **Note:** The "bytes wanted" (*work\_mem* property) is based on the amount of data written to work files and is not exact. This property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/query/query-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/query/query-performance.html.md.erb b/query/query-performance.html.md.erb
index e3aa8f7..981d77b 100644
--- a/query/query-performance.html.md.erb
+++ b/query/query-performance.html.md.erb
@@ -118,18 +118,30 @@ The following table describes the metrics related to data locality. Use these me
 
 ## <a id="topic_wv3_gzc_d5"></a>Number of Virtual Segments
 
-The number of virtual segment used has impacts on the query performance. HAWQ decides the number of virtual segments of a query (its parallelism) by using the following rules:
+To obtain the best results when querying data in HAWQ, review the best practices described in this topic.
 
--   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Note that there are some techniques you can use when defining resource queues to influence the number of virtual segments and general resources that are allocated to queries. See [Best Practices for Using Resource Queues](../bestpractices/managing_resources_bestpractices.html#topic_hvd_pls_wv).
--   **Available resources**. Resources available at query time. If more resources are available in the resource queue, the resources will be used.
--   **Hash table and bucket number**. If the query involves only hash-distributed tables, and the bucket number (bucketnum) configured for all the hash tables is either the same bucket number for all tables or the table size for random tables is no more than 1.5 times larger than the size of hash tables for the hash tables, then the query's parallelism is fixed (equal to the hash table bucket number). Otherwise, the number of virtual segments depends on the query's cost and hash-distributed table queries will behave like queries on randomly distributed tables.
--   **Query Type**: For queries with some user-defined functions or for external tables where calculating resource costs is difficult , then the number of virtual segments is controlled by `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`) then the number of virtual segment number must be equal to the bucket number of the resulting hash table, If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies, which will be explained later in this section.
+### <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
 
-The following are guidelines for numbers of virtual segments to use, provided there are sufficient resources available.
+The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries.
+-   **Available resources at query time**. If more resources are available in the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions:
+
+   - The bucket number (bucketnum) configured for all the hash tables is the same bucket number
+   - The table size for random tables is no more than 1.5 times the size allotted for the hash tables.
+
+  Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables.
+
+-   **Query Type**: It can be difficult to calculate  resource costs for queries with some user-defined functions or for queries to external tables. With these queries,  the number of virtual segments is controlled by the  `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+###General Guidelines
+
+The following guidelines expand on the numbers of virtual segments to use, provided there are sufficient resources available.
 
 -   **Random tables exist in the select list:** \#vseg (number of virtual segments) depends on the size of the table.
 -   **Hash tables exist in the select list:** \#vseg depends on the bucket number of the table.
--   **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times larger than the size of hash tables. Otherwise, \#vseg depends on the size of the random table.
+-   **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times the size of hash tables. Otherwise, \#vseg depends on the size of the random table.
 -   **User-defined functions exist:** \#vseg depends on the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters.
 -   **PXF external tables exist:** \#vseg depends on the `default_hash_table_bucket_number` parameter.
 -   **gpfdist external tables exist:** \#vseg is at least the number of locations in the location list.
