http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/admin/startstop.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/admin/startstop.html.md.erb 
b/markdown/admin/startstop.html.md.erb
new file mode 100644
index 0000000..7aac723
--- /dev/null
+++ b/markdown/admin/startstop.html.md.erb
@@ -0,0 +1,146 @@
+---
+title: Starting and Stopping HAWQ
+---
+
+In a HAWQ DBMS, the database server instances \(the master and all segments\) 
are started or stopped across all of the hosts in the system in such a way that 
they can work together as a unified DBMS.
+
+Because a HAWQ system is distributed across many machines, the process for 
starting and stopping a HAWQ system is different than the process for starting 
and stopping a regular PostgreSQL DBMS.
+
+Use the `hawq start` *`object`* and `hawq stop` *`object`* commands to start and stop HAWQ, respectively. These management tools are located in the `$GPHOME/bin` directory on your HAWQ master host.
+
+Initializing a HAWQ system also starts the system.
+
+**Important:**
+
+Do not issue an operating system `kill` command to end any Postgres process. Instead, use the database function `pg_cancel_backend()`.
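+
+For example, a minimal sketch of canceling a runaway query from the master (the `procpid` and `current_query` column names follow the PostgreSQL 8.2-era catalog that HAWQ derives from, and the process ID shown is illustrative):
+
+```shell
+# Identify the backend process running the offending query
+$ psql -d postgres -c "SELECT procpid, usename, current_query FROM pg_stat_activity;"
+# Cancel that backend's current query by process ID
+$ psql -d postgres -c "SELECT pg_cancel_backend(12345);"
+```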
+
+For information about [hawq 
start](../reference/cli/admin_utilities/hawqstart.html) and [hawq 
stop](../reference/cli/admin_utilities/hawqstop.html), see the appropriate 
pages in the HAWQ Management Utility Reference or enter `hawq start -h` or 
`hawq stop -h` on the command line.
+
+
+## <a id="task_hkd_gzv_fp"></a>Starting HAWQ 
+
+When a HAWQ system is first initialized, it is also started. For more 
information about initializing HAWQ, see [hawq 
init](../reference/cli/admin_utilities/hawqinit.html). 
+
+To start a stopped HAWQ system that was previously initialized, run the `hawq 
start` command on the master instance.
+
+You can also use the `hawq start master` command to start only the HAWQ master, without segment nodes, and then start the segments later using `hawq start segment`. If you want HAWQ to ignore hosts that fail SSH validation, use the `--ignore-bad-hosts` option of `hawq start`.
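+
+For example (a sketch; `hawq start segment` acts only on the segment local to the host where it is run):
+
+```shell
+$ hawq start master     # start only the master
+$ hawq start segment    # run on each segment host to start that segment
+```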
+
+Use the `hawq start cluster` command to start a HAWQ system that has already been initialized by the `hawq init cluster` command, but has been stopped by the `hawq stop cluster` command. The `hawq start cluster` command starts a HAWQ system on the master host and starts all of its segments, orchestrating the process in parallel.
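+
+For example, on the master host:
+
+```shell
+$ hawq start cluster
+```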
+
+
+## <a id="task_gpdb_restart"></a>Restarting HAWQ 
+
+Stop the HAWQ system and then restart it.
+
+The `hawq restart` command with the appropriate `cluster` or node-type option 
will stop and then restart HAWQ after the shutdown completes. If the master or 
segments are already stopped, restart will have no effect.
+
+-   To restart a HAWQ cluster, enter the following command on the master host:
+
+    ```shell
+    $ hawq restart cluster
+    ```
+
+
+## <a id="task_upload_config"></a>Reloading Configuration File Changes Only 
+
+Reload changes to the HAWQ configuration files without interrupting the system.
+
+The `hawq stop` command can reload changes to the `pg_hba.conf` configuration file and to *runtime* parameters in the `hawq-site.xml` file without service interruption. Active sessions pick up changes when they reconnect to the database. Many server configuration parameters require a full system restart \(`hawq restart cluster`\) to activate. For information about server configuration parameters, see the [Server Configuration Parameter Reference](../reference/guc/guc_config.html).
+
+-   Reload configuration file changes without shutting down the system using 
the `hawq stop` command:
+
+    ```shell
+    $ hawq stop cluster --reload
+    ```
+    
+    Or:
+
+    ```shell
+    $ hawq stop cluster -u
+    ```
+    
+
+## <a id="task_maint_mode"></a>Starting the Master in Maintenance Mode 
+
+Start only the master to perform maintenance or administrative tasks without 
affecting data on the segments.
+
+Maintenance mode is a superuser-only mode that should be used only when required for a particular maintenance task. For example, in maintenance mode you can connect to a database on the master instance only and edit system catalog settings.
+
+1.  Run `hawq start` on the `master` using the `-m` option:
+
+    ```shell
+    $ hawq start master -m
+    ```
+
+2.  Connect to the master in maintenance mode to do catalog maintenance. For 
example:
+
+    ```shell
+    $ PGOPTIONS='-c gp_session_role=utility' psql template1
+    ```
+3.  After completing your administrative tasks, restart the master in 
production mode. 
+
+    ```shell
+    $ hawq restart master 
+    ```
+
+    **Warning:**
+
+    Incorrect use of maintenance mode connections can result in an 
inconsistent HAWQ system state. Only expert users should perform this operation.
+
+
+## <a id="task_gpdb_stop"></a>Stopping HAWQ 
+
+The `hawq stop cluster` command stops or restarts your HAWQ system and always runs on the master host. When run, `hawq stop cluster` stops all `postgres` processes in the system, including the master and all segment instances. The `hawq stop cluster` command uses a default of up to 64 parallel worker threads to bring down the segments that make up the HAWQ cluster. The system waits for any active transactions to finish before shutting down. To stop HAWQ immediately, use fast mode. The commands `hawq stop master`, `hawq stop segment`, `hawq stop standby`, and `hawq stop allsegments` stop the master, the local segment node, the standby, or all segments in the cluster, respectively. Stopping the master stops only the master instance and does not shut down the cluster.
+
+-   To stop HAWQ:
+
+    ```shell
+    $ hawq stop cluster
+    ```
+
+-   To stop HAWQ in fast mode:
+
+    ```shell
+    $ hawq stop cluster -M fast
+    ```
+
+
+## <a id="task_tx4_bl3_h5"></a>Best Practices to Start/Stop HAWQ Cluster 
Members 
+
+For best results in using `hawq start` and `hawq stop` to manage your HAWQ 
system, the following best practices are recommended.
+
+-   Issue the `CHECKPOINT` command to update and flush all data files to disk 
and update the log file before stopping the cluster. A checkpoint ensures that, 
in the event of a crash, files can be restored from the checkpoint snapshot.
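+
+    For example, from the master host (a sketch; the database name is illustrative):
+
+    ```shell
+    $ psql -d postgres -c "CHECKPOINT;"
+    ```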
+
+-   Stop the entire HAWQ system by stopping the cluster on the master host. 
+
+    ```shell
+    $ hawq stop cluster
+    ```
+
+-   To stop segments and kill any running queries without causing data loss or 
inconsistency issues, use `fast` or `immediate` mode on the cluster:
+
+    ```shell
+    $ hawq stop cluster -M fast
+    $ hawq stop cluster -M immediate
+    ```
+
+-   Use `hawq stop master` to stop the master only. If you cannot stop the 
master due to running transactions, try using `fast` shutdown. If `fast` 
shutdown does not work, use `immediate` shutdown. Use `immediate` shutdown with 
caution, as it will result in a crash-recovery run when the system is restarted.
+
+    ```shell
+    $ hawq stop master -M fast
+    $ hawq stop master -M immediate
+    ```
+
+-   If you have changed server parameter settings and want to reload them on a HAWQ database that has active connections, use the following command:
+
+    ```shell
+    $ hawq stop master -u -M fast
+    ```
+
+-   When stopping a segment or all segments, use `smart` mode, which is the 
default. Using `fast` or `immediate` mode on segments will have no effect since 
segments are stateless.
+
+    ```shell
+    $ hawq stop segment
+    $ hawq stop allsegments
+    ```
+
+-   You should typically always use `hawq start cluster` or `hawq restart cluster` to start the cluster. If you do end up starting nodes individually with `hawq start standby|master|segment`, make sure to always start the standby *before* the active master. Otherwise, the standby can become unsynchronized with the active master.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/HAWQBestPracticesOverview.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/bestpractices/HAWQBestPracticesOverview.html.md.erb 
b/markdown/bestpractices/HAWQBestPracticesOverview.html.md.erb
new file mode 100644
index 0000000..13b4dca
--- /dev/null
+++ b/markdown/bestpractices/HAWQBestPracticesOverview.html.md.erb
@@ -0,0 +1,28 @@
+---
+title: Best Practices
+---
+
+This chapter provides best practices for using the components and features that are part of a HAWQ system.
+
+
+-   **[Best Practices for Operating 
HAWQ](../bestpractices/operating_hawq_bestpractices.html)**
+
+    This topic provides best practices for operating HAWQ, including 
recommendations for stopping, starting and monitoring HAWQ.
+
+-   **[Best Practices for Securing 
HAWQ](../bestpractices/secure_bestpractices.html)**
+
+    To secure your HAWQ deployment, review the recommendations listed in this 
topic.
+
+-   **[Best Practices for Managing 
Resources](../bestpractices/managing_resources_bestpractices.html)**
+
+    This topic describes best practices for managing resources in HAWQ.
+
+-   **[Best Practices for Managing 
Data](../bestpractices/managing_data_bestpractices.html)**
+
+    This topic describes best practices for creating databases, loading data, partitioning data, and recovering data in HAWQ.
+
+-   **[Best Practices for Querying 
Data](../bestpractices/querying_data_bestpractices.html)**
+
+    To obtain the best results when querying data in HAWQ, review the best 
practices described in this topic.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/general_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/bestpractices/general_bestpractices.html.md.erb 
b/markdown/bestpractices/general_bestpractices.html.md.erb
new file mode 100644
index 0000000..503887b
--- /dev/null
+++ b/markdown/bestpractices/general_bestpractices.html.md.erb
@@ -0,0 +1,26 @@
+---
+title: HAWQ Best Practices
+---
+
+This topic addresses general best practices for users who are new to HAWQ.
+
+When using HAWQ, adhere to the following guidelines for best results:
+
+-   **Use a consistent `hawq-site.xml` file to configure your entire cluster**:
+
+    Configuration parameters (GUCs) are located in `$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ instances and can be modified by using the `hawq config` utility. You can use the same configuration file cluster-wide across both master and segments.
+    
+    If you install and manage HAWQ using Ambari, do not use `hawq config` to set or change HAWQ configuration properties. Use the Ambari interface for all configuration changes. Configuration changes to `hawq-site.xml` made outside the Ambari interface will be overwritten when you restart or reconfigure HAWQ using Ambari.
+
+    **Note:** While `postgresql.conf` still exists in HAWQ, any parameters 
defined in `hawq-site.xml` will overwrite configurations in `postgresql.conf`. 
For this reason, we recommend that you only use `hawq-site.xml` to configure 
your HAWQ cluster.
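+
+    For example, a sketch of inspecting and changing a parameter with `hawq config` on a non-Ambari cluster (the parameter name is illustrative; `-s` shows a value, `-c` with `-v` changes it):
+
+    ```shell
+    $ hawq config -s default_hash_table_bucket_number        # show the current value
+    $ hawq config -c default_hash_table_bucket_number -v 18  # change the value cluster-wide
+    $ hawq stop cluster -u                                   # reload runtime configuration changes
+    ```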
+
+-   **Keep in mind the factors that impact the number of virtual segments used 
for queries. The number of virtual segments used directly impacts the query's 
performance.** The degree of parallelism achieved by a query is determined by 
multiple factors, including the following:
+    -   **Cost of the query**. Small queries use fewer segments and larger 
queries use more segments. Note that there are some techniques you can use when 
defining resource queues to influence the number of virtual segments and 
general resources that are allocated to queries. See [Best Practices for Using 
Resource Queues](managing_resources_bestpractices.html#topic_hvd_pls_wv).
+    -   **Available resources**. Resources available at query time. If more resources are available in the resource queue, those resources will be used.
+    -   **Hash table and bucket number**. If the query involves only hash-distributed tables, and the bucket number (bucketnum) configured for all of the hash tables is the same, and the table size for randomly distributed tables is no more than 1.5 times the size of the hash tables, then the query's parallelism is fixed (equal to the hash table bucket number). Otherwise, the number of virtual segments depends on the query's cost, and hash-distributed table queries behave like queries on randomly distributed tables.
+    -   **Query Type**: For queries with some user-defined functions, or for external tables where calculating resource costs is difficult, the number of virtual segments is controlled by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT INTO hash_table`), then the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+    -   **PXF**: PXF external tables use the 
`default_hash_table_bucket_number` parameter, not the 
`hawq_rm_nvseg_perquery_perseg_limit` parameter, to control the number of 
virtual segments. 
+
+    See [Query Performance](../query/query-performance.html#topic38) for more 
details.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/managing_data_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/bestpractices/managing_data_bestpractices.html.md.erb 
b/markdown/bestpractices/managing_data_bestpractices.html.md.erb
new file mode 100644
index 0000000..11d6e02
--- /dev/null
+++ b/markdown/bestpractices/managing_data_bestpractices.html.md.erb
@@ -0,0 +1,47 @@
+---
+title: Best Practices for Managing Data
+---
+
+This topic describes best practices for creating databases, loading data, partitioning data, and recovering data in HAWQ.
+
+## <a id="topic_xhy_v2j_1v"></a>Best Practices for Loading Data
+
+Loading data into HDFS is challenging due to the limit on the number of files 
that can be opened concurrently for write on both NameNodes and DataNodes.
+
+To obtain the best performance during data loading, observe the following best 
practices:
+
+-   Typically, the number of concurrent connections to a NameNode should not exceed 50,000, and the number of open files per DataNode should not exceed 10,000. If you exceed these limits, the NameNode and DataNodes may become overloaded and slow.
+-   If the number of partitions in a table is large, the recommended way to load data into the partitioned table is to load the data partition by partition. For example, you can use a query such as the following to load data into only one partition:
+
+    ```sql
+    INSERT INTO target_partitioned_table_part1 SELECT * FROM source_table 
WHERE filter
+    ```
+
+    where *filter* selects only the data in the target partition.
+
+-   To alleviate the load on the NameNode, you can reduce the number of virtual segments used per node. You can do this at the statement level or at the resource queue level. See [Configuring the Maximum Number of Virtual Segments](../resourcemgmt/ConfigureResourceManagement.html#topic_tl5_wq1_f5) for more information.
+-   Use resource queues to limit load query and read query concurrency.
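+
+    For example, a sketch of restricting load concurrency through a resource queue (the queue name and setting are illustrative; see [Best Practices for Using Resource Queues](managing_resources_bestpractices.html#topic_hvd_pls_wv)):
+
+    ```shell
+    $ psql -d postgres -c "ALTER RESOURCE QUEUE load_queue WITH (ACTIVE_STATEMENTS=10);"
+    ```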
+
+The best practice for loading data into partitioned tables is to create an 
intermediate staging table, load it, and then exchange it into your partition 
design. See [Exchanging a Partition](../ddl/ddl-partition.html#topic83).
+
+## <a id="topic_s23_52j_1v"></a>Best Practices for Partitioning Data
+
+### <a id="topic65"></a>Deciding on a Table Partitioning Strategy
+
+Not all tables are good candidates for partitioning. If the answer is *yes* to 
all or most of the following questions, table partitioning is a viable database 
design strategy for improving query performance. If the answer is *no* to most 
of the following questions, table partitioning is not the right solution for 
that table. Test your design strategy to ensure that query performance improves 
as expected.
+
+-   **Is the table large enough?** Large fact tables are good candidates for 
table partitioning. If you have millions or billions of records in a table, you 
may see performance benefits from logically breaking that data up into smaller 
chunks. For smaller tables with only a few thousand rows or less, the 
administrative overhead of maintaining the partitions will outweigh any 
performance benefits you might see.
+-   **Are you experiencing unsatisfactory performance?** As with any 
performance tuning initiative, a table should be partitioned only if queries 
against that table are producing slower response times than desired.
+-   **Do your query predicates have identifiable access patterns?** Examine 
the `WHERE` clauses of your query workload and look for table columns that are 
consistently used to access data. For example, if most of your queries tend to 
look up records by date, then a monthly or weekly date-partitioning design 
might be beneficial. Or if you tend to access records by region, consider a 
list-partitioning design to divide the table by region.
+-   **Does your data warehouse maintain a window of historical data?** Another 
consideration for partition design is your organization's business requirements 
for maintaining historical data. For example, your data warehouse may require 
that you keep data for the past twelve months. If the data is partitioned by 
month, you can easily drop the oldest monthly partition from the warehouse and 
load current data into the most recent monthly partition.
+-   **Can the data be divided into somewhat equal parts based on some defining criteria?** Choose partitioning criteria that will divide your data as evenly as possible. If the partitions contain a relatively equal number of records, query performance improves based on the number of partitions created. For example, by dividing a large table into 10 partitions, a query can execute up to 10 times faster than it would against the unpartitioned table, provided that the partitions are designed to support the query's criteria.
+
+Do not create more partitions than are needed. Creating too many partitions 
can slow down management and maintenance jobs, such as vacuuming, recovering 
segments, expanding the cluster, checking disk usage, and others.
+
+Partitioning does not improve query performance unless the query optimizer can 
eliminate partitions based on the query predicates. Queries that scan every 
partition run slower than if the table were not partitioned, so avoid 
partitioning if few of your queries achieve partition elimination. Check the 
explain plan for queries to make sure that partitions are eliminated. See 
[Query Profiling](../query/query-profiling.html#topic39) for more about 
partition elimination.
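+
+For example, a sketch of checking a plan for partition elimination (the table and predicate are hypothetical; the plan should show scans of only the child partitions that match the predicate):
+
+```shell
+$ psql -d mydb -c "EXPLAIN SELECT count(*) FROM sales WHERE sale_date = '2016-01-15';"
+```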
+
+Be very careful with multi-level partitioning because the number of partition 
files can grow very quickly. For example, if a table is partitioned by both day 
and city, and there are 1,000 days of data and 1,000 cities, the total number 
of partitions is one million. Column-oriented tables store each column in a 
physical table, so if this table has 100 columns, the system would be required 
to manage 100 million files for the table.
+
+Before settling on a multi-level partitioning strategy, consider a single-level partition design with bitmap indexes. Indexes slow down data loads, so performance testing with your data and schema is recommended to decide on the best strategy.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/managing_resources_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/bestpractices/managing_resources_bestpractices.html.md.erb 
b/markdown/bestpractices/managing_resources_bestpractices.html.md.erb
new file mode 100644
index 0000000..f770611
--- /dev/null
+++ b/markdown/bestpractices/managing_resources_bestpractices.html.md.erb
@@ -0,0 +1,144 @@
+---
+title: Best Practices for Managing Resources
+---
+
+This topic describes best practices for managing resources in HAWQ.
+
+## <a id="topic_ikz_ndx_15"></a>Best Practices for Configuring Resource 
Management
+
+When configuring resource management, you can apply certain best practices to 
ensure that resources are managed both efficiently and for best system 
performance.
+
+The following is a list of high-level best practices for optimal resource 
management:
+
+-   Make sure segments do not have identical IP addresses. See [Segments Do 
Not Appear in 
gp\_segment\_configuration](../troubleshooting/Troubleshooting.html#topic_hlj_zxx_15)
 for an explanation of this problem.
+-   Configure all segments to have the same resource capacity. See 
[Configuring Segment Resource 
Capacity](../resourcemgmt/ConfigureResourceManagement.html#topic_htk_fxh_15).
+-   To prevent resource fragmentation, ensure that your deployment's segment 
resource capacity (standalone mode) or YARN node resource capacity (YARN mode) 
is a multiple of all virtual segment resource quotas. See [Configuring Segment 
Resource 
Capacity](../resourcemgmt/ConfigureResourceManagement.html#topic_htk_fxh_15) 
(HAWQ standalone mode) and [Setting HAWQ Segment Resource Capacity in 
YARN](../resourcemgmt/YARNIntegration.html#topic_pzf_kqn_c5).
+-   Ensure that enough registered segments are available and usable for query 
resource requests. If the number of unavailable or unregistered segments is 
higher than a set limit, then query resource requests are rejected. Also ensure 
that the variance of dispatched virtual segments across physical segments is 
not greater than the configured limit. See [Rejection of Query Resource 
Requests](../troubleshooting/Troubleshooting.html#topic_vm5_znx_15).
+-   Use multiple master and segment temporary directories on separate, large disks (2TB or greater) to load balance writes to temporary files (for example, `/disk1/tmp`, `/disk2/tmp`). For a given query, HAWQ will use a separate temp directory (if available) for each virtual segment to store spill files. Multiple HAWQ sessions will also use separate temp directories where available to avoid disk contention. If you configure too few temp directories, or you place multiple temp directories on the same disk, you increase the risk of disk contention or of running out of disk space when multiple virtual segments target the same disk.
+-   Configure minimum resource levels in YARN, and tune the timeout of when 
idle resources are returned to YARN. See [Tune HAWQ Resource Negotiations with 
YARN](../resourcemgmt/YARNIntegration.html#topic_wp3_4bx_15).
+-   Make sure that the property `yarn.scheduler.minimum-allocation-mb` in `yarn-site.xml` divides 1 GB evenly (for example, 1024 or 512).
+
+## <a id="topic_hvd_pls_wv"></a>Best Practices for Using Resource Queues
+
+Design and configure your resource queues depending on the operational needs 
of your deployment. This topic describes the best practices for creating and 
modifying resource queues within the context of different operational scenarios.
+
+### Modifying Resource Queues for Overloaded HDFS
+
+A high number of concurrent HAWQ queries can cause HDFS to overload, 
especially when querying partitioned tables. Use the `ACTIVE_STATEMENTS` 
attribute to restrict statement concurrency in a resource queue. For example, 
if an external application is executing more than 100 concurrent queries, then 
limiting the number of active statements in your resource queues will instruct 
the HAWQ resource manager to restrict actual statement concurrency within HAWQ. 
You might want to modify an existing resource queue as follows:
+
+```sql
+ALTER RESOURCE QUEUE sampleque1 WITH (ACTIVE_STATEMENTS=20);
+```
+
+In this case, when this DDL is applied to queue `sampleque1`, roles using this queue must wait until no more than 20 statements are running to execute their queries. Therefore, with 100 concurrent queries submitted, 80 queries would wait in the queue for later execution. Restricting the number of active query statements helps limit the usage of HDFS resources and protects HDFS. You can alter concurrency even when the resource queue is busy. For example, if a queue already has 40 concurrent statements running, and you apply a DDL statement that specifies `ACTIVE_STATEMENTS=20`, then the resource queue pauses the allocation of resources to new queries until enough running statements have returned their resources that no more than 20 remain active.
+
+### Isolating and Protecting Production Workloads
+
+Another best practice is using resource queues to isolate your workloads. 
Workload isolation prevents your production workload from being starved of 
resources. To create this isolation, divide your workload by creating roles for 
specific purposes. For example, you could create one role for production online 
verification and another role for the regular running of production processes.
+
+In this scenario, let us assign `role1` for the production workload and 
`role2` for production software verification. We can define the following 
resource queues under the same parent queue `dept1que`, which is the resource 
queue defined for the entire department.
+
+```sql
+CREATE RESOURCE QUEUE dept1product
+   WITH (PARENT='dept1que', MEMORY_LIMIT_CLUSTER=90%, CORE_LIMIT_CLUSTER=90%, 
RESOURCE_OVERCOMMIT_FACTOR=2);
+
+CREATE RESOURCE QUEUE dept1verification 
+   WITH (PARENT='dept1que', MEMORY_LIMIT_CLUSTER=10%, CORE_LIMIT_CLUSTER=10%, 
RESOURCE_OVERCOMMIT_FACTOR=10);
+
+ALTER ROLE role1 RESOURCE QUEUE dept1product;
+
+ALTER ROLE role2 RESOURCE QUEUE dept1verification;
+```
+
+With these resource queues defined, workload is spread across the resource 
queues as follows:
+
+-   When both `role1` and `role2` have workloads, the test verification 
workload gets only 10% of the total available `dept1que` resources, leaving 90% 
of the `dept1que` resources available for running the production workload.
+-   When `role1` has a workload but `role2` is idle, then 100% of all 
`dept1que` resources can be consumed by the production workload.
+-   When only `role2` has a workload (for example, during a scheduled testing 
window), then 100% of all `dept1que` resources can also be utilized for testing.
+
+Even when the resource queues are busy, you can alter the resource queue's 
memory and core limits to change resource allocation policies before switching 
workloads.
+
+In addition, you can use resource queues to isolate workloads for different departments or different applications. For example, we can use the following DDL statements to define three departments, and an administrator can arbitrarily redistribute resource allocations among the departments according to usage requirements.
+
+```sql
+ALTER RESOURCE QUEUE pg_default 
+   WITH (MEMORY_LIMIT_CLUSTER=10%, CORE_LIMIT_CLUSTER=10%);
+
+CREATE RESOURCE QUEUE dept1 
+   WITH (PARENT='pg_root', MEMORY_LIMIT_CLUSTER=30%, CORE_LIMIT_CLUSTER=30%);
+
+CREATE RESOURCE QUEUE dept2 
+   WITH (PARENT='pg_root', MEMORY_LIMIT_CLUSTER=30%, CORE_LIMIT_CLUSTER=30%);
+
+CREATE RESOURCE QUEUE dept3 
+   WITH (PARENT='pg_root', MEMORY_LIMIT_CLUSTER=30%, CORE_LIMIT_CLUSTER=30%);
+
+CREATE RESOURCE QUEUE dept11
+   WITH (PARENT='dept1', MEMORY_LIMIT_CLUSTER=50%,CORE_LIMIT_CLUSTER=50%);
+
+CREATE RESOURCE QUEUE dept12
+   WITH (PARENT='dept1', MEMORY_LIMIT_CLUSTER=50%, CORE_LIMIT_CLUSTER=50%);
+```
+
+### Querying Parquet Tables with a Large Page Size
+
+You can use resource queues to improve query performance on Parquet tables 
with a large page size. This type of query requires a large memory quota for 
virtual segments. Therefore, if one role mostly queries Parquet tables with a 
large page size, alter the resource queue associated with the role to increase 
its virtual segment resource quota. For example:
+
+```sql
+ALTER RESOURCE QUEUE queue1 WITH (VSEG_RESOURCE_QUOTA='mem:2gb');
+```
+
+If there are only occasional queries on Parquet tables with a large page size, 
use a statement level specification instead of altering the resource queue. For 
example:
+
+```sql
+SET HAWQ_RM_STMT_NVSEG=10;
+SET HAWQ_RM_STMT_VSEG_MEMORY='2gb';
+query1;
+SET HAWQ_RM_STMT_NVSEG=0;
+```
+
+### Restricting Resource Consumption for Specific Queries
+
+In general, the HAWQ resource manager attempts to provide as many resources as possible to the current query to achieve high query performance. When a query is complex and large, however, the associated resource queue can use up many virtual segments, causing other resource queues (and queries) to starve. Under these circumstances, you should enable nvseg limits on the resource queue associated with the large query. For example, you can specify that all queries can use no more than 200 virtual segments. To achieve this limit, alter the resource queue as follows:
+
+```sql
+ALTER RESOURCE QUEUE queue1 WITH (NVSEG_UPPER_LIMIT=200);
+```
+
+To make this limit vary according to the dynamic cluster size, use the following statement instead:
+
+```sql
+ALTER RESOURCE QUEUE queue1 WITH (NVSEG_UPPER_LIMIT_PERSEG=10);
+```
+
+After setting the limit in the above example, the actual limit will be 100 if 
you have a 10-node cluster. If the cluster is expanded to 20 nodes, then the 
limit increases automatically to 200.
+
+### Guaranteeing Resource Allocations for Individual Statements
+
+In general, the minimum number of virtual segments allocated to a statement is 
decided by the resource queue's actual capacity and its concurrency setting. 
For example, if there are 10 nodes in a cluster and the total resource capacity 
of the cluster is 640GB and 160 cores, then a resource queue having 20% 
capacity has a capacity of 128GB (640GB \* .20) and 32 cores (160 \*.20). If 
the virtual segment quota is set to 256MB, then this queue has 512 virtual 
segments allocated (128GB/256MB=512). If the `ACTIVE_STATEMENTS` concurrency 
setting for the resource queue is 20, then the minimum number of allocated 
virtual segments for each query is **25** (*trunc*(512/20)=25). However, this 
minimum number of virtual segments is a soft restriction. If a query statement 
requires only 5 virtual segments, then this minimum number of 25 is ignored 
since it is not necessary to allocate 25 for this statement.
+
+In order to raise the minimum number of virtual segments available for a query 
statement, there are two options.
+
+-   *Option 1*: Alter the resource queue to reduce concurrency. This is the 
recommended way to achieve the goal. For example:
+
+    ```sql
+    ALTER RESOURCE QUEUE queue1 WITH (ACTIVE_STATEMENTS=10);
+    ```
+
+    If the original concurrency setting is 20, then the minimum number of 
virtual segments is doubled.
+
+-   *Option 2*: Alter the nvseg limits of the resource queue. For example:
+
+    ```sql
+    ALTER RESOURCE QUEUE queue1 WITH (NVSEG_LOWER_LIMIT=50);
+    ```
+
+    or, alternately:
+
+    ```sql
+    ALTER RESOURCE QUEUE queue1 WITH (NVSEG_LOWER_LIMIT_PERSEG=5);
+    ```
+
+    In the second DDL, if there are 10 nodes in the cluster, the actual 
minimum number of virtual segments is 50 (5 \* 10 = 50).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/operating_hawq_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/bestpractices/operating_hawq_bestpractices.html.md.erb 
b/markdown/bestpractices/operating_hawq_bestpractices.html.md.erb
new file mode 100644
index 0000000..9dc56e9
--- /dev/null
+++ b/markdown/bestpractices/operating_hawq_bestpractices.html.md.erb
@@ -0,0 +1,298 @@
+---
+title: Best Practices for Operating HAWQ
+---
+
+This topic provides best practices for operating HAWQ, including 
recommendations for stopping, starting and monitoring HAWQ.
+
+## <a id="best_practice_config"></a>Best Practices for Configuring HAWQ 
Parameters
+
+The HAWQ configuration parameters (GUCs) are located in `$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ instances and can be modified either through the Ambari interface or on the command line.
+
+If you install and manage HAWQ using Ambari, use the Ambari interface for all 
configuration changes. Do not use command line utilities such as `hawq config` 
to set or change HAWQ configuration properties for Ambari-managed clusters. 
Configuration changes to `hawq-site.xml` made outside the Ambari interface will 
be overwritten when you restart or reconfigure HAWQ using Ambari.
+
+If you manage your cluster using command line tools instead of Ambari, use a 
consistent `hawq-site.xml` file to configure your entire cluster. 
+
+**Note:** While `postgresql.conf` still exists in HAWQ, any parameters defined 
in `hawq-site.xml` will overwrite configurations in `postgresql.conf`. For this 
reason, we recommend that you only use `hawq-site.xml` to configure your HAWQ 
cluster. For Ambari clusters, always use Ambari for configuring `hawq-site.xml` 
parameters.
+
+## <a id="task_qgk_bz3_1v"></a>Best Practices to Start/Stop HAWQ Cluster 
Members
+
+For best results in using `hawq start` and `hawq stop` to manage your HAWQ 
system, the following best practices are recommended.
+
+-   Issue the `CHECKPOINT` command to update and flush all data files to disk 
and update the log file before stopping the cluster. A checkpoint ensures that, 
in the event of a crash, files can be restored from the checkpoint snapshot.
+-   Stop the entire HAWQ system by stopping the cluster on the master host:
+    ```shell
+    $ hawq stop cluster
+    ```
+
+-   To stop segments and kill any running queries without causing data loss or 
inconsistency issues, use `fast` or `immediate` mode on the cluster:
+
+    ```shell
+    $ hawq stop cluster -M fast
+    ```
+    ```shell
+    $ hawq stop cluster -M immediate
+    ```
+
+-   Use `hawq stop master` to stop the master only. If you cannot stop the 
master due to running transactions, try using fast shutdown. If fast shutdown 
does not work, use immediate shutdown. Use immediate shutdown with caution, as 
it will result in a crash-recovery run when the system is restarted. 
+
+    ```shell
+    $ hawq stop master -M fast
+    ```
+    ```shell
+    $ hawq stop master -M immediate
+    ```
+
+-   When stopping a segment or all segments, you can use the default `smart` mode. Using fast or immediate mode on segments will have no effect since segments are stateless.
+
+    ```shell
+    $ hawq stop segment
+    ```
+    ```shell
+    $ hawq stop allsegments
+    ```
+
+-   Typically you should always use `hawq start cluster` or `hawq restart cluster` to start the cluster. If you do end up using `hawq start standby|master|segment` to start nodes individually, make sure you always start the standby before the active master. Otherwise, the standby can become unsynchronized with the active master.
+
+## <a id="id_trr_m1j_1v"></a>Guidelines for Cluster Expansion
+
+This topic provides some guidelines around expanding your HAWQ cluster.
+
+There are several recommendations to keep in mind when modifying the size of 
your running HAWQ cluster:
+
+-   When you add a new node, install both a DataNode and a physical segment on 
the new node.
+-   After adding a new node, you should always rebalance HDFS data to maintain 
cluster performance.
+-   Adding or removing a node also necessitates an update to the HDFS metadata cache. This update will happen eventually, but can take some time. To speed the update of the metadata cache, execute **`select gp_metadata_cache_clear();`** (see the sketch after this list).
+-   Note that for hash distributed tables, expanding the cluster will not immediately improve performance, since hash distributed tables use a fixed number of virtual segments. In order to obtain better performance with hash distributed tables, you must redistribute the table to the updated cluster by using either the [ALTER TABLE](../reference/sql/ALTER-TABLE.html) or [CREATE TABLE AS](../reference/sql/CREATE-TABLE-AS.html#topic1) command.
+-   If you are using hash tables, consider updating the 
`default_hash_table_bucket_number` server configuration parameter to a larger 
value after expanding the cluster but before redistributing the hash tables.
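+
+A minimal sketch of the rebalance and metadata-cache steps noted above (assumes the HDFS client is on the path; the database name is illustrative):
+
+```shell
+$ hdfs balancer                                            # rebalance HDFS data across DataNodes
+$ psql -d postgres -c "SELECT gp_metadata_cache_clear();"  # refresh the HDFS metadata cache
+```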
+
+## <a id="id_o5n_p1j_1v"></a>Database State Monitoring Activities
+
+<a id="id_o5n_p1j_1v__d112e31"></a>
+
+<table>
+<caption><span class="tablecap">Table 1. Database State Monitoring 
Activities</span></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Activity</th>
+<th>Procedure</th>
+<th>Corrective Actions</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>List segments that are currently down. If any rows are returned, this 
should generate a warning or alert.
+<p>Recommended frequency: run every 5 to 10 minutes</p>
+<p>Severity: IMPORTANT</p></td>
+<td>Run the following query in the <code class="ph codeph">postgres</code> 
database:
+<pre class="pre codeblock"><code>SELECT * FROM gp_segment_configuration
+WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
+<td>If the query returns any rows, follow these steps to correct the problem:
+<ol>
+<li>Verify that the hosts with down segments are responsive.</li>
+<li>If hosts are OK, check the <span class="ph filepath">pg_log</span> files 
for the down segments to discover the root cause of the segments going 
down.</li>
+</ol></td>
+</tr>
+</tbody>
+</table>
+
+
+## <a id="id_d3w_p1j_1v"></a>Hardware and Operating System Monitoring
+
+<a id="id_d3w_p1j_1v__d112e111"></a>
+
+<table>
+<caption><span class="tablecap">Table 2. Hardware and Operating System 
Monitoring Activities</span></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Activity</th>
+<th>Procedure</th>
+<th>Corrective Actions</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>Check the underlying platform for required maintenance or hardware failures.
+<p>Recommended frequency: real-time, if possible, or every 15 minutes</p>
+<p>Severity: CRITICAL</p></td>
+<td>Set up system check for hardware and OS errors.</td>
+<td>If required, remove a machine from the HAWQ cluster to resolve hardware 
and OS issues, then add it back to the cluster after the issues are 
resolved.</td>
+</tr>
+<tr class="even">
+<td>Check disk space usage on volumes used for HAWQ data storage and the OS.
+<p>Recommended frequency: every 5 to 30 minutes</p>
+<p>Severity: CRITICAL</p></td>
+<td><div class="p">
+Set up a disk space check.
+<ul>
+<li>Set a threshold to raise an alert when a disk reaches a percentage of 
capacity. The recommended threshold is 75% full.</li>
+<li>It is not recommended to run the system with capacities approaching 
100%.</li>
+</ul>
+</div></td>
+<td>Free space on the system by removing some data or files.</td>
+</tr>
+<tr class="odd">
+<td>Check for errors or dropped packets on the network interfaces.
+<p>Recommended frequency: hourly</p>
+<p>Severity: IMPORTANT</p></td>
+<td>Set up network interface checks.</td>
+<td><p>Work with network and OS teams to resolve errors.</p></td>
+</tr>
+<tr class="even">
+<td>Check for RAID errors or degraded RAID performance.
+<p>Recommended frequency: every 5 minutes</p>
+<p>Severity: CRITICAL</p></td>
+<td>Set up a RAID check.</td>
+<td><ul>
+<li>Replace failed disks as soon as possible.</li>
+<li>Work with system administration team to resolve other RAID or controller 
errors as soon as possible.</li>
+</ul></td>
+</tr>
+<tr class="odd">
+<td>Check for adequate I/O bandwidth and I/O skew.
+<p>Recommended frequency: when creating a cluster or when hardware issues are suspected.</p></td>
+<td>Run the <code class="ph codeph">hawq checkperf</code> utility.</td>
+<td><div class="p">
+The cluster may be under-specified if data transfer rates are not similar to 
the following:
+<ul>
+<li>2 GB per second disk read</li>
+<li>1 GB per second disk write</li>
+<li>10 Gigabit per second network read and write</li>
+</ul>
+If transfer rates are lower than expected, consult with your data architect 
regarding performance expectations.
+</div>
+<p>If the machines on the cluster display an uneven performance profile, work 
with the system administration team to fix faulty machines.</p></td>
+</tr>
+</tbody>
+</table>
+
+
+## <a id="id_khd_q1j_1v"></a>Data Maintenance
+
+<a id="id_khd_q1j_1v__d112e279"></a>
+
+<table>
+<caption><span class="tablecap">Table 3. Data Maintenance 
Activities</span></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Activity</th>
+<th>Procedure</th>
+<th>Corrective Actions</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>Check for missing statistics on tables.</td>
+<td>Check the <code class="ph codeph">hawq_stats_missing</code> view in each 
database:
+<pre class="pre codeblock"><code>SELECT * FROM 
hawq_toolkit.hawq_stats_missing;</code></pre></td>
+<td>Run <code class="ph codeph">ANALYZE</code> on tables that are missing 
statistics.</td>
+</tr>
+</tbody>
+</table>
+
+
+## <a id="id_lx4_q1j_1v"></a>Database Maintenance
+
+<a id="id_lx4_q1j_1v__d112e343"></a>
+
+<table>
+<caption><span class="tablecap">Table 4. Database Maintenance 
Activities</span></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Activity</th>
+<th>Procedure</th>
+<th>Corrective Actions</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>Mark deleted rows in HAWQ system catalogs (tables in the <code class="ph 
codeph">pg_catalog</code> schema) so that the space they occupy can be reused.
+<p>Recommended frequency: daily</p>
+<p>Severity: CRITICAL</p></td>
+<td>Vacuum each system catalog:
+<pre class="pre codeblock"><code>VACUUM &lt;table&gt;;</code></pre></td>
+<td>Vacuum system catalogs regularly to prevent bloating.</td>
+</tr>
+<tr class="even">
+<td>Update table statistics.
+<p>Recommended frequency: after loading data and before executing queries</p>
+<p>Severity: CRITICAL</p></td>
+<td>Analyze user tables:
+<pre class="pre codeblock"><code>ANALYZEDB -d &lt;database&gt; 
-a</code></pre></td>
+<td>Analyze updated tables regularly so that the optimizer can produce 
efficient query execution plans.</td>
+</tr>
+<tr class="odd">
+<td>Backup the database data.
+<p>Recommended frequency: daily, or as required by your backup plan</p>
+<p>Severity: CRITICAL</p></td>
+<td>See <a href="../admin/BackingUpandRestoringHAWQDatabases.html">Backing up 
and Restoring HAWQ Databases</a> for a discussion of backup procedures</td>
+<td>Best practice is to have a current backup ready in case the database must 
be restored.</td>
+</tr>
+<tr class="even">
+<td>Reindex system catalogs (tables in the <code class="ph 
codeph">pg_catalog</code> schema) to maintain an efficient catalog.
+<p>Recommended frequency: weekly, or more often if database objects are 
created and dropped frequently</p></td>
+<td>Run <code class="ph codeph">REINDEX SYSTEM</code> in each database.
+<pre class="pre codeblock"><code>REINDEXDB -s</code></pre></td>
+<td>The optimizer retrieves information from the system tables to create query 
plans. If system tables and indexes are allowed to become bloated over time, 
scanning the system tables increases query execution time.</td>
+</tr>
+</tbody>
+</table>
+
+
+## <a id="id_blv_q1j_1v"></a>Patching and Upgrading
+
+<a id="id_blv_q1j_1v__d112e472"></a>
+
+<table>
+<caption><span class="tablecap">Table 5. Patch and Upgrade 
Activities</span></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Activity</th>
+<th>Procedure</th>
+<th>Corrective Actions</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>Ensure any bug fixes or enhancements are applied to the kernel.
+<p>Recommended frequency: at least every 6 months</p>
+<p>Severity: IMPORTANT</p></td>
+<td>Follow the vendor's instructions to update the Linux kernel.</td>
+<td>Keep the kernel current to include bug fixes and security fixes, and to 
avoid difficult future upgrades.</td>
+</tr>
+<tr class="even">
+<td>Install HAWQ minor releases.
+<p>Recommended frequency: quarterly</p>
+<p>Severity: IMPORTANT</p></td>
+<td>Always upgrade to the latest in the series.</td>
+<td>Keep the HAWQ software current to incorporate bug fixes, performance 
enhancements, and feature enhancements into your HAWQ cluster.</td>
+</tr>
+</tbody>
+</table>
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/querying_data_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/bestpractices/querying_data_bestpractices.html.md.erb 
b/markdown/bestpractices/querying_data_bestpractices.html.md.erb
new file mode 100644
index 0000000..3efe569
--- /dev/null
+++ b/markdown/bestpractices/querying_data_bestpractices.html.md.erb
@@ -0,0 +1,43 @@
+---
+title: Best Practices for Querying Data
+---
+
+To obtain the best results when querying data in HAWQ, review the best 
practices described in this topic.
+
+## <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
+
+The number of virtual segments used for a query directly impacts the query's 
performance. The following factors can impact the degree of parallelism of a 
query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries 
use more segments. Some techniques used in defining resource queues can 
influence the number of both virtual segments and general resources allocated 
to queries. For more information, see [Best Practices for Using Resource 
Queues](managing_resources_bestpractices.html#topic_hvd_pls_wv).
+-   **Available resources at query time**. If more resources are available in 
the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions:
+
+    -   The bucket number (bucketnum) configured for all the hash tables is the same.
+    -   The table size for randomly distributed tables is no more than 1.5 times the size of the hash tables.
+
+    Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly distributed tables.
+
+-   **Query Type**: It can be difficult to calculate resource costs for queries with some user-defined functions or for queries on external tables. For these queries, the number of virtual segments is controlled by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT INTO hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+    **Note:** PXF external tables use the `default_hash_table_bucket_number` parameter, not the `hawq_rm_nvseg_perquery_perseg_limit` parameter, to control the number of virtual segments.
+
+See [Query Performance](../query/query-performance.html#topic38) for more 
details.
+
+## <a id="id_xtk_jmq_1v"></a>Examining Query Plans to Solve Problems
+
+If a query performs poorly, examine its query plan and ask the following 
questions:
+
+-   **Do operations in the plan take an exceptionally long time?** Look for an 
operation that consumes the majority of query processing time. For example, if 
a scan on a hash table takes longer than expected, the data locality may be 
low; reloading the data can increase the data locality and speed up the query. 
Or, adjust `enable_<operator>` parameters to see if you can force the legacy 
query optimizer (planner) to choose a different plan by disabling a particular 
query plan operator for that query.
+-   **Are the optimizer's estimates close to reality?** Run `EXPLAIN ANALYZE` and see if the number of rows the optimizer estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, collect more statistics on the relevant columns.
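+
+    For example, a sketch of comparing estimated and actual row counts (the table name and predicate are hypothetical):
+
+    ```shell
+    $ psql -d mydb -c "EXPLAIN ANALYZE SELECT * FROM sales WHERE region = 'usa';"
+    ```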
+-   **Are selective predicates applied early in the plan?** Apply the most 
selective filters early in the plan so fewer rows move up the plan tree. If the 
query plan does not correctly estimate query predicate selectivity, collect 
more statistics on the relevant columns. You can also try reordering the 
`WHERE` clause of your SQL statement.
+-   **Does the optimizer choose the best join order?** When you have a query 
that joins multiple tables, make sure that the optimizer chooses the most 
selective join order. Joins that eliminate the largest number of rows should be 
done earlier in the plan so fewer rows move up the plan tree.
+
+    If the plan is not choosing the optimal join order, set 
`join_collapse_limit=1` and use explicit `JOIN` syntax in your SQL statement to 
force the legacy query optimizer (planner) to the specified join order. You can 
also collect more statistics on the relevant join columns.
+
+-   **Does the optimizer selectively scan partitioned tables?** If you use 
table partitioning, is the optimizer selectively scanning only the child tables 
required to satisfy the query predicates? Scans of the parent tables should 
return 0 rows since the parent tables do not contain any data. See [Verifying 
Your Partition Strategy](../ddl/ddl-partition.html#topic74) for an example of a 
query plan that shows a selective partition scan.
+-   **Does the optimizer choose hash aggregate and hash join operations where applicable?** Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting are done in memory rather than reading/writing from disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Run `EXPLAIN ANALYZE` for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example:
+
+    `Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.`
+
+    **Note:** The "bytes wanted" (*work\_mem* property) is based on the amount of data written to work files and is not exact. This property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/bestpractices/secure_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/bestpractices/secure_bestpractices.html.md.erb 
b/markdown/bestpractices/secure_bestpractices.html.md.erb
new file mode 100644
index 0000000..04c5343
--- /dev/null
+++ b/markdown/bestpractices/secure_bestpractices.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Best Practices for Securing HAWQ
+---
+
+To secure your HAWQ deployment, review the recommendations listed in this 
topic.
+
+-   Set up SSL to encrypt your client/server communication channel. See [Encrypting Client/Server Connections](../clientaccess/client_auth.html#topic5).
+-   Configure `pg_hba.conf` only on the HAWQ master. Do not configure it on segments.
+
+    **Note:** For a more secure system, consider removing all connections that use trust authentication from your master `pg_hba.conf`. Trust authentication means the role is granted access without any authentication, therefore bypassing all security. Replace trust entries with ident authentication if your system has an ident service available.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/client_auth.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/client_auth.html.md.erb 
b/markdown/clientaccess/client_auth.html.md.erb
new file mode 100644
index 0000000..a13f4e1
--- /dev/null
+++ b/markdown/clientaccess/client_auth.html.md.erb
@@ -0,0 +1,193 @@
+---
+title: Configuring Client Authentication
+---
+
+When a HAWQ system is first initialized, the system contains one predefined *superuser* role. This role has the same name as the operating system user who initialized the HAWQ system, and is referred to in this documentation as `gpadmin`. By default, the system is configured to allow only local connections to the database from the `gpadmin` role. To allow any other roles to connect, or to allow connections from remote hosts, you must configure HAWQ to allow such connections.
+
+## <a id="topic2"></a>Allowing Connections to HAWQ 
+
+Client access and authentication is controlled by the standard PostgreSQL 
host-based authentication file, `pg_hba.conf`. In HAWQ, the `pg_hba.conf` file 
of the master instance controls client access and authentication to your HAWQ 
system. HAWQ segments have `pg_hba.conf` files that are configured to allow 
only client connections from the master host and never accept client 
connections. Do not alter the `pg_hba.conf` file on your segments.
+
+See [The pg\_hba.conf 
File](http://www.postgresql.org/docs/9.0/interactive/auth-pg-hba-conf.html) in 
the PostgreSQL documentation for more information.
+
+The general format of the `pg_hba.conf` file is a set of records, one per 
line. HAWQ ignores blank lines and any text after the `#` comment character. A 
record consists of a number of fields that are separated by spaces and/or tabs. 
Fields can contain white space if the field value is quoted. Records cannot be 
continued across lines. Each remote client access record has the following 
format:
+
+```
+host|hostssl|hostnossl   <database>   <role>   
<CIDR-address>|<IP-address>,<IP-mask>   <authentication-method>
+```
+
+Each UNIX-domain socket access record has the following format:
+
+```
+local   <database>   <role>   <authentication-method>
+```
+
+The following table describes the meaning of each field.
+
+|Field|Description|
+|-----|-----------|
+|local|Matches connection attempts using UNIX-domain sockets. Without a record 
of this type, UNIX-domain socket connections are disallowed.|
+|host|Matches connection attempts made using TCP/IP. Remote TCP/IP connections 
will not be possible unless the server is started with an appropriate value for 
the listen\_addresses server configuration parameter.|
+|hostssl|Matches connection attempts made using TCP/IP, but only when the connection is made with SSL encryption. SSL must be enabled at server start time by setting the ssl configuration parameter.|
+|hostnossl|Matches connection attempts made over TCP/IP that do not use SSL.|
+|\<database\>|Specifies which database names this record matches. The value 
`all` specifies that it matches all databases. Multiple database names can be 
supplied by separating them with commas. A separate file containing database 
names can be specified by preceding the file name with @.|
+|\<role\>|Specifies which database role names this record matches. The value 
`all` specifies that it matches all roles. If the specified role is a group and 
you want all members of that group to be included, precede the role name with a 
+. Multiple role names can be supplied by separating them with commas. A 
separate file containing role names can be specified by preceding the file name 
with @.|
+|\<CIDR-address\>|Specifies the client machine IP address range that this record matches. It contains an IP address in standard dotted decimal notation and a CIDR mask length. IP addresses can only be specified numerically, not as domain or host names. The mask length indicates the number of high-order bits of the client IP address that must match. Bits to the right of this must be zero in the given IP address. There must not be any white space between the IP address, the /, and the CIDR mask length. Typical examples of a CIDR-address are 192.0.2.1/32 for a single host, 192.0.2.0/24 for a small network, or 192.0.0.0/16 for a larger one. To specify a single host, use a CIDR mask of 32 for IPv4 or 128 for IPv6. In a network address, do not omit trailing zeroes.|
+|\<IP-address\>, \<IP-mask\>|These fields can be used as an alternative to the 
CIDR-address notation. Instead of specifying the mask length, the actual mask 
is specified in a separate column. For example, 255.255.255.255 represents a 
CIDR mask length of 32. These fields only apply to host, hostssl, and hostnossl 
records.|
+|\<authentication-method\>|Specifies the authentication method to use when 
connecting. HAWQ supports the [authentication 
methods](http://www.postgresql.org/docs/9.0/static/auth-methods.html) supported 
by PostgreSQL 9.0.|
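+
+For example, the following two records are equivalent: the first uses CIDR notation, the second an explicit IP mask. The address range and `md5` method are illustrative values, not taken from a shipped configuration:
+
+```
+host  all  gpadmin  192.168.1.0/24               md5
+host  all  gpadmin  192.168.1.0  255.255.255.0   md5
+```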
+
+### <a id="topic3"></a>Editing the pg\_hba.conf File 
+
+This example shows how to edit the `pg_hba.conf` file of the master to allow 
remote client access to all databases from all roles using encrypted password 
authentication.
+
+**Note:** For a more secure system, consider removing all connections that use trust authentication from your master `pg_hba.conf`. Trust authentication grants the role access without any authentication, bypassing all security. Replace trust entries with ident authentication if your system has an ident service available.
+
+#### <a id="ip144328"></a>Editing pg\_hba.conf 
+
+1.  Obtain the master data directory location from the `hawq_master_directory` 
property value in `hawq-site.xml` and use a text editor to open the 
`pg_hba.conf` file in this directory.
+2.  Add a line to the file for each type of connection you want to allow. 
Records are read sequentially, so the order of the records is significant. 
Typically, earlier records will have tight connection match parameters and 
weaker authentication methods, while later records will have looser match 
parameters and stronger authentication methods. For example:
+
+    ```
+    # allow the gpadmin user local access to all databases
+    # using ident authentication
+    local   all   gpadmin   ident         sameuser
+    host    all   gpadmin   127.0.0.1/32  ident
+    host    all   gpadmin   ::1/128       ident
+    # allow the 'dba' role access to any database from any
+    # host with IP address 192.168.x.x and use md5 encrypted
+    # passwords to authenticate the user
+    # Note that to use SHA-256 encryption, replace *md5* with
+    # password in the line below
+    host    all   dba   192.168.0.0/16  md5
+    # allow all roles access to any database from any
+    # host and use ldap to authenticate the user. HAWQ role
+    # names must match the LDAP common name.
+    host    all   all   192.168.0.0/16  ldap ldapserver=usldap1 ldapport=1389 ldapprefix="cn=" ldapsuffix=",ou=People,dc=company,dc=com"
+    ```
+
+3.  Save and close the file.
+4.  Reload the `pg_hba.conf` configuration file for your changes to take effect \(a quick verification example follows this procedure\). Include the `-M fast` option if you have active/open database connections:
+
+    ``` bash
+    $ hawq stop cluster -u [-M fast]
+    ```
+    
+
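+With the example entries above in place, you can verify that a remote role is now accepted. This is a sketch; `master_host` and the `dba` role are illustrative values taken from the example:
+
+``` bash
+$ psql -h master_host -p 5432 -U dba -d template1
+```
+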
+
+## <a id="topic4"></a>Limiting Concurrent Connections 
+
+HAWQ allocates some resources on a per-connection basis, so setting the 
maximum number of connections allowed is recommended.
+
+To limit the number of active concurrent sessions to your HAWQ system, you can configure the `max_connections` server configuration parameter on the master or the `seg_max_connections` server configuration parameter on segments. These are *local* parameters, meaning that you must set them in the `hawq-site.xml` file of all HAWQ instances.
+
+When you set `max_connections`, you must also set the dependent parameter 
`max_prepared_transactions`. This value must be at least as large as the value 
of `max_connections`, and all HAWQ instances should be set to the same value.
+
+Example `$GPHOME/etc/hawq-site.xml` configuration:
+
+``` xml
+  <property>
+      <name>max_connections</name>
+      <value>500</value>
+  </property>
+  <property>
+      <name>max_prepared_transactions</name>
+      <value>1000</value>
+  </property>
+  <property>
+      <name>seg_max_connections</name>
+      <value>3200</value>
+  </property>
+```
+
+**Note:** Raising the values of these parameters may cause HAWQ to request 
more shared memory. To mitigate this effect, consider decreasing other 
memory-related server configuration parameters such as 
[gp\_cached\_segworkers\_threshold](../reference/guc/parameter_definitions.html#gp_cached_segworkers_threshold).
+
+
+### <a id="ip142411"></a>Setting the Number of Allowed Connections
+
+The procedure for setting connection-related server configuration parameters depends on whether you manage your HAWQ cluster with Ambari or from the command line. If you use Ambari to manage your HAWQ cluster, you must update server configuration parameters only via the Ambari Web UI. If you manage your cluster from the command line, use the `hawq config` command line utility to set server configuration parameters.
+
+If you use Ambari to manage your cluster:
+
+1. Set the `max_connections`, `seg_max_connections`, and `max_prepared_transactions` configuration properties via the HAWQ service **Configs > Advanced > Custom hawq-site** drop-down.
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file 
`/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+    
+2.  Use the `hawq config` utility to set the values of the `max_connections`, 
`seg_max_connections`, and `max_prepared_transactions` parameters to values 
appropriate for your deployment. For example: 
+
+    ``` bash
+    $ hawq config -c max_connections -v 100
+    $ hawq config -c seg_max_connections -v 6400
+    $ hawq config -c max_prepared_transactions -v 200
+    ```
+
+    The value of `max_prepared_transactions` must be greater than or equal to 
`max_connections`.
+
+3.  Load the new configuration values by restarting your HAWQ cluster:
+
+    ``` bash
+    $ hawq restart cluster
+    ```
+
+4.  Use the `-s` option to `hawq config` to display server configuration parameter values \(a session-level check follows this procedure\):
+
+    ``` bash
+    $ hawq config -s max_connections
+    $ hawq config -s seg_max_connections
+    ```
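+
+You can also check the value in effect from a database session. A minimal sketch, using `template1` as an example database:
+
+``` bash
+$ psql -d template1 -c 'SHOW max_connections;'
+```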
+
+
+## <a id="topic5"></a>Encrypting Client/Server Connections 
+
+Enable SSL for client connections to HAWQ to encrypt the data passed over the 
network between the client and the database.
+
+HAWQ has native support for SSL connections between the client and the master 
server. SSL connections prevent third parties from snooping on the packets, and 
also prevent man-in-the-middle attacks. SSL should be used whenever the client 
connection goes through an insecure link, and must be used whenever client 
certificate authentication is used.
+
+Enabling SSL requires that OpenSSL be installed on both the client and the 
master server systems. HAWQ can be started with SSL enabled by setting the 
server configuration parameter `ssl` to `on` in the master `hawq-site.xml`. 
When starting in SSL mode, the server will look for the files `server.key` 
\(server private key\) and `server.crt` \(server certificate\) in the master 
data directory. These files must be set up correctly before an SSL-enabled HAWQ 
system can start.
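+
+For example, in the master `hawq-site.xml` \(a minimal sketch; only the `ssl` parameter described above is shown\):
+
+``` xml
+  <property>
+      <name>ssl</name>
+      <value>on</value>
+  </property>
+```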
+
+**Important:** Do not protect the private key with a passphrase. The server 
does not prompt for a passphrase for the private key, and the database startup 
fails with an error if one is required.
+
+A self-signed certificate can be used for testing, but a certificate signed by 
a certificate authority \(CA\) should be used in production, so the client can 
verify the identity of the server. Either a global or local CA can be used. If 
all the clients are local to the organization, a local CA is recommended.
+
+### <a id="topic6"></a>Creating a Self-signed Certificate without a Passphrase 
for Testing Only 
+
+To create a quick self-signed certificate for the server for testing, use the 
following OpenSSL command:
+
+```
+# openssl req -new -text -out server.req
+```
+
+Enter the information requested by the prompts. Be sure to enter the local 
host name as *Common Name*. The challenge password can be left blank.
+
+The program generates a key that is passphrase-protected; it does not accept a passphrase that is less than four characters long.
+
+To use this certificate with HAWQ, remove the passphrase with the following 
commands:
+
+```
+# openssl rsa -in privkey.pem -out server.key
+# rm privkey.pem
+```
+
+Enter the old passphrase when prompted to unlock the existing key.
+
+Then, enter the following command to turn the certificate request into a self-signed certificate \(copying the key and certificate into place is shown below\):
+
+``` 
+# openssl req -x509 -in server.req -text -key server.key -out server.crt
+```
+
+Next, change the permissions on the key with the following command. The server will reject the file if its permissions are less restrictive than these.
+
+```
+# chmod og-rwx server.key
+```
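+
+Finally, place `server.key` and `server.crt` in the master data directory so the server can find them at startup. This is a sketch; substitute the value of the `hawq_master_directory` property from `hawq-site.xml` for `<master_data_directory>` \(`cp -p` preserves the restrictive permissions set above\):
+
+```
+# cp -p server.key server.crt <master_data_directory>/
+```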
+
+For more details on how to create your server private key and certificate, 
refer to the [OpenSSL documentation](https://www.openssl.org/docs/).

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/disable-kerberos.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/disable-kerberos.html.md.erb 
b/markdown/clientaccess/disable-kerberos.html.md.erb
new file mode 100644
index 0000000..5646eec
--- /dev/null
+++ b/markdown/clientaccess/disable-kerberos.html.md.erb
@@ -0,0 +1,85 @@
+---
+title: Disabling Kerberos Security
+---
+
+Follow these steps to disable Kerberos security for HAWQ and PXF for manual 
installations.
+
+**Note:** If you install or manage your cluster using Ambari, then the HAWQ 
Ambari plug-in automatically disables security for HAWQ and PXF when you 
disable security for Hadoop. The following instructions are only necessary for 
manual installations, or when Hadoop security is disabled outside of Ambari.
+
+1.  Disable Kerberos on the Hadoop cluster on which you use HAWQ.
+2.  Disable security for HAWQ:
+    1.  Log in to the HAWQ database master server as the `gpadmin` user:
+
+        ``` bash
+        $ ssh hawq_master_fqdn
+        ```
+
+    2.  Run the following command to set up HAWQ environment variables:
+
+        ``` bash
+        $ source /usr/local/hawq/greenplum_path.sh
+        ```
+
+    3.  Start HAWQ if necessary:
+
+        ``` bash
+        $ hawq start cluster -a
+        ```
+
+    4.  Run the following command to disable security:
+
+        ``` bash
+        $ hawq config --masteronly -c enable_secure_filesystem -v "off"
+        ```
+
+    5.  Change the permission of the HAWQ HDFS data directory:
+
+        ``` bash
+        $ sudo -u hdfs hdfs dfs -chown -R gpadmin:gpadmin /hawq_data
+        ```
+
+    6.  On the HAWQ master node and on all segment server nodes, edit the `/usr/local/hawq/etc/hdfs-client.xml` file to disable Kerberos security. Comment out or remove the following properties in each file:
+
+        ``` xml
+        <!--
+        <property>
+          <name>hadoop.security.authentication</name>
+          <value>kerberos</value>
+        </property>
+
+        <property>
+          <name>dfs.namenode.kerberos.principal</name>
+          <value>nn/_HOST@LOCAL.DOMAIN</value>
+        </property>
+        -->
+        ```
+
+    7.  Restart HAWQ:
+
+        ``` bash
+        $ hawq restart cluster -a -M fast
+        ```
+
+3.  Disable security for PXF:
+    1.  On each PXF node, edit the `/etc/gphd/pxf/conf/pxf-site.xml` file to comment out or remove the following properties:
+
+        ``` xml
+        <!--
+        <property>
+            <name>pxf.service.kerberos.keytab</name>
+            <value>/etc/security/phd/keytabs/pxf.service.keytab</value>
+            <description>path to keytab file owned by pxf service
+            with permissions 0400</description>
+        </property>
+
+        <property>
+            <name>pxf.service.kerberos.principal</name>
+            <value>pxf/_HOST@PHD.LOCAL</value>
+            <description>Kerberos principal pxf service should use.
+            _HOST is replaced automatically with the host's FQDN</description>
+        </property>
+        -->
+        ```
+
+    2.  Restart the PXF service.
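+
+        For example \(a sketch; the service name `pxf-service` is typical for HDB/PHD installations and may differ on your system\):
+
+        ``` bash
+        $ sudo service pxf-service restart
+        ```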

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/g-connecting-with-psql.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/g-connecting-with-psql.html.md.erb 
b/markdown/clientaccess/g-connecting-with-psql.html.md.erb
new file mode 100644
index 0000000..0fa501c
--- /dev/null
+++ b/markdown/clientaccess/g-connecting-with-psql.html.md.erb
@@ -0,0 +1,35 @@
+---
+title: Connecting with psql
+---
+
+Depending on the default values used or the environment variables you have 
set, the following examples show how to access a database via `psql`:
+
+``` bash
+$ psql -d gpdatabase -h master_host -p 5432 -U gpadmin
+```
+
+``` bash
+$ psql gpdatabase
+```
+
+``` bash
+$ psql
+```
+
+If a user-defined database has not yet been created, you can access the system 
by connecting to the `template1` database. For example:
+
+``` bash
+$ psql template1
+```
+
+After connecting to a database, `psql` provides a prompt with the name of the 
database to which `psql` is currently connected, followed by the string `=>` 
\(or `=#` if you are the database superuser\). For example:
+
+``` sql
+gpdatabase=>
+```
+
+At the prompt, you may type in SQL commands. A SQL command must end with a `;` 
\(semicolon\) in order to be sent to the server and executed. For example:
+
+``` sql
+=> SELECT * FROM mytable;
+```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/g-database-application-interfaces.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/clientaccess/g-database-application-interfaces.html.md.erb 
b/markdown/clientaccess/g-database-application-interfaces.html.md.erb
new file mode 100644
index 0000000..29e22c5
--- /dev/null
+++ b/markdown/clientaccess/g-database-application-interfaces.html.md.erb
@@ -0,0 +1,96 @@
+---
+title: HAWQ Database Drivers and APIs
+---
+
+You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the PostgreSQL `libpq` C API and the ODBC and JDBC APIs.
+
+HAWQ provides the following connectivity tools for connecting to the database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## <a id="dbdriver"></a>HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### <a id="odbc_driver"></a>ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing database 
management systems.  For additional information on using the ODBC API, refer to 
the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for this 
driver are provided on the Pivotal Network driver download page. Refer to [HAWQ 
ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+#### <a id="odbc_driver_connurl"></a>Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database is 
typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---------------|-------------------|
+| Database | Name of the database to which you want to connect. |
+| Driver | Full path to the ODBC driver library file. |
+| HostName | HAWQ master host name. |
+| MaxLongVarcharSize | Maximum size of columns of type long varchar. |
+| Password | Password used to connect to the specified database. |
+| PortNumber | HAWQ master database port number. |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` ini
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying either a data source name, the name of a file data source, or the 
name of a driver.  A HAWQ ODBC connection string has the following format:
+
+``` shell
+([DSN=<data_source_name>]|[FILEDSN=<filename.dsn>]|[DRIVER=<driver_name>])[;<attribute>=<value>[;...]]
+```
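+
+For example, a connection string referencing the `HAWQ-201` data source defined above might look like the following \(`UID` and `PWD` are standard ODBC attributes; the values are illustrative\):
+
+``` shell
+DSN=HAWQ-201;UID=gpadmin;PWD=changeme
+```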
+
+For additional information on specifying a HAWQ ODBC connection string, refer 
to [Using a Connection 
String](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FUsing_a_Connection_String_16.html%23).
+
+### <a id="jdbc_driver"></a>JDBC Driver
+The JDBC API specifies a standard set of Java interfaces to SQL-compliant 
databases. For additional information on using the JDBC API, refer to the [Java 
JDBC API](https://docs.oracle.com/javase/8/docs/technotes/guides/jdbc/) 
documentation.
+
+HAWQ supports the DataDirect JDBC Driver. Installation instructions for this 
driver are provided on the Pivotal Network driver download page. Refer to [HAWQ 
JDBC 
Driver](http://media.datadirect.com/download/docs/jdbc/alljdbc/help.html#page/jdbcconnect%2Fgreenplum-driver.html%23)
 for HAWQ-specific JDBC driver information.
+
+#### <a id="jdbc_driver_connurl"></a>Connection URL
+Connection URLs for accessing the HAWQ DataDirect JDBC driver must be in the 
following format:
+
+``` shell
+jdbc:pivotal:greenplum://host:port[;<property>=<value>[;...]]
+```
+
+Commonly-specified HAWQ JDBC connection properties include:
+
+| Property Name | Value Description |
+|---------------|-------------------|
+| DatabaseName | Name of the database to which you want to connect. |
+| User | Username used to connect to the specified database. |
+| Password | Password used to connect to the specified database. |
+
+Refer to [Connection 
Properties](http://media.datadirect.com/download/docs/jdbc/alljdbc/help.html#page/jdbcconnect%2FConnection_Properties_10.html%23)
 for a list of JDBC connection properties supported by the HAWQ DataDirect JDBC 
driver.
+
+Example HAWQ JDBC connection string:
+
+``` shell
+jdbc:pivotal:greenplum://hdm1:5432;DatabaseName=getstartdb;User=hdbuser;Password=hdbpass
+```
+
+## <a id="libpq_api"></a>libpq API
+`libpq` is the C API to PostgreSQL/HAWQ. This API provides a set of library 
functions enabling client programs to pass queries to the PostgreSQL backend 
server and to receive the results of those queries.
+
+`libpq` is installed in the `lib/` directory of your HAWQ distribution. 
`libpq-fe.h`, the header file required for developing front-end PostgreSQL 
applications, can be found in the `include/` directory.
+
+For additional information on using the `libpq` API, refer to [libpq - C 
Library](https://www.postgresql.org/docs/8.2/static/libpq.html) in the 
PostgreSQL documentation.
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/g-establishing-a-database-session.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/clientaccess/g-establishing-a-database-session.html.md.erb 
b/markdown/clientaccess/g-establishing-a-database-session.html.md.erb
new file mode 100644
index 0000000..a1c5f1c
--- /dev/null
+++ b/markdown/clientaccess/g-establishing-a-database-session.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Establishing a Database Session
+---
+
+Users can connect to HAWQ using a PostgreSQL-compatible client program, such 
as `psql`. Users and administrators *always* connect to HAWQ through the 
*master*; the segments cannot accept client connections.
+
+In order to establish a connection to the HAWQ master, you will need to know 
the following connection information and configure your client program 
accordingly.
+
+|Connection Parameter|Description|Environment Variable|
+|--------------------|-----------|--------------------|
+|Application name|The application name that is connecting to the database. The default value, held in the `application_name` connection parameter, is *psql*.|`$PGAPPNAME`|
+|Database name|The name of the database to which you want to connect. For a newly initialized system, use the `template1` database to connect for the first time.|`$PGDATABASE`|
+|Host name|The host name of the HAWQ master. The default host is the local host.|`$PGHOST`|
+|Port|The port number on which the HAWQ master instance is running. The default is 5432.|`$PGPORT`|
+|User name|The database user \(role\) name to connect as. This is not necessarily the same as your OS user name. Check with your HAWQ administrator if you are not sure what your database user name is. Note that every HAWQ system has one superuser account that is created automatically at initialization time. This account has the same name as the OS name of the user who initialized the HAWQ system \(typically `gpadmin`\).|`$PGUSER`|
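+
+For example, you can supply these values via the corresponding environment variables before starting your client. This is a sketch; `master_host` is an illustrative host name:
+
+``` bash
+$ export PGHOST=master_host
+$ export PGPORT=5432
+$ export PGDATABASE=template1
+$ export PGUSER=gpadmin
+$ psql
+```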
+
+[Connecting with psql](g-connecting-with-psql.html) provides example commands 
for connecting to HAWQ.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/g-hawq-database-client-applications.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/clientaccess/g-hawq-database-client-applications.html.md.erb 
b/markdown/clientaccess/g-hawq-database-client-applications.html.md.erb
new file mode 100644
index 0000000..a1e8ff3
--- /dev/null
+++ b/markdown/clientaccess/g-hawq-database-client-applications.html.md.erb
@@ -0,0 +1,23 @@
+---
+title: HAWQ Client Applications
+---
+
+HAWQ comes installed with a number of client utility applications located in 
the `$GPHOME/bin` directory of your HAWQ master host installation. The 
following are the most commonly used client utility applications:
+
+|Name|Usage|
+|----|-----|
+|`createdb`|create a new database|
+|`createlang`|define a new procedural language|
+|`createuser`|define a new database role|
+|`dropdb`|remove a database|
+|`droplang`|remove a procedural language|
+|`dropuser`|remove a role|
+|`psql`|PostgreSQL interactive terminal|
+|`reindexdb`|reindex a database|
+|`vacuumdb`|garbage-collect and analyze a database|
+
+When using these client applications, you must connect to a database through the HAWQ master instance. You will need to know the name of your target database, the host name and port number of the master, and the database user name to connect as. This information can be provided on the command line using the options `-d`, `-h`, `-p`, and `-U`, respectively. If an argument is found that does not belong to any option, the first such argument is interpreted as the database name.
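+
+For example, to create a database through the master using `createdb` \(the host name `master_host` and database name `mydatabase` are illustrative values\):
+
+``` bash
+$ createdb -h master_host -p 5432 -U gpadmin mydatabase
+```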
+
+All of these options have default values which will be used if the option is 
not specified. The default host is the local host. The default port number is 
5432. The default user name is your OS system user name, as is the default 
database name. Note that OS user names and HAWQ user names are not necessarily 
the same.
+
+If the default values are not correct, you can set the environment variables `PGDATABASE`, `PGHOST`, `PGPORT`, and `PGUSER` to the appropriate values, or use a `~/.pgpass` file to store frequently-used passwords.
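+
+A `~/.pgpass` file contains one entry per line in the standard PostgreSQL format `hostname:port:database:username:password`, and must be readable only by you \(`chmod 600 ~/.pgpass`\). The values below are illustrative:
+
+```
+master_host:5432:*:gpadmin:changeme
+```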

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/g-supported-client-applications.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/g-supported-client-applications.html.md.erb 
b/markdown/clientaccess/g-supported-client-applications.html.md.erb
new file mode 100644
index 0000000..202f625
--- /dev/null
+++ b/markdown/clientaccess/g-supported-client-applications.html.md.erb
@@ -0,0 +1,8 @@
+---
+title: Supported Client Applications
+---
+
+Users can connect to HAWQ using various client applications:
+
+-   A number of [HAWQ Client 
Applications](g-hawq-database-client-applications.html) are provided with your 
HAWQ installation. The `psql` client application provides an interactive 
command-line interface to HAWQ.
+-   Users can connect their client applications to HAWQ using standard database application interfaces such as ODBC and JDBC.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/g-troubleshooting-connection-problems.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/clientaccess/g-troubleshooting-connection-problems.html.md.erb 
b/markdown/clientaccess/g-troubleshooting-connection-problems.html.md.erb
new file mode 100644
index 0000000..0328606
--- /dev/null
+++ b/markdown/clientaccess/g-troubleshooting-connection-problems.html.md.erb
@@ -0,0 +1,13 @@
+---
+title: Troubleshooting Connection Problems
+---
+
+A number of things can prevent a client application from successfully 
connecting to HAWQ. This topic explains some of the common causes of connection 
problems and how to correct them.
+
+|Problem|Solution|
+|-------|--------|
+|No pg\_hba.conf entry for host or user|To enable HAWQ to accept remote client connections, you must configure your HAWQ master instance so that connections are allowed from the client hosts and database users that will be connecting to HAWQ. This is done by adding the appropriate entries to the pg\_hba.conf configuration file \(located in the master instance's data directory\). For more detailed information, see [Allowing Connections to HAWQ](client_auth.html).|
+|HAWQ is not running|If the HAWQ master instance is down, users will not be able to connect. You can verify that the HAWQ system is up by running the `hawq state` utility on the HAWQ master host.|
+|Network problems<br/><br/>Interconnect timeouts|If users connect to the HAWQ master host from a remote client, network problems can prevent a connection \(for example, DNS host name resolution problems, the host system is down, and so on\). To ensure that network problems are not the cause, connect to the HAWQ master host from the remote client host. For example: `ping hostname`.<br/><br/>If the system cannot resolve the host names and IP addresses of the hosts involved in HAWQ, queries and connections will fail. For some operations, connections to the HAWQ master use `localhost` and others use the actual host name, so you must be able to resolve both. If you encounter this error, first make sure you can connect to each host in your HAWQ array from the master host over the network. In the `/etc/hosts` file of the master and all segments, make sure you have the correct host names and IP addresses for all hosts involved in the HAWQ array. The `127.0.0.1` IP must resolve to `localhost`.|
+|Too many clients already|By default, HAWQ is configured to allow a maximum of 200 concurrent user connections on the master and 1280 connections on a segment. A connection attempt that causes that limit to be exceeded will be refused. This limit is controlled by the `max_connections` parameter on the master instance and by the `seg_max_connections` parameter on segment instances. If you change this setting for the master, you must also make appropriate changes at the segments.|
+|Query failure|Reverse DNS must be configured in your HAWQ cluster network. In cases where reverse DNS has not been configured, failing queries generate "Failed to reverse DNS lookup for ip \<ip-address\>" warning messages in the HAWQ master node log file.|
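+
+A quick first-pass check from a remote client host covers several of these causes at once. This is a sketch; `master_host` is an illustrative host name:
+
+``` bash
+$ ping -c 1 master_host                                     # name resolution and network reachability
+$ psql -h master_host -p 5432 -d template1 -c 'SELECT 1;'   # authentication and query execution
+```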

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/index.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/index.md.erb 
b/markdown/clientaccess/index.md.erb
new file mode 100644
index 0000000..c88adeb
--- /dev/null
+++ b/markdown/clientaccess/index.md.erb
@@ -0,0 +1,17 @@
+---
+title: Managing Client Access
+---
+
+This section explains how to configure client connections and authentication 
for HAWQ:
+
+*  <a class="subnav" href="./client_auth.html">Configuring Client 
Authentication</a>
+*  <a class="subnav" href="./ldap.html">Using LDAP Authentication with 
TLS/SSL</a>
+*  <a class="subnav" href="./kerberos.html">Using Kerberos Authentication</a>
+*  <a class="subnav" href="./disable-kerberos.html">Disabling Kerberos 
Security</a>
+*  <a class="subnav" href="./roles_privs.html">Managing Roles and 
Privileges</a>
+*  <a class="subnav" 
href="./g-establishing-a-database-session.html">Establishing a Database 
Session</a>
+*  <a class="subnav" href="./g-supported-client-applications.html">Supported 
Client Applications</a>
+*  <a class="subnav" href="./g-hawq-database-client-applications.html">HAWQ 
Client Applications</a>
+*  <a class="subnav" href="./g-connecting-with-psql.html">Connecting with 
psql</a>
+*  <a class="subnav" href="./g-database-application-interfaces.html">Database 
Application Interfaces</a>
+*  <a class="subnav" 
href="./g-troubleshooting-connection-problems.html">Troubleshooting Connection 
Problems</a>

