http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/kerberos.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/kerberos.html.md.erb 
b/markdown/clientaccess/kerberos.html.md.erb
new file mode 100644
index 0000000..2e7cfe5
--- /dev/null
+++ b/markdown/clientaccess/kerberos.html.md.erb
@@ -0,0 +1,308 @@
+---
+title: Using Kerberos Authentication
+---
+
+**Note:** The following steps for enabling Kerberos *are not required* if you 
install HAWQ using Ambari.
+
+You can control access to HAWQ with a Kerberos authentication server.
+
+HAWQ supports the Generic Security Service Application Program Interface 
\(GSSAPI\) with Kerberos authentication. GSSAPI provides automatic 
authentication \(single sign-on\) for systems that support it. You specify the 
HAWQ users \(roles\) that require Kerberos authentication in the HAWQ 
configuration file `pg_hba.conf`. The login fails if Kerberos authentication is 
not available when a role attempts to log in to HAWQ.
+
+Kerberos provides a secure, encrypted authentication service. It does not 
encrypt data exchanged between the client and database and provides no 
authorization services. To encrypt data exchanged over the network, you must 
use an SSL connection. To manage authorization for access to HAWQ databases and 
objects such as schemas and tables, you use settings in the `pg_hba.conf` file 
and privileges given to HAWQ users and roles within the database. For 
information about managing authorization privileges, see [Managing Roles and 
Privileges](roles_privs.html).
+
+For more information about Kerberos, see 
[http://web.mit.edu/kerberos/](http://web.mit.edu/kerberos/).
+
+## <a id="kerberos_prereq"></a>Requirements for Using Kerberos with HAWQ 
+
+The following items are required for using Kerberos with HAWQ:
+
+-   Kerberos Key Distribution Center \(KDC\) server using the `krb5-server` 
library
+-   Kerberos version 5 `krb5-libs` and `krb5-workstation` packages installed 
on the HAWQ master host
+-   System time on the Kerberos server and HAWQ master host must be 
synchronized. \(Install Linux `ntp` package on both servers.\)
+-   Network connectivity between the Kerberos server and the HAWQ master
+-   Java 1.7.0\_17 or later is required to use Kerberos-authenticated JDBC on 
Red Hat Enterprise Linux 6.x
+-   Java 1.6.0\_21 or later is required to use Kerberos-authenticated JDBC on 
Red Hat Enterprise Linux 4.x or 5.x
+
+## <a id="nr166539"></a>Enabling Kerberos Authentication for HAWQ 
+
+Complete the following tasks to set up Kerberos authentication with HAWQ:
+
+1.  Verify your system satisfies the prerequisites for using Kerberos with HAWQ. See [Requirements for Using Kerberos with HAWQ](#kerberos_prereq).
+2.  Set up, or identify, a Kerberos Key Distribution Center \(KDC\) server to 
use for authentication. See [Install and Configure a Kerberos KDC 
Server](#task_setup_kdc).
+3.  Create and deploy principals for your HDFS cluster, and ensure that Kerberos authentication is enabled and functioning for all HDFS services. See your Hadoop documentation for additional details.
+4.  In a Kerberos database on the KDC server, set up a Kerberos realm and 
principals on the server. For HAWQ, a principal is a HAWQ role that uses 
Kerberos authentication. In the Kerberos database, a realm groups together 
Kerberos principals that are HAWQ roles.
+5.  Create Kerberos keytab files for HAWQ. To access HAWQ, you create a 
service key known only by Kerberos and HAWQ. On the Kerberos server, the 
service key is stored in the Kerberos database.
+
+    On the HAWQ master, the service key is stored in key tables, which are 
files known as keytabs. The service keys are usually stored in the keytab file 
`/etc/krb5.keytab`. This service key is the equivalent of the service's 
password, and must be kept secure. Data that is meant to be read-only by the 
service is encrypted using this key.
+
+6.  Install the Kerberos client packages and the keytab file on HAWQ master.
+7.  Create a Kerberos ticket for `gpadmin` on the HAWQ master node using the keytab file. The ticket contains the Kerberos authentication credentials that grant access to HAWQ.
+
+With Kerberos authentication configured for HAWQ, you can use Kerberos for PSQL and JDBC.
+
+[Set up HAWQ with Kerberos for PSQL](#topic6)
+
+[Set up HAWQ with Kerberos for JDBC](#topic9)
+
+## <a id="task_setup_kdc"></a>Install and Configure a Kerberos KDC Server 
+
+Steps to set up a Kerberos Key Distribution Center \(KDC\) server on a Red Hat 
Enterprise Linux host for use with HAWQ.
+
+Follow these steps to install and configure a Kerberos Key Distribution Center 
\(KDC\) server on a Red Hat Enterprise Linux host.
+
+1.  Install the Kerberos server packages:
+
+    ```
+    sudo yum install krb5-libs krb5-server krb5-workstation
+    ```
+
+2.  Edit the `/etc/krb5.conf` configuration file. The following example shows 
a Kerberos server with a default `KRB.EXAMPLE.COM` realm.
+
+    ```
+    [logging]
+     default = FILE:/var/log/krb5libs.log
+     kdc = FILE:/var/log/krb5kdc.log
+     admin_server = FILE:/var/log/kadmind.log
+
+    [libdefaults]
+     default_realm = KRB.EXAMPLE.COM
+     dns_lookup_realm = false
+     dns_lookup_kdc = false
+     ticket_lifetime = 24h
+     renew_lifetime = 7d
+     forwardable = true
+     default_tgs_enctypes = aes128-cts des3-hmac-sha1 des-cbc-crc des-cbc-md5
+     default_tkt_enctypes = aes128-cts des3-hmac-sha1 des-cbc-crc des-cbc-md5
+     permitted_enctypes = aes128-cts des3-hmac-sha1 des-cbc-crc des-cbc-md5
+
+    [realms]
+     KRB.EXAMPLE.COM = {
+      kdc = kerberos-gpdb:88
+      admin_server = kerberos-gpdb:749
+      default_domain = kerberos-gpdb
+     }
+
+    [domain_realm]
+     .kerberos-gpdb = KRB.EXAMPLE.COM
+     kerberos-gpdb = KRB.EXAMPLE.COM
+
+    [appdefaults]
+     pam = {
+        debug = false
+        ticket_lifetime = 36000
+        renew_lifetime = 36000
+        forwardable = true
+        krb4_convert = false
+       }
+    ```
+
+    The `kdc` and `admin_server` keys in the `[realms]` section specify the host \(`kerberos-gpdb`\) and port where the Kerberos server is running. IP addresses can be used in place of host names.
+
+    If your Kerberos server manages authentication for other realms, you would 
instead add the `KRB.EXAMPLE.COM` realm in the `[realms]` and `[domain_realm]` 
section of the `kdc.conf` file. See the [Kerberos 
documentation](http://web.mit.edu/kerberos/krb5-latest/doc/) for information 
about the `kdc.conf` file.
+
+3.  To create a Kerberos KDC database, run the `kdb5_util` utility:
+
+    ```
+    kdb5_util create -s
+    ```
+
+    The `kdb5_util` `create` command creates the database that stores keys for the Kerberos realms managed by this KDC server. The `-s` option creates a stash file. Without the stash file, the KDC server requests a password every time it starts.
+
+4.  Add an administrative user to the KDC database with the `kadmin.local` 
utility. Because it does not itself depend on Kerberos authentication, the 
`kadmin.local` utility allows you to add an initial administrative user to the 
local Kerberos server. To add the user `gpadmin` as an administrative user to 
the KDC database, run the following command:
+
+    ```
+    kadmin.local -q "addprinc gpadmin/admin"
+    ```
+
+    Most users do not need administrative access to the Kerberos server. They 
can use `kadmin` to manage their own principals \(for example, to change their 
own password\). For information about `kadmin`, see the [Kerberos 
documentation](http://web.mit.edu/kerberos/krb5-latest/doc/).
+
+5.  If needed, edit the `/var/kerberos/krb5kdc/kadm5.acl` file to grant the 
appropriate permissions to `gpadmin`.
+6.  Start the Kerberos daemons:
+
+    ```
+    /sbin/service krb5kdc start
+    /sbin/service kadmin start
+    ```
+
+7.  To start Kerberos automatically upon restart:
+
+    ```
+    /sbin/chkconfig krb5kdc on
+    /sbin/chkconfig kadmin on
+    ```
+
+
+## <a id="task_m43_vwl_2p"></a>Create HAWQ Roles in the KDC Database 
+
+Add principals to the Kerberos realm for HAWQ.
+
+Start `kadmin.local` in interactive mode, then add two principals to the HAWQ realm.
+
+1.  Start `kadmin.local` in interactive mode:
+
+    ```
+    kadmin.local
+    ```
+
+2.  Add principals:
+
+    ```
+    kadmin.local: addprinc gpadmin/[email protected]
+    kadmin.local: addprinc postgres/[email protected]
+    ```
+
+    The `addprinc` commands prompt for passwords for each principal. The first 
`addprinc` creates a HAWQ user as a principal, `gpadmin/kerberos-gpdb`. The 
second `addprinc` command creates the `postgres` process on the HAWQ master 
host as a principal in the Kerberos KDC. This principal is required when using 
Kerberos authentication with HAWQ.
+
+3.  Create a Kerberos keytab file with `kadmin.local`. The following example 
creates a keytab file `gpdb-kerberos.keytab` in the current directory with 
authentication information for the two principals.
+
+    ```
+    kadmin.local: xst -k gpdb-kerberos.keytab
+        gpadmin/[email protected]
+        postgres/[email protected]
+    ```
+
+    You will copy this file to the HAWQ master host.
+
+4.  Exit `kadmin.local` interactive mode with the `quit` command: `kadmin.local: quit`
+
+## <a id="topic6"></a>Install and Configure the Kerberos Client 
+
+Steps to install the Kerberos client on the HAWQ master host.
+
+Install the Kerberos client libraries on the HAWQ master and configure the 
Kerberos client.
+
+1.  Install the Kerberos packages on the HAWQ master.
+
+    ```
+    sudo yum install krb5-libs krb5-workstation
+    ```
+
+2.  Ensure that the `/etc/krb5.conf` file is the same as the one that is on 
the Kerberos server.
+3.  Copy the `gpdb-kerberos.keytab` file that was generated on the Kerberos 
server to the HAWQ master host.
+4.  Remove any existing tickets with the Kerberos utility `kdestroy`. Run the 
utility as root.
+
+    ```
+    sudo kdestroy
+    ```
+
+5.  Use the Kerberos utility `kinit` to request a ticket using the keytab file 
on the HAWQ master for `gpadmin/[email protected]`. The `-t` option 
specifies the keytab file on the HAWQ master.
+
+    ```
+    # kinit -k -t gpdb-kerberos.keytab gpadmin/[email protected]
+    ```
+
+6.  Use the Kerberos utility `klist` to display the contents of the Kerberos 
ticket cache on the HAWQ master. The following is an example:
+
+    ```screen
+    # klist
+    Ticket cache: FILE:/tmp/krb5cc_108061
+    Default principal: gpadmin/[email protected]
+    Valid starting     Expires            Service principal
+    03/28/13 14:50:26  03/29/13 14:50:26  krbtgt/[email protected]
+        renew until 03/28/13 14:50:26
+    ```
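The ticket lifetime visible in the `klist` output \(24 hours between `Valid starting` and `Expires`\) matches the `ticket_lifetime = 24h` setting in the example `krb5.conf`. A small sketch computing it from the timestamps shown, assuming the `MM/DD/YY HH:MM:SS` format in the output:

```python
from datetime import datetime

# Timestamp format used in the klist output shown above.
FMT = "%m/%d/%y %H:%M:%S"

valid_starting = datetime.strptime("03/28/13 14:50:26", FMT)
expires = datetime.strptime("03/29/13 14:50:26", FMT)

lifetime = expires - valid_starting
print(lifetime)  # 1 day, 0:00:00
```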
+
+
+### <a id="topic7"></a>Set up HAWQ with Kerberos for PSQL 
+
+Configure HAWQ to use Kerberos.
+
+After you have set up Kerberos on the HAWQ master, you can configure HAWQ to 
use Kerberos. For information on setting up the HAWQ master, see [Install and 
Configure the Kerberos Client](#topic6).
+
+1.  Create a HAWQ administrator role in the database `template1` for the Kerberos principal that is used as the database administrator. The following example uses `gpadmin/kerberos-gpdb`.
+
+    ``` bash
+    $ psql template1 -c 'CREATE ROLE "gpadmin/kerberos-gpdb" LOGIN SUPERUSER;'
+    ```
+
+    The role you create in the database `template1` will be available in any new HAWQ database that you create.
+
+2.  Modify `hawq-site.xml` to specify the location of the keytab file. For example, adding this property to `hawq-site.xml` specifies the directory `/home/gpadmin` as the location of the keytab file `gpdb-kerberos.keytab`.
+
+    ``` xml
+      <property>
+          <name>krb_server_keyfile</name>
+          <value>/home/gpadmin/gpdb-kerberos.keytab</value>
+      </property>
+    ```
+
+3.  Modify the HAWQ file `pg_hba.conf` to enable Kerberos support. Then 
restart HAWQ \(`hawq restart -a`\). For example, adding the following line to 
`pg_hba.conf` adds GSSAPI and Kerberos support. The value for `krb_realm` is 
the Kerberos realm that is used for authentication to HAWQ.
+
+    ```
+    host all all 0.0.0.0/0 gss include_realm=0 krb_realm=KRB.EXAMPLE.COM
+    ```
+
+    For information about the `pg_hba.conf` file, see [The pg\_hba.conf 
file](http://www.postgresql.org/docs/9.0/static/auth-pg-hba-conf.html) in the 
Postgres documentation.
+
+4.  Create a ticket using `kinit` and show the tickets in the Kerberos ticket 
cache with `klist`.
+5.  As a test, log in to the database as the `gpadmin` role with the Kerberos 
credentials `gpadmin/kerberos-gpdb`:
+
+    ``` bash
+    $ psql -U "gpadmin/kerberos-gpdb" -h master.test template1
+    ```
+
+    A username map can be defined in the `pg_ident.conf` file and specified in the `pg_hba.conf` file to simplify logging in to HAWQ. For example, this `psql` command logs in to the default HAWQ database on `mdw.proddb` as the Kerberos principal `adminuser/mdw.proddb`:
+
+    ``` bash
+    $ psql -U "adminuser/mdw.proddb" -h mdw.proddb
+    ```
+
+    If the default user is `adminuser`, the `pg_ident.conf` file and the 
`pg_hba.conf` file can be configured so that the `adminuser` can log in to the 
database as the Kerberos principal `adminuser/mdw.proddb` without specifying 
the `-U` option:
+
+    ``` bash
+    $ psql -h mdw.proddb
+    ```
+
+    The `pg_ident.conf` file defines the username map. This file is located in 
the HAWQ master data directory (identified by the `hawq_master_directory` 
property value in `hawq-site.xml`):
+
+    ```
+    # MAPNAME   SYSTEM-USERNAME        GP-USERNAME
+    mymap       /^(.*)mdw\.proddb$     adminuser
+    ```
+
+    The map can be specified in the `pg_hba.conf` file as part of the line 
that enables Kerberos support:
+
+    ```
+    host all all 0.0.0.0/0 krb5 include_realm=0 krb_realm=proddb map=mymap
+    ```
+
+    For more information about specifying username maps see [Username 
maps](http://www.postgresql.org/docs/9.0/static/auth-username-maps.html) in the 
Postgres documentation.
+
+6.  If a Kerberos principal is not a HAWQ user, a message similar to the 
following is displayed from the `psql` command line when the user attempts to 
log in to the database:
+
+    ```
+    psql: krb5_sendauth: Bad response
+    ```
+
+    The principal must be added as a HAWQ user.
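The username map shown in step 5 can be sanity-checked outside the database. This is a rough Python sketch \(the principal and role names come from the examples above; HAWQ itself performs the real matching when it reads `pg_ident.conf`\):

```python
import re

# The SYSTEM-USERNAME pattern from the example pg_ident.conf entry above.
# pg_ident.conf patterns are regular expressions; when one matches the
# system user name, the user is mapped to the GP-USERNAME role, here "adminuser".
PATTERN = re.compile(r"^(.*)mdw\.proddb$")

def map_username(system_username):
    """Return the mapped database role, or None if the map does not apply."""
    return "adminuser" if PATTERN.match(system_username) else None

print(map_username("adminuser/mdw.proddb"))  # adminuser
print(map_username("adminuser"))             # None
```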
+
+
+### <a id="topic9"></a>Set up HAWQ with Kerberos for JDBC 
+
+Enable Kerberos-authenticated JDBC access to HAWQ.
+
+You can configure HAWQ to use Kerberos to run user-defined Java functions.
+
+1.  Ensure that Kerberos is installed and configured on the HAWQ master. See 
[Install and Configure the Kerberos Client](#topic6).
+2.  Create the file `.java.login.config` in the folder `/home/gpadmin` and add 
the following text to the file:
+
+    ```
+    pgjdbc {
+      com.sun.security.auth.module.Krb5LoginModule required
+      doNotPrompt=true
+      useTicketCache=true
+      debug=true
+      client=true;
+    };
+    ```
+
+3.  Create a Java application that connects to HAWQ using Kerberos 
authentication. The following example database connection URL uses a PostgreSQL 
JDBC driver and specifies parameters for Kerberos authentication:
+
+    ```
+    jdbc:postgresql://mdw:5432/mytest?kerberosServerName=postgres&jaasApplicationName=pgjdbc&user=gpadmin/kerberos-gpdb
+    ```
+
+    The parameter names and values specified depend on how the Java 
application performs Kerberos authentication.
+
+4.  Test the Kerberos login by running a sample Java application from HAWQ.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/ldap.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/ldap.html.md.erb 
b/markdown/clientaccess/ldap.html.md.erb
new file mode 100644
index 0000000..27b204f
--- /dev/null
+++ b/markdown/clientaccess/ldap.html.md.erb
@@ -0,0 +1,116 @@
+---
+title: Using LDAP Authentication with TLS/SSL
+---
+
+You can control access to HAWQ with an LDAP server and, optionally, secure the 
connection with encryption by adding parameters to pg\_hba.conf file entries.
+
+HAWQ supports LDAP authentication with the TLS/SSL protocol to encrypt 
communication with an LDAP server:
+
+-   LDAP authentication with STARTTLS and TLS protocol – STARTTLS starts 
with a clear text connection \(no encryption\) and upgrades it to a secure 
connection \(with encryption\).
+-   LDAP authentication with a secure connection and TLS/SSL \(LDAPS\) – 
HAWQ uses the TLS or SSL protocol based on the protocol that is used by the 
LDAP server.
+
+If no protocol is specified, HAWQ communicates with the LDAP server with a 
clear text connection.
+
+To use LDAP authentication, the HAWQ master host must be configured as an LDAP 
client. See your LDAP documentation for information about configuring LDAP 
clients.
+
+## Enabling LDAP Authentication with STARTTLS and TLS
+
+To enable STARTTLS with the TLS protocol, specify the `ldaptls` parameter with 
the value 1. The default port is 389. In this example, the authentication 
method parameters include the `ldaptls` parameter.
+
+```
+ldap ldapserver=ldap.example.com ldaptls=1 ldapprefix="uid=" ldapsuffix=",ou=People,dc=example,dc=com"
+```
+
+Specify a non-default port with the `ldapport` parameter. In this example, the authentication method includes the `ldaptls` parameter and the `ldapport` parameter to specify port 550.
+
+```
+ldap ldapserver=ldap.example.com ldaptls=1 ldapport=550 ldapprefix="uid=" ldapsuffix=",ou=People,dc=example,dc=com"
+```
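With this `ldap` method, the server binds as a distinguished name built by concatenating `ldapprefix`, the HAWQ login name, and `ldapsuffix`. A minimal sketch of that concatenation \(the user name `jsmith` is an invented example; the prefix and suffix values are taken from the entries above\):

```python
def bind_dn(username, ldapprefix="uid=", ldapsuffix=",ou=People,dc=example,dc=com"):
    # Simple-bind DN construction: ldapprefix + username + ldapsuffix
    return "%s%s%s" % (ldapprefix, username, ldapsuffix)

print(bind_dn("jsmith"))  # uid=jsmith,ou=People,dc=example,dc=com
```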
+
+## Enabling LDAP Authentication with a Secure Connection and TLS/SSL
+
+To enable a secure connection with TLS/SSL, add `ldaps://` as the prefix to 
the LDAP server name specified in the `ldapserver` parameter. The default port 
is 636.
+
+This example `ldapserver` parameter specifies a secure connection and the 
TLS/SSL protocol for the LDAP server `ldap.example.com`.
+
+```
+ldapserver=ldaps://ldap.example.com
+```
+
+To specify a non-default port, add a colon \(:\) and the port number after the 
LDAP server name. This example `ldapserver` parameter includes the `ldaps://` 
prefix and the non-default port 550.
+
+```
+ldapserver=ldaps://ldap.example.com:550
+```
+
+### Notes
+
+HAWQ logs an error if either of the following combinations is specified in a pg\_hba.conf file entry:
+
+-   both the `ldaps://` prefix and the `ldaptls=1` parameter
+-   both the `ldaps://` prefix and the `ldapport` parameter
+
+Enabling encrypted communication for LDAP authentication only encrypts the 
communication between HAWQ and the LDAP server.
+
+## Configuring Authentication with a System-wide OpenLDAP System
+
+If you have a system-wide OpenLDAP system and logins are configured to use 
LDAP with TLS or SSL in the pg_hba.conf file, logins may fail with the 
following message:
+
+```shell
+could not start LDAP TLS session: error code '-11'
+```
+
+To use an existing OpenLDAP system for authentication, HAWQ must be set up to 
use the LDAP server's CA certificate to validate user certificates. Follow 
these steps on both the master and standby hosts to configure HAWQ:
+
+1. Copy the base64-encoded root CA chain file from the Active Directory or 
LDAP server to
+the HAWQ master and standby master hosts. This example uses the directory 
`/etc/pki/tls/certs`.
+
+2. Change to the directory where you copied the CA certificate file and, as the root user, generate the OpenLDAP hash of the certificate and create a symbolic link named with that hash and a `.0` suffix:
+
+    ```
+    # cd /etc/pki/tls/certs
+    # openssl x509 -noout -hash -in <ca-certificate-file>
+    # ln -s <ca-certificate-file> <hash>.0
+    ```
+
+    `<hash>` is the value printed by the `openssl` command; OpenLDAP looks up the CA certificate in `TLS_CACERTDIR` by this hash-named link.
+
+3. Configure an OpenLDAP configuration file for HAWQ with the CA certificate 
directory and certificate file specified.
+
+    As the root user, edit the OpenLDAP configuration file 
`/etc/openldap/ldap.conf`:
+
+    ```
+    SASL_NOCANON on
+    URI ldaps://ldapA.example.priv ldaps://ldapB.example.priv ldaps://ldapC.example.priv
+    BASE dc=example,dc=priv
+    TLS_CACERTDIR /etc/pki/tls/certs
+    TLS_CACERT /etc/pki/tls/certs/<ca-certificate-file>
+    ```
+
+    **Note**: For certificate validation to succeed, the hostname in the 
certificate must match a hostname in the URI property. Otherwise, you must also 
add `TLS_REQCERT allow` to the file.
+
+4. As the gpadmin user, edit `/usr/local/hawq/greenplum_path.sh` and add the 
following line.
+
+    ```bash
+    export LDAPCONF=/etc/openldap/ldap.conf
+    ```
+
+## Examples
+
+These are example entries from a pg\_hba.conf file.
+
+This example specifies LDAP authentication with no encryption between HAWQ and 
the LDAP server.
+
+```
+host all plainuser 0.0.0.0/0 ldap ldapserver=ldap.example.com ldapprefix="uid=" ldapsuffix=",ou=People,dc=example,dc=com"
+```
+
+This example specifies LDAP authentication with the STARTTLS and TLS protocol 
between HAWQ and the LDAP server.
+
+```
+host all tlsuser 0.0.0.0/0 ldap ldapserver=ldap.example.com ldaptls=1 ldapprefix="uid=" ldapsuffix=",ou=People,dc=example,dc=com"
+```
+
+This example specifies LDAP authentication with a secure connection and 
TLS/SSL protocol between HAWQ and the LDAP server.
+
+```
+host all ldapsuser 0.0.0.0/0 ldap ldapserver=ldaps://ldap.example.com ldapprefix="uid=" ldapsuffix=",ou=People,dc=example,dc=com"
+```
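Entries like these can be sanity-checked by splitting each line into the five positional fields \(type, database, user, address, method\) followed by `name=value` options. A rough sketch, assuming whitespace-separated fields with optionally quoted option values:

```python
import shlex

def parse_pg_hba_line(line):
    """Split a pg_hba.conf entry into positional fields and method options."""
    tokens = shlex.split(line)  # honors quoted values such as ldapprefix="uid="
    entry = dict(zip(["type", "database", "user", "address", "method"], tokens[:5]))
    entry["options"] = dict(tok.split("=", 1) for tok in tokens[5:])
    return entry

entry = parse_pg_hba_line(
    'host all plainuser 0.0.0.0/0 ldap ldapserver=ldap.example.com '
    'ldapprefix="uid=" ldapsuffix=",ou=People,dc=example,dc=com"'
)
print(entry["method"])                 # ldap
print(entry["options"]["ldapserver"])  # ldap.example.com
```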

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/clientaccess/roles_privs.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/clientaccess/roles_privs.html.md.erb 
b/markdown/clientaccess/roles_privs.html.md.erb
new file mode 100644
index 0000000..4bdf3ee
--- /dev/null
+++ b/markdown/clientaccess/roles_privs.html.md.erb
@@ -0,0 +1,285 @@
+---
+title: Managing Roles and Privileges
+---
+
+The HAWQ authorization mechanism stores roles and permissions to access 
database objects in the database and is administered using SQL statements or 
command-line utilities.
+
+HAWQ manages database access permissions using *roles*. The concept of roles 
subsumes the concepts of *users* and *groups*. A role can be a database user, a 
group, or both. Roles can own database objects \(for example, tables\) and can 
assign privileges on those objects to other roles to control access to the 
objects. Roles can be members of other roles, thus a member role can inherit 
the object privileges of its parent role.
+
+Every HAWQ system contains a set of database roles \(users and groups\). Those 
roles are separate from the users and groups managed by the operating system on 
which the server runs. However, for convenience you may want to maintain a 
relationship between operating system user names and HAWQ role names, since 
many of the client applications use the current operating system user name as 
the default.
+
+In HAWQ, users log in and connect through the master instance, which then 
verifies their role and access privileges. The master then issues commands to 
the segment instances behind the scenes as the currently logged in role.
+
+Roles are defined at the system level, meaning they are valid for all 
databases in the system.
+
+In order to bootstrap the HAWQ system, a freshly initialized system always 
contains one predefined *superuser* role \(also referred to as the system 
user\). This role will have the same name as the operating system user that 
initialized the HAWQ system. Customarily, this role is named `gpadmin`. In 
order to create more roles you first have to connect as this initial role.
+
+## <a id="topic2"></a>Security Best Practices for Roles and Privileges 
+
+-   **Secure the gpadmin system user.** HAWQ requires a UNIX user id to install and initialize the HAWQ system. This system user is referred to as `gpadmin` in the HAWQ documentation. The `gpadmin` user is the default database superuser in HAWQ, as well as the file system owner of the HAWQ installation and its underlying data files. This default administrator account is fundamental to the design of HAWQ. The system cannot run without it, and there is no way to limit the access of the `gpadmin` user id. Anyone who logs on to a HAWQ host as this user id can read, alter, or delete any data, including system catalog data and database access rights. It is therefore very important to secure the `gpadmin` user id and provide access only to essential system administrators. Administrators should log in to HAWQ as `gpadmin` only when performing system maintenance tasks such as upgrade or expansion. Use roles to manage who has access to the database for specific purposes. Database users should never log on as `gpadmin`, and ETL or production workloads should never run as `gpadmin`.
+-   **Assign a distinct role to each user that logs in.** For logging and 
auditing purposes, each user that is allowed to log in to HAWQ should be given 
their own database role. For applications or web services, consider creating a 
distinct role for each application or service. See [Creating New Roles 
\(Users\)](#topic3).
+-   **Use groups to manage access privileges.** See [Role Membership](#topic5).
+-   **Limit users who have the SUPERUSER role attribute.** Roles that are 
superusers bypass all access privilege checks in HAWQ, as well as resource 
queuing. Only system administrators should be given superuser rights. See 
[Altering Role Attributes](#topic4).
+
+## <a id="topic3"></a>Creating New Roles \(Users\) 
+
+A user-level role is a database role that can log in to the database and initiate a database session. Therefore, when you create a new user-level role using the `CREATE ROLE` command, you must specify the `LOGIN` attribute. For example:
+
+``` sql
+=# CREATE ROLE jsmith WITH LOGIN;
+```
+
+A database role may have a number of attributes that define what sort of tasks 
that role can perform in the database. You can set these attributes when you 
create the role, or later using the `ALTER ROLE` command. See [Table 
1](#iq139556) for a description of the role attributes you can set.
+
+### <a id="topic4"></a>Altering Role Attributes 
+
+A database role may have a number of attributes that define what sort of tasks 
that role can perform in the database.
+
+<a id="iq139556"></a>
+
+|Attributes|Description|
+|----------|-----------|
+|SUPERUSER &#124; NOSUPERUSER|Determines if the role is a superuser. You must 
yourself be a superuser to create a new superuser. NOSUPERUSER is the default.|
+|CREATEDB &#124; NOCREATEDB|Determines if the role is allowed to create 
databases. NOCREATEDB is the default.|
+|CREATEROLE &#124; NOCREATEROLE|Determines if the role is allowed to create 
and manage other roles. NOCREATEROLE is the default.|
+|INHERIT &#124; NOINHERIT|Determines whether a role inherits the privileges of 
roles it is a member of. A role with the INHERIT attribute can automatically 
use whatever database privileges have been granted to all roles it is directly 
or indirectly a member of. INHERIT is the default.|
+|LOGIN &#124; NOLOGIN|Determines whether a role is allowed to log in. A role 
having the LOGIN attribute can be thought of as a user. Roles without this 
attribute are useful for managing database privileges \(groups\). NOLOGIN is 
the default.|
+|CONNECTION LIMIT *connlimit*|If role can log in, this specifies how many 
concurrent connections the role can make. -1 \(the default\) means no limit.|
+|PASSWORD '*password*'|Sets the role's password. If you do not plan to use 
password authentication you can omit this option. If no password is specified, 
the password will be set to null and password authentication will always fail 
for that user. A null password can optionally be written explicitly as PASSWORD 
NULL.|
+|ENCRYPTED &#124; UNENCRYPTED|Controls whether the password is stored encrypted in the system catalogs. The default behavior is determined by the configuration parameter `password_encryption` \(currently set to `md5`; to use SHA-256 encryption, change this setting to `password`\). If the presented password string is already in encrypted format, then it is stored encrypted as-is, regardless of whether ENCRYPTED or UNENCRYPTED is specified \(since the system cannot decrypt the specified encrypted password string\). This allows reloading of encrypted passwords during dump/restore.|
+|VALID UNTIL '*timestamp*'|Sets a date and time after which the role's 
password is no longer valid. If omitted the password will be valid for all 
time.|
+|RESOURCE QUEUE *queue\_name*|Assigns the role to the named resource queue for 
workload management. Any statement that role issues is then subject to the 
resource queue's limits. Note that the RESOURCE QUEUE attribute is not 
inherited; it must be set on each user-level \(LOGIN\) role.|
+|DENY \{deny\_interval &#124; deny\_point\}|Restricts access during an 
interval, specified by day or day and time. For more information see 
[Time-based Authentication](#topic13).|
+
+You can set these attributes when you create the role, or later using the 
`ALTER ROLE` command. For example:
+
+``` sql
+=# ALTER ROLE jsmith WITH PASSWORD 'passwd123';
+=# ALTER ROLE admin VALID UNTIL 'infinity';
+=# ALTER ROLE jsmith LOGIN;
+=# ALTER ROLE jsmith RESOURCE QUEUE adhoc;
+=# ALTER ROLE jsmith DENY DAY 'Sunday';
+```
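With `password_encryption` set to `md5`, the stored password is the literal prefix `md5` followed by the MD5 digest of the password concatenated with the role name \(the PostgreSQL convention that HAWQ inherits\). A sketch of that encoding, using the password and role name from the example above:

```python
import hashlib

def md5_password_hash(password, rolename):
    # PostgreSQL-style storage format: "md5" + md5(password || rolename)
    return "md5" + hashlib.md5((password + rolename).encode()).hexdigest()

stored = md5_password_hash("passwd123", "jsmith")
print(stored.startswith("md5"), len(stored))  # True 35
```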
+
+## <a id="topic5"></a>Role Membership 
+
+It is frequently convenient to group users together to ease management of 
object privileges: that way, privileges can be granted to, or revoked from, a 
group as a whole. In HAWQ this is done by creating a role that represents the 
group, and then granting membership in the group role to individual user roles.
+
+Use the `CREATE ROLE` SQL command to create a new group role. For example:
+
+``` sql
+=# CREATE ROLE admin CREATEROLE CREATEDB;
+```
+
+Once the group role exists, you can add and remove members \(user roles\) 
using the `GRANT` and `REVOKE` commands. For example:
+
+``` sql
+=# GRANT admin TO john, sally;
+=# REVOKE admin FROM bob;
+```
+
+For managing object privileges, you would then grant the appropriate 
permissions to the group-level role only \(see [Table 2](#iq139925)\). The 
member user roles then inherit the object privileges of the group role. For 
example:
+
+``` sql
+=# GRANT ALL ON TABLE mytable TO admin;
+=# GRANT ALL ON SCHEMA myschema TO admin;
+=# GRANT ALL ON DATABASE mydb TO admin;
+```
+
+The role attributes `LOGIN`, `SUPERUSER`, `CREATEDB`, and `CREATEROLE` are 
never inherited as ordinary privileges on database objects are. User members 
must actually `SET ROLE` to a specific role having one of these attributes in 
order to make use of the attribute. In the above example, we gave `CREATEDB` 
and `CREATEROLE` to the `admin` role. If `sally` is a member of `admin`, she 
could issue the following command to assume the role attributes of the parent 
role:
+
+``` sql
+=> SET ROLE admin;
+```
+
+## <a id="topic6"></a>Managing Object Privileges 
+
+When an object \(table, view, sequence, database, function, language, schema, 
or tablespace\) is created, it is assigned an owner. The owner is normally the 
role that executed the creation statement. For most kinds of objects, the 
initial state is that only the owner \(or a superuser\) can do anything with 
the object. To allow other roles to use it, privileges must be granted. HAWQ 
supports the following privileges for each object type:
+
+<a id="iq139925"></a>
+
+|Object Type|Privileges|
+|-----------|----------|
+|Tables, Views, Sequences|SELECT <br/> INSERT <br/> RULE <br/> ALL|
+|External Tables|SELECT <br/> RULE <br/> ALL|
+|Databases|CONNECT<br/>CREATE<br/>TEMPORARY &#124; TEMP <br/> ALL|
+|Functions|EXECUTE|
+|Procedural Languages|USAGE|
+|Schemas|CREATE <br/> USAGE <br/> ALL|
+|Custom Protocol|SELECT <br/> INSERT <br/> RULE <br/> ALL|
+
+**Note:** Privileges must be granted for each object individually. For 
example, granting ALL on a database does not grant full access to the objects 
within that database. It only grants all of the database-level privileges 
\(CONNECT, CREATE, TEMPORARY\) to the database itself.
+
+Use the `GRANT` SQL command to give a specified role privileges on an object. 
For example:
+
+``` sql
+=# GRANT INSERT ON mytable TO jsmith;
+```
+
+To revoke privileges, use the `REVOKE` command. For example:
+
+``` sql
+=# REVOKE ALL PRIVILEGES ON mytable FROM jsmith;
+```
+
+You can also use the `DROP OWNED` and `REASSIGN OWNED` commands for managing 
objects owned by deprecated roles \(Note: only an object's owner or a superuser 
can drop an object or reassign ownership\). For example:
+
+``` sql
+=# REASSIGN OWNED BY sally TO bob;
+=# DROP OWNED BY visitor;
+```
+
+### <a id="topic7"></a>Simulating Row and Column Level Access Control 
+
+HAWQ does not support row-level or column-level access control, nor labeled security. You can simulate row-level and column-level access by using views to restrict the columns and/or rows that are selected. You can simulate row-level labels by adding an extra column to the table to store sensitivity information, and then using views to control row-level access based on this column. Roles can then be granted access to the views rather than the base table.
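+
+For example, one possible sketch \(the table, view, and role names are hypothetical\) stores a sensitivity label in an extra column and exposes only non-sensitive rows and columns through a view:
+
+``` sql
+-- Base table carries an extra column holding each row's sensitivity level
+CREATE TABLE customer (
+    id       integer,
+    name     text,
+    ssn      text,
+    sens_lvl integer);
+
+-- The view omits the ssn column and filters out sensitive rows
+CREATE VIEW customer_public AS
+    SELECT id, name FROM customer WHERE sens_lvl <= 1;
+
+-- Grant access to the view only, not to the base table
+GRANT SELECT ON customer_public TO reporting_role;
+```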
+
+## <a id="topic8"></a>Encrypting Data 
+
+PostgreSQL provides an optional package of encryption/decryption functions 
called `pgcrypto`, which can also be installed and used in HAWQ. The `pgcrypto` 
package is not installed by default with HAWQ. However, you can download a 
`pgcrypto` package from [Pivotal Network](https://network.pivotal.io). 
+
+If you are building HAWQ from source files, then you should enable `pgcrypto` 
support as an option when compiling HAWQ.
+
+The `pgcrypto` functions allow database administrators to store certain 
columns of data in encrypted form. This adds an extra layer of protection for 
sensitive data, as data stored in HAWQ in encrypted form cannot be read by 
users who do not have the encryption key, nor be read directly from the disks.
+
+**Note:** The `pgcrypto` functions run inside the database server, which means 
that all the data and passwords move between `pgcrypto` and the client 
application in clear-text. For optimal security, consider also using SSL 
connections between the client and the HAWQ master server.
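+
+As a sketch of column-level encryption \(assuming the `pgcrypto` package is installed; the table, data, and key are hypothetical\), the `pgp_sym_encrypt` and `pgp_sym_decrypt` functions can write and read an encrypted column:
+
+``` sql
+-- The encrypted column must be of type bytea
+CREATE TABLE users (username text, cc_number bytea);
+
+-- Encrypt the value with a symmetric key on insert
+INSERT INTO users VALUES
+    ('alice', pgp_sym_encrypt('4012888888881881', 'my_secret_key'));
+
+-- Decrypt on read; without the key, only ciphertext is visible
+SELECT username, pgp_sym_decrypt(cc_number, 'my_secret_key')
+    FROM users;
+```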
+
+## <a id="topic9"></a>Encrypting Passwords 
+
+This section describes how to use a server parameter to implement SHA-256 encrypted password storage. Note that in order to use SHA-256 encryption for storage, the client authentication method must be set to `password` rather than the default, `md5`. \(See [Encrypting Client/Server Connections](client_auth.html) for more details.\) With the `password` method, the password is transmitted in clear text over the network; to avoid this, set up SSL to encrypt the client/server communication channel.
+
+### <a id="topic10"></a>Enabling SHA-256 Encryption 
+
+You can set your chosen encryption method system-wide or on a per-session 
basis. There are three encryption methods available: `SHA-256`, `SHA-256-FIPS`, 
and `MD5` \(for backward compatibility\). The `SHA-256-FIPS` method requires 
that FIPS compliant libraries are used.
+
+#### <a id="topic11"></a>System-wide 
+
+You will perform different procedures to set the encryption method 
(`password_hash_algorithm` server parameter) system-wide depending upon whether 
you manage your cluster from the command line or use Ambari. If you use Ambari 
to manage your HAWQ cluster, you must ensure that you update encryption method 
configuration parameters only via the Ambari Web UI. If you manage your HAWQ 
cluster from the command line, you will use the `hawq config` command line 
utility to set encryption method configuration parameters.
+
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `password_hash_algorithm` configuration property via the HAWQ 
service **Configs > Advanced > Custom hawq-site** drop down. Valid values 
include `SHA-256` \(or `SHA-256-FIPS` to use the FIPS-compliant libraries for 
SHA-256\).
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file 
`/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+2.  Use the `hawq config` utility to set `password_hash_algorithm` to `SHA-256` \(or `SHA-256-FIPS` to use the FIPS-compliant libraries for SHA-256\):
+
+    ``` shell
+    $ hawq config -c password_hash_algorithm -v 'SHA-256'
+    ```
+
+    Or:
+
+    ``` shell
+    $ hawq config -c password_hash_algorithm -v 'SHA-256-FIPS'
+    ```
+
+3.  Reload the HAWQ configuration:
+
+    ``` shell
+    $ hawq stop cluster -u
+    ```
+
+4.  Verify the setting:
+
+    ``` shell
+    $ hawq config -s password_hash_algorithm
+    ```
+
+#### <a id="topic12"></a>Individual Session 
+
+To set the `password_hash_algorithm` server parameter for an individual 
database session:
+
+1.  Log in to your HAWQ instance as a superuser.
+2.  Set the `password_hash_algorithm` to `SHA-256` \(or `SHA-256-FIPS` to use 
the FIPS-compliant libraries for SHA-256\):
+
+    ``` sql
+    =# SET password_hash_algorithm = 'SHA-256';
+    SET
+    ```
+
+    or:
+
+    ``` sql
+    =# SET password_hash_algorithm = 'SHA-256-FIPS';
+    SET
+    ```
+
+3.  Verify the setting:
+
+    ``` sql
+    =# SHOW password_hash_algorithm;
+    password_hash_algorithm
+    ```
+
+    You will see:
+
+    ```
+    SHA-256
+    ```
+
+    or:
+
+    ```
+    SHA-256-FIPS
+    ```
+
+    **Example**
+
+    Following is an example of how the new setting works:
+
+4.  Log in as a superuser and verify the password hash algorithm setting:
+
+    ``` sql
+    =# SHOW password_hash_algorithm;
+    password_hash_algorithm
+    -------------------------------
+    SHA-256-FIPS
+    ```
+
+5.  Create a new role with a password and with login privileges.
+
+    ``` sql
+    =# CREATE ROLE testdb WITH PASSWORD 'testdb12345#' LOGIN;
+    ```
+
+6.  Change the client authentication method to allow for storage of SHA-256 
encrypted passwords:
+
+    Open the `pg_hba.conf` file on the master and add the following line:
+
+    ```
+    host all testdb 0.0.0.0/0 password
+    ```
+
+7.  Restart the cluster.
+8.  Log in to the database as the newly created user, `testdb`.
+
+    ``` bash
+    $ psql -U testdb
+    ```
+
+9.  Enter the correct password at the prompt.
+10. Verify that the password is stored as a SHA-256 hash.
+
+    Note that password hashes are stored in `pg_authid.rolpassword`.
+
+    1.  Login as the super user.
+    2.  Execute the following:
+
+        ``` sql
+        =# SELECT rolpassword FROM pg_authid WHERE rolname = 'testdb';
+        rolpassword
+        -----------
+        sha256<64 hexadecimal characters>
+        ```
+
+
+## <a id="topic13"></a>Time-based Authentication 
+
+HAWQ enables the administrator to restrict access to certain times by role. 
Use the `CREATE ROLE` or `ALTER ROLE` commands to specify time-based 
constraints.
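+
+For example, a sketch using the `DENY` clause \(the role name is hypothetical; see the `CREATE ROLE` and `ALTER ROLE` reference pages for the exact syntax your HAWQ version supports\):
+
+``` sql
+-- Prevent this role from logging in on Sundays
+CREATE ROLE nightbatch WITH LOGIN DENY DAY 'Sunday';
+
+-- Also block access during a Monday maintenance window
+ALTER ROLE nightbatch DENY BETWEEN DAY 'Monday' TIME '02:00'
+    AND DAY 'Monday' TIME '04:00';
+```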

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/BasicDataOperations.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/BasicDataOperations.html.md.erb 
b/markdown/datamgmt/BasicDataOperations.html.md.erb
new file mode 100644
index 0000000..66328c7
--- /dev/null
+++ b/markdown/datamgmt/BasicDataOperations.html.md.erb
@@ -0,0 +1,64 @@
+---
+title: Basic Data Operations
+---
+
+This topic describes basic data operations that you perform in HAWQ.
+
+## <a id="topic3"></a>Inserting Rows
+
+Use the `INSERT` command to create rows in a table. This command requires the 
table name and a value for each column in the table; you may optionally specify 
the column names in any order. If you do not specify column names, list the 
data values in the order of the columns in the table, separated by commas.
+
+For example, to specify the column names and the values to insert:
+
+``` sql
+INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);
+```
+
+To specify only the values to insert:
+
+``` sql
+INSERT INTO products VALUES (1, 'Cheese', 9.99);
+```
+
+Usually, the data values are literals (constants), but you can also use scalar 
expressions. For example:
+
+``` sql
+INSERT INTO films SELECT * FROM tmp_films WHERE date_prod <
+'2004-05-07';
+```
+
+You can insert multiple rows in a single command. For example:
+
+``` sql
+INSERT INTO products (product_no, name, price) VALUES
+    (1, 'Cheese', 9.99),
+    (2, 'Bread', 1.99),
+    (3, 'Milk', 2.99);
+```
+
+To insert data into a partitioned table, you specify the root partitioned 
table, the table created with the `CREATE TABLE` command. You also can specify 
a leaf child table of the partitioned table in an `INSERT` command. An error is 
returned if the data is not valid for the specified leaf child table. 
Specifying a child table that is not a leaf child table in the `INSERT` command 
is not supported.
+
+To insert large amounts of data, use external tables or the `COPY` command. 
These load mechanisms are more efficient than `INSERT` for inserting large 
quantities of rows. See [Loading and Unloading 
Data](load/g-loading-and-unloading-data.html#topic1) for more information about 
bulk data loading.
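+
+For example, a minimal `COPY` sketch \(the file path is hypothetical, and the file must be readable on the HAWQ master host\):
+
+``` sql
+COPY products (product_no, name, price)
+    FROM '/data/products.txt' WITH DELIMITER '|';
+```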
+
+## <a id="topic9"></a>Vacuuming the System Catalog Tables
+
+Only the HAWQ system catalog tables use multiversion concurrency control \(MVCC\). Deleted or updated rows in the catalog tables occupy physical space on disk even though new transactions cannot see them. Periodically running the `VACUUM` command removes these expired rows.
+
+The `VACUUM` command also collects table-level statistics such as the number 
of rows and pages.
+
+For example:
+
+``` sql
+VACUUM pg_class;
+```
+
+### <a id="topic10"></a>Configuring the Free Space Map
+
+Expired rows are held in the *free space map*. The free space map must be 
sized large enough to hold all expired rows in your database. If not, a regular 
`VACUUM` command cannot reclaim space occupied by expired rows that overflow 
the free space map.
+
+**Note:** `VACUUM FULL` is not recommended with HAWQ because it is not safe 
for large tables and may take an unacceptably long time to complete. See 
[VACUUM](../reference/sql/VACUUM.html#topic1).
+
+Size the free space map with the following server configuration parameters:
+
+-   `max_fsm_pages`
+-   `max_fsm_relations`

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/ConcurrencyControl.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/ConcurrencyControl.html.md.erb 
b/markdown/datamgmt/ConcurrencyControl.html.md.erb
new file mode 100644
index 0000000..2ced135
--- /dev/null
+++ b/markdown/datamgmt/ConcurrencyControl.html.md.erb
@@ -0,0 +1,24 @@
+---
+title: Concurrency Control
+---
+
+This topic discusses the mechanisms used in HAWQ to provide concurrency 
control.
+
+HAWQ and PostgreSQL do not use locks for concurrency control. They maintain 
data consistency using a multiversion model, Multiversion Concurrency Control 
(MVCC). MVCC achieves transaction isolation for each database session, and each 
query transaction sees a snapshot of data. This ensures the transaction sees 
consistent data that is not affected by other concurrent transactions.
+
+Because MVCC does not use explicit locks for concurrency control, lock 
contention is minimized and HAWQ maintains reasonable performance in multiuser 
environments. Locks acquired for querying (reading) data do not conflict with 
locks acquired for writing data.
+
+HAWQ provides multiple lock modes to control concurrent access to data in 
tables. Most HAWQ SQL commands automatically acquire the appropriate locks to 
ensure that referenced tables are not dropped or modified in incompatible ways 
while a command executes. For applications that cannot adapt easily to MVCC 
behavior, you can use the `LOCK` command to acquire explicit locks. However, 
proper use of MVCC generally provides better performance.
+
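+For example, a sketch of taking an explicit lock \(the table name is hypothetical; the lock is held until the transaction ends\):
+
+``` sql
+BEGIN;
+LOCK TABLE mytable IN ACCESS EXCLUSIVE MODE;
+-- ... statements that require exclusive access ...
+COMMIT;
+```
+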
+<caption><span class="tablecap">Table 1. Lock Modes in HAWQ</span></caption>
+
+<a id="topic_f5l_qnh_kr__ix140861"></a>
+
+| Lock Mode              | Associated SQL Commands                                                      | Conflicts With                                                                                                          |
+|------------------------|------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
+| ACCESS SHARE           | `SELECT`                                                                     | ACCESS EXCLUSIVE                                                                                                        |
+| ROW EXCLUSIVE          | `INSERT`, `COPY`                                                             | SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE                                                                 |
+| SHARE UPDATE EXCLUSIVE | `VACUUM` (without `FULL`), `ANALYZE`                                         | SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE                                         |
+| SHARE                  | `CREATE INDEX`                                                               | ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE                                 |
+| SHARE ROW EXCLUSIVE    |                                                                              | ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE                          |
+| ACCESS EXCLUSIVE       | `ALTER TABLE`, `DROP TABLE`, `TRUNCATE`, `REINDEX`, `CLUSTER`, `VACUUM FULL` | ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE |

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/HAWQInputFormatforMapReduce.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/HAWQInputFormatforMapReduce.html.md.erb 
b/markdown/datamgmt/HAWQInputFormatforMapReduce.html.md.erb
new file mode 100644
index 0000000..a6fcca2
--- /dev/null
+++ b/markdown/datamgmt/HAWQInputFormatforMapReduce.html.md.erb
@@ -0,0 +1,304 @@
+---
+title: HAWQ InputFormat for MapReduce
+---
+
+MapReduce is a programming model developed by Google for processing and 
generating large data sets on an array of commodity servers. You can use the 
HAWQ InputFormat class to enable MapReduce jobs to access HAWQ data stored in 
HDFS.
+
+To use HAWQ InputFormat, you need only provide the URL of the database to connect to, along with the name of the table you want to access. HAWQ InputFormat
fetches only the metadata of the database and table of interest, which is much 
less data than the table data itself. After getting the metadata, HAWQ 
InputFormat determines where and how the table data is stored in HDFS. It reads 
and parses those HDFS files and processes the parsed table tuples directly 
inside a Map task.
+
+This chapter describes the document format and schema for defining HAWQ 
MapReduce jobs.
+
+## <a id="supporteddatatypes"></a>Supported Data Types
+
+HAWQ InputFormat supports the following data types:
+
+| SQL/HAWQ                | JDBC/JAVA                                        | 
setXXX        | getXXX        |
+|-------------------------|--------------------------------------------------|---------------|---------------|
+| DECIMAL/NUMERIC         | java.math.BigDecimal                             | 
setBigDecimal | getBigDecimal |
+| FLOAT8/DOUBLE PRECISION | double                                           | 
setDouble     | getDouble     |
+| INT8/BIGINT             | long                                             | 
setLong       | getLong       |
+| INTEGER/INT4/INT        | int                                              | 
setInt        | getInt        |
+| FLOAT4/REAL             | float                                            | 
setFloat      | getFloat      |
+| SMALLINT/INT2           | short                                            | 
setShort      | getShort      |
+| BOOL/BOOLEAN            | boolean                                          | 
setBoolean    | getBoolean    |
+| VARCHAR/CHAR/TEXT       | String                                           | 
setString     | getString     |
+| DATE                    | java.sql.Date                                    | 
setDate       | getDate       |
+| TIME/TIMETZ             | java.sql.Time                                    | 
setTime       | getTime       |
+| TIMESTAMP/TIMESTAMPTZ   | java.sql.Timestamp                               | setTimestamp  | getTimestamp  |
+| ARRAY                   | java.sql.Array                                   | setArray      | getArray      |
+| BIT/VARBIT              | com.pivotal.hawq.mapreduce.datatype.             | 
setVarbit     | getVarbit     |
+| BYTEA                   | byte\[\]                                         | 
setByte       | getByte       |
+| INTERVAL                | com.pivotal.hawq.mapreduce.datatype.HAWQInterval | 
setInterval   | getInterval   |
+| POINT                   | com.pivotal.hawq.mapreduce.datatype.HAWQPoint    | 
setPoint      | getPoint      |
+| LSEG                    | com.pivotal.hawq.mapreduce.datatype.HAWQLseg     | 
setLseg       | getLseg       |
+| BOX                     | com.pivotal.hawq.mapreduce.datatype.HAWQBox      | 
setBox        | getBox        |
+| CIRCLE                  | com.pivotal.hawq.mapreduce.datatype.HAWQCircle   | setCircle     | getCircle     |
+| PATH                    | com.pivotal.hawq.mapreduce.datatype.HAWQPath     | 
setPath       | getPath       |
+| POLYGON                 | com.pivotal.hawq.mapreduce.datatype.HAWQPolygon  | 
setPolygon    | getPolygon    |
+| MACADDR                 | com.pivotal.hawq.mapreduce.datatype.HAWQMacaddr  | 
setMacaddr    | getMacaddr    |
+| INET                    | com.pivotal.hawq.mapreduce.datatype.HAWQInet     | 
setInet       | getInet       |
+| CIDR                    | com.pivotal.hawq.mapreduce.datatype.HAWQCIDR     | 
setCIDR       | getCIDR       |
+
+## <a id="hawqinputformatexample"></a>HAWQ InputFormat Example
+
+The following example shows how you can use the `HAWQInputFormat` class to 
access HAWQ table data from MapReduce jobs.
+
+``` java
+package com.mycompany.app;
+import com.pivotal.hawq.mapreduce.HAWQException;
+import com.pivotal.hawq.mapreduce.HAWQInputFormat;
+import com.pivotal.hawq.mapreduce.HAWQRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+import org.apache.hadoop.io.IntWritable;
+
+import java.io.IOException;
+public class HAWQInputFormatDemoDriver extends Configured
+implements Tool {
+
+    // CREATE TABLE employees (
+    // id INTEGER NOT NULL, name VARCHAR(32) NOT NULL);
+    public static class DemoMapper extends
+        Mapper<Void, HAWQRecord, IntWritable, Text> {
+       int id = 0;
+       String name = null;
+       public void map(Void key, HAWQRecord value, Context context)
+        throws IOException, InterruptedException {
+        try {
+        id = value.getInt(1);
+        name = value.getString(2);
+        } catch (HAWQException hawqE) {
+        throw new IOException(hawqE.getMessage());
+        }
+        context.write(new IntWritable(id), new Text(name));
+       }
+    }
+    private static int printUsage() {
+       System.out.println("HAWQInputFormatDemoDriver "
+           + "<database_url> <table_name> <output_path> "
+           + "[username] [password]");
+       ToolRunner.printGenericCommandUsage(System.out);
+       return 2;
+    }
+ 
+    public int run(String[] args) throws Exception {
+       if (args.length < 3) {
+        return printUsage();
+       }
+       Job job = Job.getInstance(getConf());
+       job.setJobName("hawq-inputformat-demo");
+       job.setJarByClass(HAWQInputFormatDemoDriver.class);
+       job.setMapperClass(DemoMapper.class);
+       job.setMapOutputValueClass(Text.class);
+       job.setOutputValueClass(Text.class);
+       String db_url = args[0];
+       String table_name = args[1];
+       String output_path = args[2];
+       String user_name = null;
+       if (args.length > 3) {
+         user_name = args[3];
+       }
+       String password = null;
+       if (args.length > 4) {
+         password = args[4];
+       }
+       job.setInputFormatClass(HAWQInputFormat.class);
+       HAWQInputFormat.setInput(job.getConfiguration(), db_url,
+       user_name, password, table_name);
+       FileOutputFormat.setOutputPath(job, new
+       Path(output_path));
+       job.setNumReduceTasks(0);
+       int res = job.waitForCompletion(true) ? 0 : 1;
+       return res;
+    }
+    
+    public static void main(String[] args) throws Exception {
+       int res = ToolRunner.run(new Configuration(),
+         new HAWQInputFormatDemoDriver(), args);
+       System.exit(res);
+    }
+}
+```
+
+**To compile and run the example:**
+
+1.  Create a work directory:
+
+    ``` shell
+    $ mkdir mrwork
+    $ cd mrwork
+    ```
+ 
+2.  Copy and paste the Java code above into a `.java` file.
+
+    ``` shell
+    $ mkdir -p com/mycompany/app
+    $ cd com/mycompany/app
+    $ vi HAWQInputFormatDemoDriver.java
+    ```
+
+3.  Note the following dependencies required for compilation:
+    1.  `HAWQInputFormat` jars (located in the 
`$GPHOME/lib/postgresql/hawq-mr-io` directory):
+        -   `hawq-mapreduce-common.jar`
+        -   `hawq-mapreduce-ao.jar`
+        -   `hawq-mapreduce-parquet.jar`
+        -   `hawq-mapreduce-tool.jar`
+
+    2.  Required 3rd party jars (located in the 
`$GPHOME/lib/postgresql/hawq-mr-io/lib` directory):
+        -   `parquet-common-1.1.0.jar`
+        -   `parquet-format-1.1.0.jar`
+        -   `parquet-hadoop-1.1.0.jar`
+        -   `postgresql-n.n-n-jdbc4.jar`
+        -   `snakeyaml-n.n.jar`
+
+    3.  Hadoop Mapreduce related jars (located in the install directory of 
your Hadoop distribution).
+
+4.  Compile the Java program.  You may choose to use a different compilation 
command:
+
+    ``` shell
+    javac -classpath 
/usr/hdp/2.4.2.0-258/hadoop-mapreduce/*:/usr/local/hawq/lib/postgresql/hawq-mr-io/*:/usr/local/hawq/lib/postgresql/hawq-mr-io/lib/*:/usr/hdp/current/hadoop-client/*
 HAWQInputFormatDemoDriver.java
+    ```
+   
+5.  Build the JAR file.
+
+    ``` shell
+    $ cd ../../..
+    $ jar cf my-app.jar com
+    $ cp my-app.jar /tmp
+    ```
+    
+6.  Verify that HAWQ and HDFS are installed and that your HAWQ cluster is running.
+
+7.  Create sample table:
+    1.  Log in to HAWQ:
+
+        ``` shell
+         $ psql -d postgres 
+        ```
+
+    2.  Create the table:
+
+        ``` sql
+        CREATE TABLE employees (
+        id INTEGER NOT NULL,
+        name TEXT NOT NULL);
+        ```
+
+        Or a Parquet table:
+
+        ``` sql
+        CREATE TABLE employees ( id INTEGER NOT NULL, name TEXT NOT NULL) WITH 
(APPENDONLY=true, ORIENTATION=parquet);
+        ```
+
+    3.  Insert one tuple:
+
+        ``` sql
+        INSERT INTO employees VALUES (1, 'Paul');
+        \q
+        ```
+8.  Ensure the system `pg_hba.conf` configuration file is set up to allow 
`gpadmin` access to the `postgres` database.
+
+9.  Use the following shell script snippet to run the MapReduce job:
+
+    ``` shell
+    #!/bin/bash
+    
+    # set up environment variables
+    HAWQMRLIB=/usr/local/hawq/lib/postgresql/hawq-mr-io
+    export 
HADOOP_CLASSPATH=$HAWQMRLIB/hawq-mapreduce-ao.jar:$HAWQMRLIB/hawq-mapreduce-common.jar:$HAWQMRLIB/hawq-mapreduce-tool.jar:$HAWQMRLIB/hawq-mapreduce-parquet.jar:$HAWQMRLIB/lib/postgresql-9.2-1003-jdbc4.jar:$HAWQMRLIB/lib/snakeyaml-1.12.jar:$HAWQMRLIB/lib/parquet-hadoop-1.1.0.jar:$HAWQMRLIB/lib/parquet-common-1.1.0.jar:$HAWQMRLIB/lib/parquet-format-1.0.0.jar
+    export 
LIBJARS=$HAWQMRLIB/hawq-mapreduce-ao.jar,$HAWQMRLIB/hawq-mapreduce-common.jar,$HAWQMRLIB/hawq-mapreduce-tool.jar,$HAWQMRLIB/lib/postgresql-9.2-1003-jdbc4.jar,$HAWQMRLIB/lib/snakeyaml-1.12.jar,$HAWQMRLIB/hawq-mapreduce-parquet.jar,$HAWQMRLIB/lib/parquet-hadoop-1.1.0.jar,$HAWQMRLIB/lib/parquet-common-1.1.0.jar,$HAWQMRLIB/lib/parquet-format-1.0.0.jar
+    
+    # usage:  hadoop jar JARFILE CLASSNAME -libjars JARS <database_url> 
<table_name> <output_path_on_HDFS>
+    #   - writing output to HDFS, so run as hdfs user
+    #   - if not using the default postgres port, replace 5432 with port 
number for your HAWQ cluster
+    HADOOP_USER_NAME=hdfs hadoop jar /tmp/my-app.jar 
com.mycompany.app.HAWQInputFormatDemoDriver -libjars $LIBJARS 
localhost:5432/postgres employees /tmp/employees
+    ```
+    
+    The MapReduce job output is written to the `/tmp/employees` directory on 
the HDFS file system.
+
+10.  Use the following command to check the result of the MapReduce job:
+
+    ``` shell
+    $ sudo -u hdfs hdfs dfs -ls /tmp/employees
+    $ sudo -u hdfs hdfs dfs -cat /tmp/employees/*
+    ```
+
+    The output will appear as follows:
+
+    ``` pre
+    1 Paul
+    ```
+        
+11.  If you choose to run the program again, delete the output file and directory:
+    
+    ``` shell
+    $ sudo -u hdfs hdfs dfs -rm /tmp/employees/*
+    $ sudo -u hdfs hdfs dfs -rmdir /tmp/employees
+    ```
+
+## <a id="accessinghawqdata"></a>Accessing HAWQ Data
+
+You can access HAWQ data using the `HAWQInputFormat.setInput()` interface.  
You will use a different API signature depending on whether HAWQ is running or 
not.
+
+-   When HAWQ is running, use `HAWQInputFormat.setInput(Configuration conf, 
String db_url, String username, String password, String tableName)`.
+-   When HAWQ is not running, first extract the table metadata to a file with 
the Metadata Export Tool and then use `HAWQInputFormat.setInput(Configuration 
conf, String pathStr)`.
+
+### <a id="hawqinputformatsetinput"></a>HAWQ is Running
+
+``` java
+  /**
+    * Initializes the map-part of the job with the appropriate input settings
+    * through connecting to Database.
+    *
+    * @param conf
+    * The map-reduce job configuration
+    * @param db_url
+    * The database URL to connect to
+    * @param username
+    * The username for setting up a connection to the database
+    * @param password
+    * The password for setting up a connection to the database
+    * @param tableName
+    * The name of the table to access to
+    * @throws Exception
+    */
+public static void setInput(Configuration conf, String db_url,
+    String username, String password, String tableName)
+throws Exception;
+```
+
+### <a id="metadataexporttool"></a>HAWQ is not Running
+
+Use the metadata export tool, `hawq extract`, to export the metadata of the 
target table into a local YAML file:
+
+``` shell
+$ hawq extract [-h hostname] [-p port] [-U username] [-d database] [-o 
output_file] [-W] <tablename>
+```
+
+Using the extracted metadata, access HAWQ data through the following 
interface.  Pass the complete path to the `.yaml` file in the `pathStr` 
argument.
+
+``` java
+ /**
+   * Initializes the map-part of the job with the appropriate input settings 
through reading metadata file stored in local filesystem.
+   *
+   * To obtain the metadata file, first run the hawq extract utility
+   *
+   * @param conf
+   * The map-reduce job configuration
+   * @param pathStr
+   * The metadata file path in local filesystem. e.g.
+   * /home/gpadmin/metadata/postgres_test
+   * @throws Exception
+   */
+public static void setInput(Configuration conf, String pathStr)
+   throws Exception;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/Transactions.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/Transactions.html.md.erb 
b/markdown/datamgmt/Transactions.html.md.erb
new file mode 100644
index 0000000..dfc9a5e
--- /dev/null
+++ b/markdown/datamgmt/Transactions.html.md.erb
@@ -0,0 +1,54 @@
+---
+title: Working with Transactions
+---
+
+This topic describes transaction support in HAWQ.
+
+Transactions allow you to bundle multiple SQL statements in one all-or-nothing 
operation.
+
+The following are the HAWQ SQL transaction commands:
+
+-   `BEGIN` or `START TRANSACTION` starts a transaction block.
+-   `END` or `COMMIT` commits the results of a transaction.
+-   `ROLLBACK` abandons a transaction without making any changes.
+-   `SAVEPOINT` marks a place in a transaction and enables partial rollback. You can roll back commands executed after a savepoint while maintaining commands executed before the savepoint.
+-   `ROLLBACK TO SAVEPOINT` rolls back a transaction to a savepoint.
+-   `RELEASE SAVEPOINT` destroys a savepoint within a transaction.
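+
+A sketch showing these commands together \(the table name is hypothetical\):
+
+``` sql
+BEGIN;
+INSERT INTO orders VALUES (1, 'widget');
+SAVEPOINT before_risky;
+INSERT INTO orders VALUES (2, 'gadget');
+-- Undo only the work performed after the savepoint
+ROLLBACK TO SAVEPOINT before_risky;
+COMMIT;  -- the first INSERT is kept; the second is not
+```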
+
+## <a id="topic8"></a>Transaction Isolation Levels
+
+HAWQ accepts the standard SQL transaction levels as follows:
+
+-   *read uncommitted* and *read committed* behave like the standard *read committed*
+-   *serializable* and *repeatable read* behave like the standard *serializable*
+
+The following information describes the behavior of the HAWQ transaction 
levels:
+
+-   **read committed/read uncommitted** — Provides fast, simple, partial 
transaction isolation. With read committed and read uncommitted transaction 
isolation, `SELECT` transactions operate on a snapshot of the database taken 
when the query started.
+
+A `SELECT` query:
+
+-   Sees data committed before the query starts.
+-   Sees updates executed within the transaction.
+-   Does not see uncommitted data outside the transaction.
+-   Can possibly see changes that concurrent transactions made if the 
concurrent transaction is committed after the initial read in its own 
transaction.
+
+Successive `SELECT` queries in the same transaction can see different data if 
other concurrent transactions commit changes before the queries start.
+
+Read committed or read uncommitted transaction isolation may be inadequate for 
applications that perform complex queries and require a consistent view of the 
database.
+
+-   **serializable/repeatable read** — Provides strict transaction isolation 
in which transactions execute as if they run one after another rather than 
concurrently. Applications on the serializable or repeatable read level must be 
designed to retry transactions in case of serialization failures.
+
+A `SELECT` query:
+
+-   Sees a snapshot of the data as of the start of the transaction (not as of 
the start of the current query within the transaction).
+-   Sees only data committed before the query starts.
+-   Sees updates executed within the transaction.
+-   Does not see uncommitted data outside the transaction.
+-   Does not see changes that concurrent transactions made.
+
+    Successive `SELECT` commands within a single transaction always see the 
same data.
+
+The default transaction isolation level in HAWQ is *read committed*. To change 
the isolation level for a transaction, declare the isolation level when you 
`BEGIN` the transaction or use the `SET TRANSACTION` command after the 
transaction starts.
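+
+For example, either form requests serializable isolation for the current transaction:
+
+``` sql
+BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+
+-- or, after the transaction has started:
+BEGIN;
+SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+```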
+
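+For example, either of the following forms starts a transaction at the serializable isolation level (the `sales` table name is illustrative):
+
+``` sql
+-- Declare the isolation level when the transaction begins:
+BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+SELECT count(*) FROM sales;
+COMMIT;
+
+-- Or set it after the transaction starts:
+BEGIN;
+SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
+SELECT count(*) FROM sales;
+COMMIT;
+```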
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/about_statistics.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/about_statistics.html.md.erb 
b/markdown/datamgmt/about_statistics.html.md.erb
new file mode 100644
index 0000000..5e2184a
--- /dev/null
+++ b/markdown/datamgmt/about_statistics.html.md.erb
@@ -0,0 +1,209 @@
+---
+title: About Database Statistics
+---
+
+## <a id="overview"></a>Overview
+
+Statistics are metadata that describe the data stored in the database. The 
query optimizer needs up-to-date statistics to choose the best execution plan 
for a query. For example, if a query joins two tables and one of them must be 
broadcast to all segments, the optimizer can choose the smaller of the two 
tables to minimize network traffic.
+
+The statistics used by the optimizer are calculated and saved in the system 
catalog by the `ANALYZE` command. There are three ways to initiate an analyze 
operation:
+
+-   You can run the `ANALYZE` command directly.
+-   You can run the `analyzedb` management utility outside of the database, at 
the command line.
+-   An automatic analyze operation can be triggered when DML operations are 
performed on tables that have no statistics or when a DML operation modifies a 
number of rows greater than a specified threshold.
+
+These methods are described in the following sections.
+
+Calculating statistics consumes time and resources, so HAWQ produces estimates by calculating statistics on samples of large tables. In most cases, the default settings provide the information needed to generate correct execution plans for queries. If the statistics produced do not yield optimal query execution plans, the administrator can tune configuration parameters to produce more accurate statistics by increasing the sample size or the granularity of statistics saved in the system catalog. Producing more accurate statistics has CPU and storage costs and may not produce better plans, so it is important to view explain plans and test query performance to ensure that the additional statistics-related costs result in better query performance.
+
+## <a id="topic_oq3_qxj_3s"></a>System Statistics
+
+### <a id="tablesize"></a>Table Size
+
+The query planner seeks to minimize the disk I/O and network traffic required 
to execute a query, using estimates of the number of rows that must be 
processed and the number of disk pages the query must access. The data from 
which these estimates are derived are the `pg_class` system table columns 
`reltuples` and `relpages`, which contain the number of rows and pages at the 
time a `VACUUM` or `ANALYZE` command was last run. As rows are added, the 
numbers become less accurate. However, an accurate count of disk pages is 
always available from the operating system, so as long as the ratio of 
`reltuples` to `relpages` does not change significantly, the optimizer can 
produce an estimate of the number of rows that is sufficiently accurate to 
choose the correct query execution plan.
+
+In append-optimized tables, the number of tuples is kept up-to-date in the 
system catalogs, so the `reltuples` statistic is not an estimate. Non-visible 
tuples in the table are subtracted from the total. The `relpages` value is 
estimated from the append-optimized block sizes.
+
+When the `reltuples` column differs significantly from the row count returned 
by `SELECT COUNT(*)`, an analyze should be performed to update the statistics.
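+For example, you can compare the stored estimates with an actual count (the `sales` table name is illustrative):
+
+``` sql
+-- Stored estimates from the last VACUUM or ANALYZE:
+SELECT relname, reltuples, relpages FROM pg_class WHERE relname = 'sales';
+
+-- Actual row count:
+SELECT count(*) FROM sales;
+
+-- If the two diverge significantly, refresh the statistics:
+ANALYZE sales;
+```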
+
+### <a id="views"></a>The pg\_statistic System Table and pg\_stats View
+
+The `pg_statistic` system table holds the results of the last `ANALYZE` 
operation on each database table. There is a row for each column of every 
table. It has the following columns:
+
+starelid  
+The object ID of the table or index the column belongs to.
+
+staatnum  
+The number of the described column, beginning with 1.
+
+stanullfrac  
+The fraction of the column's entries that are null.
+
+stawidth  
+The average stored width, in bytes, of non-null entries.
+
+stadistinct  
+The number of distinct nonnull data values in the column.
+
+stakind*N*  
+A code number indicating the kind of statistics stored in the *N*th slot of 
the `pg_statistic` row.
+
+staop*N*  
+An operator used to derive the statistics stored in the *N*th slot.
+
+stanumbers*N*  
+Numerical statistics of the appropriate kind for the *N*th slot, or NULL if 
the slot kind does not involve numerical values.
+
+stavalues*N*  
+Column data values of the appropriate kind for the *N*th slot, or NULL if the 
slot kind does not store any data values.
+
+The statistics collected for a column vary for different data types, so the 
`pg_statistic` table stores statistics that are appropriate for the data type 
in four *slots*, consisting of four columns per slot. For example, the first 
slot, which normally contains the most common values for a column, consists of 
the columns `stakind1`, `staop1`, `stanumbers1`, and `stavalues1`. Also see 
[pg\_statistic](../reference/catalog/pg_statistic.html#topic1).
+
+The `stakindN` columns each contain a numeric code to describe the type of 
statistics stored in their slot. The `stakind` code numbers from 1 to 99 are 
reserved for core PostgreSQL data types. HAWQ uses code numbers 1, 2, and 3. A 
value of 0 means the slot is unused. The following table describes the kinds of 
statistics stored for the three codes.
+
+<a id="topic_oq3_qxj_3s__table_upf_1yc_nt"></a>
+
+<table>
+<caption><span class="tablecap">Table 1. Contents of pg_statistic 
&quot;slots&quot;</span></caption>
+<colgroup>
+<col width="50%" />
+<col width="50%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>stakind Code</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>1</td>
+<td><em>Most Common Values (MCV) Slot</em>
+<ul>
+<li><code class="ph codeph">staop</code> contains the object ID of the 
&quot;=&quot; operator, used to decide whether values are the same or not.</li>
+<li><code class="ph codeph">stavalues</code> contains an array of the 
<em>K</em> most common non-null values appearing in the column.</li>
+<li><code class="ph codeph">stanumbers</code> contains the frequencies 
(fractions of total row count) of the values in the <code class="ph 
codeph">stavalues</code> array.</li>
+</ul>
+The values are ordered in decreasing frequency. Since the arrays are 
variable-size, <em>K</em> can be chosen by the statistics collector. Values 
must occur more than once to be added to the <code class="ph 
codeph">stavalues</code> array; a unique column has no MCV slot.</td>
+</tr>
+<tr class="even">
+<td>2</td>
+<td><em>Histogram Slot</em> – describes the distribution of scalar data.
+<ul>
+<li><code class="ph codeph">staop</code> is the object ID of the 
&quot;&lt;&quot; operator, which describes the sort ordering.</li>
+<li><code class="ph codeph">stavalues</code> contains <em>M</em> (where 
<em>M</em>&gt;=2) non-null values that divide the non-null column data values 
into <em>M</em>-1 bins of approximately equal population. The first <code 
class="ph codeph">stavalues</code> item is the minimum value and the last is 
the maximum value.</li>
+<li><code class="ph codeph">stanumbers</code> is not used and should be 
null.</li>
+</ul>
+<p>If a Most Common Values slot is also provided, then the histogram describes 
the data distribution after removing the values listed in the MCV array. (It is 
a <em>compressed histogram</em> in the technical parlance). This allows a more 
accurate representation of the distribution of a column with some very common 
values. In a column with only a few distinct values, it is possible that the 
MCV list describes the entire data population; in this case the histogram 
reduces to empty and should be omitted.</p></td>
+</tr>
+<tr class="odd">
+<td>3</td>
+<td><em>Correlation Slot</em> – describes the correlation between the 
physical order of table tuples and the ordering of data values of this column.
+<ul>
+<li><code class="ph codeph">staop</code> is the object ID of the 
&quot;&lt;&quot; operator. As with the histogram, more than one entry could 
theoretically appear.</li>
+<li><code class="ph codeph">stavalues</code> is not used and should be 
NULL.</li>
+<li><code class="ph codeph">stanumbers</code> contains a single entry, the 
correlation coefficient between the sequence of data values and the sequence of 
their actual tuple positions. The coefficient ranges from +1 to -1.</li>
+</ul></td>
+</tr>
+</tbody>
+</table>
+
+The `pg_stats` view presents the contents of `pg_statistic` in a friendlier 
format. For more information, see 
[pg\_stats](../reference/catalog/pg_stats.html#topic1).
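+As a sketch, the following query shows per-column statistics for a hypothetical `sales` table:
+
+``` sql
+SELECT attname, null_frac, avg_width, n_distinct, most_common_vals
+FROM pg_stats
+WHERE tablename = 'sales';
+```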
+
+Newly created tables and indexes have no statistics.
+
+### <a id="topic_oq3_qxj_3s__section_wsy_1rv_mt"></a>Sampling
+
+When calculating statistics for large tables, HAWQ creates a smaller table by 
sampling the base table. If the table is partitioned, samples are taken from 
all partitions.
+
+If a sample table is created, the number of rows in the sample is calculated 
to provide a maximum acceptable relative error. The amount of acceptable error 
is specified with the `gp_analyze_relative_error` system configuration 
parameter, which is set to .25 (25%) by default. This is usually sufficiently 
accurate to generate correct query plans. If `ANALYZE` is not producing good 
estimates for a table column, you can increase the sample size by setting the 
`gp_analyze_relative_error` configuration parameter to a lower value. Beware 
that setting this parameter to a low value can lead to a very large sample size 
and dramatically increase analyze time.
+
+### <a id="topic_oq3_qxj_3s__section_u5p_brv_mt"></a>Updating Statistics
+
+Running `ANALYZE` with no arguments updates statistics for all tables in the 
database. This could take a very long time, so it is better to analyze tables 
selectively after data has changed. You can also analyze a subset of the 
columns in a table, for example columns used in joins, `WHERE` clauses, `SORT` 
clauses, `GROUP BY` clauses, or `HAVING` clauses.
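+For example (the table and column names are illustrative):
+
+``` sql
+-- Analyze a single table:
+ANALYZE sales;
+
+-- Analyze only the columns used in joins and predicates:
+ANALYZE sales (customer_id, order_date);
+```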
+
+See the SQL Command Reference for details of running the `ANALYZE` command.
+
+Refer to the Management Utility Reference for details of running the 
`analyzedb` command.
+
+### <a id="topic_oq3_qxj_3s__section_cv2_crv_mt"></a>Analyzing Partitioned and 
Append-Optimized Tables
+
+When the `ANALYZE` command is run on a partitioned table, it analyzes each leaf-level subpartition, one at a time. To avoid analyzing partitions that have not changed, you can run `ANALYZE` on just the new or changed partitions.
+
+The `analyzedb` command-line utility skips unchanged partitions automatically. 
It also runs concurrent sessions so it can analyze several partitions 
concurrently. It runs five sessions by default, but the number of sessions can 
be set from 1 to 10 with the `-p` command-line option. Each time `analyzedb` 
runs, it saves state information for append-optimized tables and partitions in 
the `db_analyze` directory in the master data directory. The next time it runs, 
`analyzedb` compares the current state of each table with the saved state and 
skips analyzing a table or partition if it is unchanged. Heap tables are always 
analyzed.
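+As a sketch (the database and table names are assumptions), the following commands analyze a single database with eight concurrent sessions, or a single table:
+
+``` shell
+$ analyzedb -d mydb -p 8
+$ analyzedb -d mydb -t public.sales
+```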
+
+If the Pivotal Query Optimizer is enabled, you also need to run `ANALYZE ROOTPARTITION` to refresh the root partition statistics. The Pivotal Query Optimizer requires statistics at the root level for partitioned tables. The legacy optimizer does not use these statistics. Enable the Pivotal Query Optimizer by setting both the `optimizer` and `optimizer_analyze_root_partition` system configuration parameters to on. The root level statistics are then updated when you run `ANALYZE` or `ANALYZE ROOTPARTITION`. The time to run `ANALYZE ROOTPARTITION` is similar to the time to analyze a single partition. The `analyzedb` utility updates root partition statistics by default.
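+For example, for a hypothetical partitioned table named `sales`:
+
+``` sql
+-- Requires optimizer and optimizer_analyze_root_partition set to on:
+ANALYZE ROOTPARTITION sales;
+```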
+
+## <a id="topic_gyb_qrd_2t"></a>Configuring Statistics
+
+There are several options for configuring HAWQ statistics collection.
+
+### <a id="statstarget"></a>Statistics Target
+
+The statistics target is the size of the `most_common_vals`, 
`most_common_freqs`, and `histogram_bounds` arrays for an individual column. By 
default, the target is 25. The default target can be changed by setting a 
server configuration parameter and the target can be set for any column using 
the `ALTER TABLE` command. Larger values increase the time needed to do 
`ANALYZE`, but may improve the quality of the legacy query optimizer (planner) 
estimates.
+
+Set the system default statistics target to a different value by setting the 
`default_statistics_target` server configuration parameter. The default value 
is usually sufficient, and you should only raise or lower it if your tests 
demonstrate that query plans improve with the new target. 
+
+You will perform different procedures to set server configuration parameters 
for your whole HAWQ cluster depending upon whether you manage your cluster from 
the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, 
you must ensure that you update server configuration parameters via the Ambari 
Web UI only. If you manage your HAWQ cluster from the command line, you will 
use the `hawq config` command line utility to set server configuration 
parameters.
+
+The following examples show how to raise the default statistics target from 25 
to 50.
+
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `default_statistics_target` configuration property to `50` via the 
HAWQ service **Configs > Advanced > Custom hawq-site** drop down.
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file 
`/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+1. Use the `hawq config` utility to set `default_statistics_target`:
+
+    ``` shell
+    $ hawq config -c default_statistics_target -v 50
+    ```
+2. Reload the HAWQ configuration:
+
+    ``` shell
+    $ hawq stop cluster -u
+    ```
+
+The statistics target for individual columns can be set with the `ALTER TABLE` command. For example, some queries can be improved by increasing the target for certain columns, especially columns that have irregular distributions. You can set the target to zero for columns that never contribute to query optimization. When the target is 0, `ANALYZE` ignores the column. For example, the following `ALTER TABLE` command sets the statistics target for the `notes` column in the `emp` table to zero:
+
+``` sql
+ALTER TABLE emp ALTER COLUMN notes SET STATISTICS 0;
+```
+
+The statistics target can be set in the range 0 to 1000, or set it to -1 to 
revert to using the system default statistics target.
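+For example, to revert the `notes` column to the system default statistics target:
+
+``` sql
+ALTER TABLE emp ALTER COLUMN notes SET STATISTICS -1;
+```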
+
+Setting the statistics target on a parent partition table affects the child 
partitions. If you set statistics to 0 on some columns on the parent table, the 
statistics for the same columns are set to 0 for all children partitions. 
However, if you later add or exchange another child partition, the new child 
partition will use either the default statistics target or, in the case of an 
exchange, the previous statistics target. Therefore, if you add or exchange 
child partitions, you should set the statistics targets on the new child table.
+
+### <a id="topic_gyb_qrd_2t__section_j3p_drv_mt"></a>Automatic Statistics 
Collection
+
+HAWQ can be set to automatically run `ANALYZE` on a table that either has no 
statistics or has changed significantly when certain operations are performed 
on the table. For partitioned tables, automatic statistics collection is only 
triggered when the operation is run directly on a leaf table, and then only the 
leaf table is analyzed.
+
+Automatic statistics collection has three modes:
+
+-   `none` disables automatic statistics collection.
+-   `on_no_stats` triggers an analyze operation for a table with no existing 
statistics when any of the commands `CREATE TABLE AS SELECT`, `INSERT`, or 
`COPY` are executed on the table.
+-   `on_change` triggers an analyze operation when any of the commands `CREATE 
TABLE AS SELECT`, `INSERT`, or `COPY` are executed on the table and the number 
of rows affected exceeds the threshold defined by the 
`gp_autostats_on_change_threshold` configuration parameter.
+
+The automatic statistics collection mode is set separately for commands that 
occur within a procedural language function and commands that execute outside 
of a function:
+
+-   The `gp_autostats_mode` configuration parameter controls automatic 
statistics collection behavior outside of functions and is set to `on_no_stats` 
by default.
+
+With the `on_change` mode, `ANALYZE` is triggered only if the number of rows 
affected exceeds the threshold defined by the 
`gp_autostats_on_change_threshold` configuration parameter. The default value 
for this parameter is a very high value, 2147483647, which effectively disables 
automatic statistics collection; you must set the threshold to a lower number 
to enable it. The `on_change` mode could trigger large, unexpected analyze 
operations that could disrupt the system, so it is not recommended to set it 
globally. It could be useful in a session, for example to automatically analyze 
a table following a load.
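+For example, a session could enable `on_change` mode just for a load (the threshold value is illustrative):
+
+``` sql
+SET gp_autostats_mode = on_change;
+SET gp_autostats_on_change_threshold = 100000;
+-- INSERT or COPY operations affecting more than 100000 rows
+-- now trigger an automatic ANALYZE in this session.
+```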
+
+To disable automatic statistics collection outside of functions, set the 
`gp_autostats_mode` parameter to `none`. For a command-line-managed HAWQ 
cluster:
+
+``` shell
+$ hawq config -c gp_autostats_mode -v none
+```
+
+For an Ambari-managed cluster, set `gp_autostats_mode` via the Ambari Web UI.
+
+Set the `log_autostats` system configuration parameter to `on` if you want to 
log automatic statistics collection operations.
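+For a command-line-managed cluster, for example:
+
+``` shell
+$ hawq config -c log_autostats -v on
+$ hawq stop cluster -u
+```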

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/dml.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/dml.html.md.erb 
b/markdown/datamgmt/dml.html.md.erb
new file mode 100644
index 0000000..681883a
--- /dev/null
+++ b/markdown/datamgmt/dml.html.md.erb
@@ -0,0 +1,35 @@
+---
+title: Managing Data with HAWQ
+---
+
+This chapter provides information about manipulating data and concurrent 
access in HAWQ.
+
+-   **[Basic Data Operations](../datamgmt/BasicDataOperations.html)**
+
+    This topic describes basic data operations that you perform in HAWQ.
+
+-   **[About Database Statistics](../datamgmt/about_statistics.html)**
+
+    An overview of statistics gathered by the `ANALYZE` command in HAWQ.
+
+-   **[Concurrency Control](../datamgmt/ConcurrencyControl.html)**
+
+    This topic discusses the mechanisms used in HAWQ to provide concurrency 
control.
+
+-   **[Working with Transactions](../datamgmt/Transactions.html)**
+
+    This topic describes transaction support in HAWQ.
+
+-   **[Loading and Unloading 
Data](../datamgmt/load/g-loading-and-unloading-data.html)**
+
+    The topics in this section describe methods for loading and writing data 
into and out of HAWQ, and how to format data files.
+
+-   **[Using PXF with Unmanaged Data](../pxf/HawqExtensionFrameworkPXF.html)**
+
+    HAWQ Extension Framework (PXF) is an extensible framework that allows HAWQ 
to query external system data. 
+
+-   **[HAWQ InputFormat for 
MapReduce](../datamgmt/HAWQInputFormatforMapReduce.html)**
+
+    MapReduce is a programming model developed by Google for processing and 
generating large data sets on an array of commodity servers. You can use the 
HAWQ InputFormat option to enable MapReduce jobs to access HAWQ data stored in 
HDFS.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/client-loadtools.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/client-loadtools.html.md.erb 
b/markdown/datamgmt/load/client-loadtools.html.md.erb
new file mode 100644
index 0000000..fe291d0
--- /dev/null
+++ b/markdown/datamgmt/load/client-loadtools.html.md.erb
@@ -0,0 +1,104 @@
+---
+title: Client-Based HAWQ Load Tools
+---
+HAWQ supports data loading from Red Hat Enterprise Linux 5, 6, and 7 and 
Windows XP client systems. HAWQ Load Tools include both a loader program and a 
parallel file distribution program.
+
+This topic presents the instructions to install the HAWQ Load Tools on your 
client machine. It also includes the information necessary to configure HAWQ 
databases to accept remote client connections.
+
+## <a id="installloadrunrhel"></a>RHEL Load Tools
+
+The RHEL Load Tools are provided in a HAWQ distribution. 
+
+
+### <a id="installloadrunux"></a>Installing the RHEL Loader
+
+1. Download a HAWQ installer package or build HAWQ from source.
+ 
+2. Refer to the HAWQ command line install instructions to set up your package 
repositories and install the HAWQ binary.
+
+3. Install the `libevent` and `libyaml` packages. These libraries are required 
by the HAWQ file server. You must have superuser privileges on the system.
+
+    ``` shell
+    $ sudo yum install -y libevent libyaml
+    ```
+
+### <a id="installrhelloadabout"></a>About the RHEL Loader Installation
+
+The files/directories of interest in a HAWQ RHEL Load Tools installation 
include:
+
+`bin/` — data loading command-line tools 
([gpfdist](../../reference/cli/admin_utilities/gpfdist.html) and [hawq 
load](../../reference/cli/admin_utilities/hawqload.html))   
+`greenplum_path.sh` — environment set up file
+
+### <a id="installloadrhelcfgenv"></a>Configuring the RHEL Load Environment
+
+A `greenplum_path.sh` file is located in the HAWQ base install directory 
following installation. Source `greenplum_path.sh` before running the HAWQ RHEL 
Load Tools to set up your HAWQ environment:
+
+``` shell
+$ . /usr/local/hawq/greenplum_path.sh
+```
+
+Continue to [Using the HAWQ File Server 
(gpfdist)](g-using-the-hawq-file-server--gpfdist-.html) for specific 
information about using the HAWQ load tools.
+
+## <a id="installloadrunwin"></a>Windows Load Tools
+
+### <a id="installpythonwin"></a>Installing Python 2.5
+The HAWQ Load Tools for Windows require that the 32-bit version of Python 2.5 be installed on your system. 
+
+**Note**: The 64-bit version of Python is **not** compatible with the HAWQ 
Load Tools for Windows.
+
+1. Download the [Python 2.5 installer for 
Windows](https://www.python.org/downloads/).  Make note of the directory to 
which it was downloaded.
+
+2. Double-click the `python-2.5.x.msi` package to launch the installer.
+3. Select **Install for all users** and click **Next**.
+4. The default Python install location is `C:\Pythonxx`. Click **Up** or 
**New** to choose another location. Click **Next**.
+5. Click **Next** to install the selected Python components.
+6. Click **Finish** to complete the Python installation.
+
+
+### <a id="installloadrunwin"></a>Running the Windows Installer
+
+1. Download the `greenplum-loaders-4.3.x.x-build-n-WinXP-x86_32.msi` installer 
package from [Pivotal 
Network](https://network.pivotal.io/products/pivotal-gpdb). Make note of the 
directory to which it was downloaded.
+ 
+2. Double-click the `greenplum-loaders-4.3.x.x-build-n-WinXP-x86_32.msi` file 
to launch the installer.
+3. Click **Next** on the **Welcome** screen.
+4. Click **I Agree** on the **License Agreement** screen.
+5. The default install location for HAWQ Loader Tools for Windows is 
`C:\"Program Files (x86)"\Greenplum\greenplum-loaders-4.3.8.1-build-1`. Click 
**Browse** to choose another location.
+6. Click **Next**.
+7. Click **Install** to begin the installation.
+8. Click **Finish** to exit the installer.
+
+    
+### <a id="installloadabout"></a>About the Windows Loader Installation
+Your HAWQ Windows Load Tools installation includes the following files and 
directories:
+
+`bin/` — data loading command-line tools 
([gpfdist](http://gpdb.docs.pivotal.io/4380/client_tool_guides/load/unix/gpfdist.html)
 and 
[gpload](http://gpdb.docs.pivotal.io/4380/client_tool_guides/load/unix/gpload.html))
  
+`lib/` — data loading library files  
+`greenplum_loaders_path.bat` — environment set up file
+
+
+### <a id="installloadcfgenv"></a>Configuring the Windows Load Environment
+
+A `greenplum_loaders_path.bat` file is provided in your load tools base 
install directory following installation. This file sets the following 
environment variables:
+
+- `GPHOME_LOADERS` - base directory of loader installation
+- `PATH` - adds the loader and component program directories
+- `PYTHONPATH` - adds component library directories
+
+Execute `greenplum_loaders_path.bat` to set up your HAWQ environment before 
running the HAWQ Windows Load Tools.
+ 
+
+## <a id="installloadenableclientconn"></a>Enabling Remote Client Connections
+The HAWQ master database must be configured to accept remote client 
connections.  Specifically, you need to identify the client hosts and database 
users that will be connecting to the HAWQ database.
+
+1. Ensure that the HAWQ database master `pg_hba.conf` file is correctly 
configured to allow connections from the desired users operating on the desired 
database from the desired hosts, using the authentication method you choose. 
For details, see [Configuring Client 
Access](../../clientaccess/client_auth.html#topic2).
+
+    Make sure the authentication method you choose is supported by the client 
tool you are using.
+    
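+    For example, an entry that allows a hypothetical `loaduser` role to connect to a `loadtest` database from a client subnet with password authentication might look like:
+
+    ```
+    host  loadtest  loaduser  192.0.2.0/24  md5
+    ```
+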
+2. If you edited the `pg_hba.conf` file, reload the server configuration. If 
you have any active database connections, you must include the `-M fast` option 
in the `hawq stop` command:
+
+    ``` shell
+    $ hawq stop cluster -u [-M fast]
+    ```
+   
+
+3. Verify and/or configure the databases and roles you are using to connect, and ensure that the roles have the correct privileges to the database objects.
\ No newline at end of file

