http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/creating-external-tables-examples.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/creating-external-tables-examples.html.md.erb 
b/markdown/datamgmt/load/creating-external-tables-examples.html.md.erb
new file mode 100644
index 0000000..8cdbff1
--- /dev/null
+++ b/markdown/datamgmt/load/creating-external-tables-examples.html.md.erb
@@ -0,0 +1,117 @@
+---
+title: Creating External Tables - Examples
+---
+
+The following examples show how to define external data with different 
protocols. Each `CREATE EXTERNAL TABLE` command can contain only one protocol.
+
+**Note:** When using IPv6, always enclose the numeric IP addresses in square 
brackets.
+
+Start `gpfdist` before you create external tables with the `gpfdist` protocol. 
The following code starts the `gpfdist` file server program in the background 
on port *8081* serving files from directory `/var/data/staging`. The logs are 
saved in `/home/gpadmin/log`.
+
+``` shell
+$ gpfdist -p 8081 -d /var/data/staging -l /home/gpadmin/log &
+```
+
+## <a id="ex1"></a>Example 1 - Single gpfdist instance on single-NIC machine
+
+Creates a readable external table, `ext_expenses`, using the `gpfdist` 
protocol. The files are formatted with a pipe (|) as the column delimiter.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+        ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*', 'gpfdist://etlhost-1:8082/*')
+    FORMAT 'TEXT' (DELIMITER '|');
+```
+
+## <a id="ex2"></a>Example 2 - Multiple gpfdist instances
+
+Creates a readable external table, *ext\_expenses*, using the `gpfdist` 
protocol from all files with the *txt* extension. The column delimiter is a 
pipe ( | ) and NULL is a space (' ').
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+        ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*.txt', 
'gpfdist://etlhost-2:8081/*.txt')
+    FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
+```
+
+## <a id="ex3"></a>Example 3 - Multiple gpfdists instances
+
+Creates a readable external table, *ext\_expenses*, from all files with the 
*txt* extension using the `gpfdists` protocol. The column delimiter is a pipe ( 
| ) and NULL is a space (' '). For information about the location of security 
certificates, see [gpfdists Protocol](g-gpfdists-protocol.html).
+
+1.  Run `gpfdist` with the `--ssl` option.
+2.  Run the following command.
+
+    ``` sql
+    =# CREATE EXTERNAL TABLE ext_expenses
+             ( name text, date date, amount float4, category text, desc1 text )
+        LOCATION ('gpfdists://etlhost-1:8081/*.txt', 
'gpfdists://etlhost-2:8082/*.txt')
+        FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
+    ```
+
+## <a id="ex4"></a>Example 4 - Single gpfdist instance with error logging
+
+Uses the `gpfdist` protocol to create a readable external table, `ext_expenses`, 
from all files with the *txt* extension. The column delimiter is a pipe ( | ) 
and NULL is a space (' ').
+
+Access to the external table is in single row error isolation mode. Input data 
formatting errors can be captured so that you can view the errors, fix the 
issues, and then reload the rejected data. If the error count on a segment is 
greater than five (the `SEGMENT REJECT LIMIT` value), the entire external table 
operation fails and no rows are processed.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+         ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*.txt', 
'gpfdist://etlhost-2:8082/*.txt')
+    FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
+    LOG ERRORS INTO expenses_errs SEGMENT REJECT LIMIT 5;
+```
+
+To create the readable `ext_expenses` table from CSV-formatted text files:
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+         ( name text, date date, amount float4, category text, desc1 text )
+    LOCATION ('gpfdist://etlhost-1:8081/*.txt', 
'gpfdist://etlhost-2:8082/*.txt')
+    FORMAT 'CSV' ( DELIMITER ',' )
+    LOG ERRORS INTO expenses_errs SEGMENT REJECT LIMIT 5;
+```
+
+## <a id="ex5"></a>Example 5 - Readable Web External Table with Script
+
+Creates a readable web external table that executes a script once on five 
virtual segments:
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE log_output (linenum int, message text)
+    EXECUTE '/var/load_scripts/get_log_data.sh' ON 5
+    FORMAT 'TEXT' (DELIMITER '|');
+```
+
+## <a id="ex6"></a>Example 6 - Writable External Table with gpfdist
+
+Creates a writable external table, *sales\_out*, that uses `gpfdist` to write 
output data to the file *sales.out*. The column delimiter is a pipe ( | ) and 
NULL is a space (' '). The file will be created in the directory specified when 
you started the gpfdist file server.
+
+``` sql
+=# CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
+    LOCATION ('gpfdist://etl1:8081/sales.out')
+    FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
+    DISTRIBUTED BY (txn_id);
+```
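Once defined, a writable external table is loaded with `INSERT`; for example, 
assuming the `sales` source table from the definition above:

``` sql
=# INSERT INTO sales_out SELECT * FROM sales;
```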
+
+## <a id="ex7"></a>Example 7 - Writable External Web Table with Script
+
+Creates a writable external web table, `campaign_out`, that pipes output data 
received by the segments to an executable script, `to_adreport_etl.sh`:
+
+``` sql
+=# CREATE WRITABLE EXTERNAL WEB TABLE campaign_out
+        (LIKE campaign)
+        EXECUTE '/var/unload_scripts/to_adreport_etl.sh' ON 6
+        FORMAT 'TEXT' (DELIMITER '|');
+```
+
+## <a id="ex8"></a>Example 8 - Readable and Writable External Tables with XML 
Transformations
+
+HAWQ can read and write XML data to and from external tables with `gpfdist`. For 
information about setting up an XML transform, see [Transforming XML 
Data](g-transforming-xml-data.html#topic75).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb 
b/markdown/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb
new file mode 100644
index 0000000..28a0bfe
--- /dev/null
+++ b/markdown/datamgmt/load/g-about-gpfdist-setup-and-performance.html.md.erb
@@ -0,0 +1,22 @@
+---
+title: About gpfdist Setup and Performance
+---
+
+Consider the following scenarios for optimizing your ETL network performance.
+
+-   Allow network traffic to use all ETL host Network Interface Cards (NICs) 
simultaneously. Run one instance of `gpfdist` on the ETL host, then declare the 
host name of each NIC in the `LOCATION` clause of your external table 
definition (see [Creating External Tables - 
Examples](creating-external-tables-examples.html#topic44)).
+
+<a id="topic14__du165872"></a>
+<span class="figtitleprefix">Figure: </span>External Table Using Single 
gpfdist Instance with Multiple NICs
+
+<img src="../../images/ext_tables_multinic.jpg" class="image" width="472" 
height="271" />
+
+-   Divide external table data equally among multiple `gpfdist` instances on 
the ETL host. For example, on an ETL system with two NICs, run two `gpfdist` 
instances (one on each NIC) to optimize data load performance and divide the 
external table data files evenly between the two `gpfdists`.
+
+<a id="topic14__du165882"></a>
+
+<span class="figtitleprefix">Figure: </span>External Tables Using Multiple 
gpfdist Instances with Multiple NICs
+
+<img src="../../images/ext_tables.jpg" class="image" width="467" height="282" 
/>
+
+**Note:** Use pipes (|) to separate formatted text when you submit files to 
`gpfdist`. HAWQ encloses comma-separated text strings in single or double 
quotes. `gpfdist` has to remove the quotes to parse the strings. Using pipes to 
separate formatted text avoids the extra step and improves performance.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-character-encoding.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-character-encoding.html.md.erb 
b/markdown/datamgmt/load/g-character-encoding.html.md.erb
new file mode 100644
index 0000000..9f3756d
--- /dev/null
+++ b/markdown/datamgmt/load/g-character-encoding.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Character Encoding
+---
+
+Character encoding systems consist of a code that pairs each character from a 
character set with something else, such as a sequence of numbers or octets, to 
facilitate data transmission and storage. HAWQ supports a variety of character 
sets, including single-byte character sets such as the ISO 8859 series and 
multiple-byte character sets such as EUC (Extended UNIX Code), UTF-8, and Mule 
internal code. Clients can use all supported character sets transparently, but 
a few are not supported for use within the server as a server-side encoding.
+
+Data files must be in a character encoding recognized by HAWQ. Data files that 
contain invalid or unsupported encoding sequences cause errors when loaded 
into HAWQ.
+
+**Note:** For data files generated on a Microsoft Windows operating system, run 
the `dos2unix` system command to remove any Windows-only characters before 
loading into HAWQ.
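As a sketch, the carriage returns (CR) that Windows adds to line endings can 
also be stripped with standard tools when `dos2unix` is not available; the file 
names below are hypothetical:

``` shell
# Hypothetical Windows-generated data file with CRLF line endings.
printf 'name|amount\r\nacme|5.89\r\n' > expenses.txt
# Strip the carriage returns (equivalent to dos2unix for this case).
tr -d '\r' < expenses.txt > expenses_unix.txt
```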
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-command-based-web-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-command-based-web-external-tables.html.md.erb 
b/markdown/datamgmt/load/g-command-based-web-external-tables.html.md.erb
new file mode 100644
index 0000000..7830cc3
--- /dev/null
+++ b/markdown/datamgmt/load/g-command-based-web-external-tables.html.md.erb
@@ -0,0 +1,26 @@
+---
+title: Command-based Web External Tables
+---
+
+The output of a shell command or script defines command-based web table data. 
Specify the command in the `EXECUTE` clause of `CREATE EXTERNAL WEB TABLE`. 
The data is current as of the time the command runs. The `EXECUTE` 
clause runs the shell command or script on the specified master or virtual 
segments. The virtual segments run the command in parallel. Scripts must be 
executable by the gpadmin user and reside in the same location on the master or 
the hosts of virtual segments.
+
+The command that you specify in the external table definition executes from 
the database and cannot access environment variables from `.bashrc` or 
`.profile`. Set environment variables in the `EXECUTE` clause. The following 
external web table, for example, runs a command on the HAWQ master host:
+
+``` sql
+CREATE EXTERNAL WEB TABLE output (output text)
+EXECUTE 'PATH=/home/gpadmin/programs; export PATH; myprogram.sh'
+    ON MASTER 
+FORMAT 'TEXT';
+```
+
+The following command defines a web table that runs a script on five virtual 
segments.
+
+``` sql
+CREATE EXTERNAL WEB TABLE log_output (linenum int, message text) 
+EXECUTE '/var/load_scripts/get_log_data.sh' ON 5 
+FORMAT 'TEXT' (DELIMITER '|');
+```
+
+The virtual segments are selected by the resource manager at runtime.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-configuration-file-format.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-configuration-file-format.html.md.erb 
b/markdown/datamgmt/load/g-configuration-file-format.html.md.erb
new file mode 100644
index 0000000..73f51a9
--- /dev/null
+++ b/markdown/datamgmt/load/g-configuration-file-format.html.md.erb
@@ -0,0 +1,66 @@
+---
+title: Configuration File Format
+---
+
+The `gpfdist` configuration file uses the YAML 1.1 document format and 
implements a schema for defining the transformation parameters. The 
configuration file must be a valid YAML document.
+
+The `gpfdist` program processes the document in order and uses indentation 
(spaces) to determine the document hierarchy and relationships of the sections 
to one another. The use of white space is significant. Do not use white space 
for formatting and do not use tabs.
+
+The following is the basic structure of a configuration file.
+
+``` pre
+---
+VERSION:   1.0.0.1
+TRANSFORMATIONS: 
+  transformation_name1:
+    TYPE:      input | output
+    COMMAND:   command
+    CONTENT:   data | paths
+    SAFE:      posix-regex
+    STDERR:    server | console
+  transformation_name2:
+    TYPE:      input | output
+    COMMAND:   command
+...
+```
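For illustration, a minimal document with one input transformation might look 
like the following (`prices_input` and the script path are invented names):

``` pre
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  prices_input:
    TYPE:    input
    COMMAND: /bin/bash input_transform.sh %filename%
```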
+
+VERSION  
+Required. The version of the `gpfdist` configuration file schema. The current 
version is 1.0.0.1.
+
+TRANSFORMATIONS  
+Required. Begins the transformation specification section. A configuration 
file must have at least one transformation. When `gpfdist` receives a 
transformation request, it looks in this section for an entry with the matching 
transformation name.
+
+TYPE  
+Required. Specifies the direction of transformation. Values are `input` or 
`output`.
+
+-   `input`: `gpfdist` treats the standard output of the transformation 
process as a stream of records to load into HAWQ.
+-   `output`: `gpfdist` treats the standard input of 
the transformation process as a stream of records from HAWQ to transform and 
write to the appropriate output.
+
+COMMAND  
+Required. Specifies the command `gpfdist` will execute to perform the 
transformation.
+
+For input transformations, `gpfdist` invokes the command specified in the 
`CONTENT` setting. The command is expected to open the underlying file(s) as 
appropriate and produce one line of `TEXT` for each row to load into HAWQ. The 
input transform determines whether the entire content should be 
converted to one row or to multiple rows.
+
+For output transformations, `gpfdist` invokes this command as specified in the 
`CONTENT` setting. The output command is expected to open and write to the 
underlying file(s) as appropriate. The output transformation determines the 
final placement of the converted output.
+
+CONTENT  
+Optional. The values are `data` and `paths`. The default value is `data`.
+
+-   When `CONTENT` specifies `data`, the text `%filename%` in the `COMMAND` 
section is replaced by the path to the file to read or write.
+-   When `CONTENT` specifies `paths`, the text `%filename%` in the `COMMAND` 
section is replaced by the path to the temporary file that contains the list of 
files to read or write.
+
+The following is an example of a `COMMAND` section showing the text 
`%filename%` that is replaced.
+
+``` pre
+COMMAND: /bin/bash input_transform.sh %filename%
+```
+
+SAFE  
+Optional. A POSIX regular expression that the paths must match to be passed 
to the transformation. Specify `SAFE` when there is a concern about injection 
or improper interpretation of paths passed to the command. The default is no 
restriction on paths.
+
+STDERR  
+Optional. The values are `server` and `console`.
+
+This setting specifies how to handle standard error output from the 
transformation. The default, `server`, specifies that `gpfdist` will capture 
the standard error output from the transformation in a temporary file and send 
the first 8k of that file to HAWQ as an error message. The error message will 
appear as a SQL error. `console` specifies that `gpfdist` does not redirect or 
transmit the standard error output from the transformation.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-controlling-segment-parallelism.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-controlling-segment-parallelism.html.md.erb 
b/markdown/datamgmt/load/g-controlling-segment-parallelism.html.md.erb
new file mode 100644
index 0000000..4e0096c
--- /dev/null
+++ b/markdown/datamgmt/load/g-controlling-segment-parallelism.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Controlling Segment Parallelism
+---
+
+The `gp_external_max_segs` server configuration parameter controls the number 
of virtual segments that can simultaneously access a single `gpfdist` instance. 
The default is 64. You can set the number of segments such that some segments 
process external data files and some perform other database processing. Set 
this parameter in the `hawq-site.xml` file of your master instance.
+
+The number of segments in the `gpfdist` location list specifies the minimum 
number of virtual segments required to serve data to a `gpfdist` external table.
+
+The `hawq_rm_nvseg_perquery_perseg_limit` and `hawq_rm_nvseg_perquery_limit` 
parameters also control segment parallelism by specifying the maximum number of 
segments used in running queries on a `gpfdist` external table on the cluster.
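As a sketch of how such a parameter appears in `hawq-site.xml` (the value 32 
is an arbitrary example, not a recommendation):

``` xml
<property>
  <name>gp_external_max_segs</name>
  <value>32</value>
</property>
```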
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
 
b/markdown/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
new file mode 100644
index 0000000..ade14ea
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Capture Row Formatting Errors and Declare a Reject Limit
+---
+
+The following SQL fragment captures formatting errors internally in HAWQ and 
declares a reject limit of 10 rows.
+
+``` sql
+LOG ERRORS INTO errortable SEGMENT REJECT LIMIT 10 ROWS
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb 
b/markdown/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb
new file mode 100644
index 0000000..4ef6cab
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-creating-and-using-web-external-tables.html.md.erb
@@ -0,0 +1,13 @@
+---
+title: Creating and Using Web External Tables
+---
+
+`CREATE EXTERNAL WEB TABLE` creates a web table definition. Web external 
tables allow HAWQ to treat dynamic data sources like regular database tables. 
Because web table data can change as a query runs, the data is not rescannable.
+
+You can define command-based or URL-based web external tables. The definition 
forms are distinct: you cannot mix command-based and URL-based definitions.
+
+-   **[Command-based Web External 
Tables](../../datamgmt/load/g-command-based-web-external-tables.html)**
+
+-   **[URL-based Web External 
Tables](../../datamgmt/load/g-url-based-web-external-tables.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
 
b/markdown/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
new file mode 100644
index 0000000..e0c3c17
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html.md.erb
@@ -0,0 +1,24 @@
+---
+title: Define an External Table with Single Row Error Isolation
+---
+
+The following example logs errors internally in HAWQ and sets an error 
threshold of 10 errors.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses ( name text, date date, amount float4, 
category text, desc1 text )
+   LOCATION ('gpfdist://etlhost-1:8081/*', 'gpfdist://etlhost-2:8082/*')
+   FORMAT 'TEXT' (DELIMITER '|')
+   LOG ERRORS INTO errortable SEGMENT REJECT LIMIT 10 ROWS;
+```
+
+The following example creates an external table, *ext\_expenses*, sets an 
error threshold of 10 errors, and writes error rows to the table 
*err\_expenses*.
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_expenses
+     ( name text, date date, amount float4, category text, desc1 text )
+   LOCATION ('gpfdist://etlhost-1:8081/*', 'gpfdist://etlhost-2:8082/*')
+   FORMAT 'TEXT' (DELIMITER '|')
+   LOG ERRORS INTO err_expenses SEGMENT REJECT LIMIT 10 ROWS;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
 
b/markdown/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
new file mode 100644
index 0000000..8a24474
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-defining-a-command-based-writable-external-web-table.html.md.erb
@@ -0,0 +1,43 @@
+---
+title: Defining a Command-Based Writable External Web Table
+---
+
+You can define writable external web tables to send output rows to an 
application or script. The application must accept an input stream, reside in 
the same location on all of the HAWQ segment hosts, and be executable by the 
`gpadmin` user. All segments in the HAWQ system run the application or script, 
whether or not a segment has output rows to process.
+
+Use `CREATE WRITABLE EXTERNAL WEB TABLE` to define the external table and 
specify the application or script to run on the segment hosts. Commands execute 
from within the database and cannot access environment variables (such as 
`$PATH`). Set environment variables in the `EXECUTE` clause of your writable 
external table definition. For example:
+
+``` sql
+=# CREATE WRITABLE EXTERNAL WEB TABLE output (output text) 
+    EXECUTE 'export PATH=$PATH:/home/gpadmin/programs; myprogram.sh' 
+    ON 6
+    FORMAT 'TEXT'
+    DISTRIBUTED RANDOMLY;
+```
+
+The following HAWQ variables are available for use in OS commands executed by 
a web or writable external table. Set these variables as environment variables 
in the shell that executes the command(s). They can be used to identify a set 
of requests made by an external table statement across the HAWQ array of hosts 
and segment instances.
+
+<caption><span class="tablecap">Table 1. External Table EXECUTE 
Variables</span></caption>
+
+<a id="topic71__du224024"></a>
+
+| Variable            | Description                                            
                                                                    |
+|---------------------|----------------------------------------------------------------------------------------------------------------------------|
+| $GP\_CID            | Command count of the transaction executing the 
external table statement.                                                   |
+| $GP\_DATABASE       | The database in which the external table definition 
resides.                                                               |
+| $GP\_DATE           | The date on which the external table command ran.      
                                                                    |
+| $GP\_MASTER\_HOST   | The host name of the HAWQ master host from which the 
external table statement was dispatched.                              |
+| $GP\_MASTER\_PORT   | The port number of the HAWQ master instance from which 
the external table statement was dispatched.                        |
+| $GP\_SEG\_DATADIR   | The location of the data directory of the segment 
instance executing the external table command.                           |
+| $GP\_SEG\_PG\_CONF  | The location of the `hawq-site.xml` file of the 
segment instance executing the external table command.                     |
+| $GP\_SEG\_PORT      | The port number of the segment instance executing the 
external table command.                                              |
+| $GP\_SEGMENT\_COUNT | The total number of segment instances in the HAWQ 
system.                                                                  |
+| $GP\_SEGMENT\_ID    | The ID number of the segment instance executing the 
external table command (same as `dbid` in `gp_segment_configuration`). |
+| $GP\_SESSION\_ID    | The database session identifier number associated with 
the external table statement.                                       |
+| $GP\_SN             | Serial number of the external table scan node in the 
query plan of the external table statement.                           |
+| $GP\_TIME           | The time the external table command was executed.      
                                                                    |
+| $GP\_USER           | The database user executing the external table 
statement.                                                                  |
+| $GP\_XID            | The transaction ID of the external table statement.    
                                                                    |
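
For example, a hypothetical unload script could use these variables to derive a 
per-segment output file name; the `:-0` defaults are only so the sketch can run 
outside HAWQ:

``` shell
#!/bin/bash
# Hypothetical sketch: name a per-segment output file from the variables
# HAWQ sets in the command environment. The segment would then stream its
# output rows to this file.
OUT=/tmp/campaign_${GP_SESSION_ID:-0}_${GP_SEGMENT_ID:-0}.out
echo "writing rows to $OUT"
```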
+
+-   **[Disabling EXECUTE for Web or Writable External 
Tables](../../datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
 
b/markdown/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
new file mode 100644
index 0000000..fa1ddfa
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-defining-a-file-based-writable-external-table.html.md.erb
@@ -0,0 +1,16 @@
+---
+title: Defining a File-Based Writable External Table
+---
+
+Writable external tables that output data to files use the HAWQ parallel file 
server program, `gpfdist`, or HAWQ Extensions Framework (PXF).
+
+Use the `CREATE WRITABLE EXTERNAL TABLE` command to define the external table 
and specify the location and format of the output files.
+
+-   With a writable external table using the `gpfdist` protocol, the HAWQ 
segments send their data to `gpfdist`, which writes the data to the named file. 
`gpfdist` must run on a host that the HAWQ segments can access over the 
network. `gpfdist` points to a file location on the output host and writes data 
received from the HAWQ segments to the file. To divide the output data among 
multiple files, list multiple `gpfdist` URIs in your writable external table 
definition.
+-   A writable external web table sends data to an application as a stream of 
data. For example, unload data from HAWQ and send it to an application that 
connects to another database or ETL tool to load the data elsewhere. Writable 
external web tables use the `EXECUTE` clause to specify a shell command, 
script, or application to run on the segment hosts and accept an input stream 
of data. See [Defining a Command-Based Writable External Web 
Table](g-defining-a-command-based-writable-external-web-table.html#topic71) for 
more information about using `EXECUTE` commands in a writable external table 
definition.
+
+You can optionally declare a distribution policy for your writable external 
tables. By default, writable external tables use a random distribution policy. 
If the source table you are exporting data from has a hash distribution policy, 
defining the same distribution key column(s) for the writable external table 
improves unload performance by eliminating the requirement to move rows over 
the interconnect. If you unload data from a particular table, you can use the 
`LIKE` clause to copy the column definitions and distribution policy from the 
source table.
+
+-   **[Example - HAWQ file server 
(gpfdist)](../../datamgmt/load/g-example-hawq-file-server-gpfdist.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-determine-the-transformation-schema.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-determine-the-transformation-schema.html.md.erb 
b/markdown/datamgmt/load/g-determine-the-transformation-schema.html.md.erb
new file mode 100644
index 0000000..1a4eb9b
--- /dev/null
+++ b/markdown/datamgmt/load/g-determine-the-transformation-schema.html.md.erb
@@ -0,0 +1,33 @@
+---
+title: Determine the Transformation Schema
+---
+
+To prepare for the transformation project:
+
+1.  <span class="ph">Determine the goal of the project, such as indexing data, 
analyzing data, combining data, and so on.</span>
+2.  <span class="ph">Examine the XML file and note the file structure and 
element names. </span>
+3.  <span class="ph">Choose the elements to import and decide if any other 
limits are appropriate. </span>
+
+For example, the following XML file, *prices.xml*, is a simple, short file 
that contains price records. Each price record contains two fields: an item 
number and a price.
+
+``` xml
+<?xml version="1.0" encoding="ISO-8859-1" ?>
+<prices>
+  <pricerecord>
+    <itemnumber>708421</itemnumber>
+    <price>19.99</price>
+  </pricerecord>
+  <pricerecord>
+    <itemnumber>708466</itemnumber>
+    <price>59.25</price>
+  </pricerecord>
+  <pricerecord>
+    <itemnumber>711121</itemnumber>
+    <price>24.99</price>
+  </pricerecord>
+</prices>
+```
+
+The goal is to import all the data into a HAWQ table with an integer 
`itemnumber` column and a decimal `price` column.
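A target table matching that goal might be defined as follows (the table name 
and exact types are an assumption based on the stated goal):

``` sql
=# CREATE TABLE prices (itemnumber int, price decimal);
```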
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
 
b/markdown/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
new file mode 100644
index 0000000..f0332b5
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-disabling-execute-for-web-or-writable-external-tables.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Disabling EXECUTE for Web or Writable External Tables
+---
+
+There is a security risk associated with allowing external tables to execute 
OS commands or scripts. To disable the use of `EXECUTE` in web and writable 
external table definitions, set the `gp_external_enable_exec` server 
configuration parameter to `off` in your master `hawq-site.xml` file:
+
+``` pre
+gp_external_enable_exec = off
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb 
b/markdown/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb
new file mode 100644
index 0000000..d07b463
--- /dev/null
+++ b/markdown/datamgmt/load/g-escaping-in-csv-formatted-files.html.md.erb
@@ -0,0 +1,29 @@
+---
+title: Escaping in CSV Formatted Files
+---
+
+By default, the escape character is a `"` (double quote) for CSV-formatted 
files. To use a different escape character, specify it in the `ESCAPE` clause 
of `COPY`, `CREATE EXTERNAL TABLE`, or the `hawq load` control file. In cases 
where your selected escape character is 
present in your data, you can use it to escape itself.
+
+For example, suppose you have a table with three columns and you want to load 
the following three fields:
+
+-   `Free trip to A,B`
+-   `5.89`
+-   `Special rate "1.79"`
+
+Your designated delimiter character is `,` (comma), and your designated escape 
character is `"` (double quote). The formatted row in your data file looks like 
this:
+
+``` pre
+"Free trip to A,B","5.89","Special rate ""1.79"""
+```
+
+The data value with a comma character that is part of the data is enclosed in 
double quotes. The double quotes that are part of the data are escaped with a 
double quote even though the field value is enclosed in double quotes.
+
+Embedding the entire field inside a set of double quotes guarantees 
preservation of leading and trailing whitespace characters:
+
+`"Free trip to A,B ","5.89 ","Special rate ""1.79"" "`
+
+**Note:** In CSV mode, all characters are significant. A quoted value 
surrounded by white space, or any characters other than `DELIMITER`, includes 
those characters. This can cause errors if you import data from a system that 
pads CSV lines with white space to some fixed width. In this case, preprocess 
the CSV file to remove the trailing white space before importing the data into 
HAWQ.
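The doubled-quote convention shown above matches what standard CSV writers emit. As an illustrative sketch (using Python's `csv` module, not a HAWQ utility), the three example fields serialize to exactly the row shown earlier:

``` python
import csv
import io

# Serialize the three example fields with a comma delimiter and
# double-quote escaping (quotes inside the data are doubled)
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow(['Free trip to A,B', '5.89', 'Special rate "1.79"'])
row = buf.getvalue().rstrip("\r\n")
print(row)  # "Free trip to A,B","5.89","Special rate ""1.79"""
```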
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb 
b/markdown/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb
new file mode 100644
index 0000000..e24a2b7
--- /dev/null
+++ b/markdown/datamgmt/load/g-escaping-in-text-formatted-files.html.md.erb
@@ -0,0 +1,31 @@
+---
+title: Escaping in Text Formatted Files
+---
+
+By default, the escape character is a \\ (backslash) for text-formatted files. You can declare a different escape character in the `ESCAPE` clause of `COPY`, `CREATE EXTERNAL TABLE`, or the `hawq load` control file. If your escape character appears in your data, use it to escape itself.
+
+For example, suppose you have a table with three columns and you want to load 
the following three fields:
+
+-   `backslash = \`
+-   `vertical bar = |`
+-   `exclamation point = !`
+
+Your designated delimiter character is `|` (pipe character), and your 
designated escape character is `\` (backslash). The formatted row in your data 
file looks like this:
+
+``` pre
+backslash = \\ | vertical bar = \| | exclamation point = !
+```
+
+Notice how the backslash character that is part of the data is escaped with 
another backslash character, and the pipe character that is part of the data is 
escaped with a backslash character.
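The escaping rules above can be sketched as a small helper. This is an illustrative Python function (not part of HAWQ), assuming only the delimiter, the escape character itself, and newlines need escaping:

``` python
def escape_text_field(value: str, delimiter: str = "|", escape: str = "\\") -> str:
    # Escape the escape character first so later escapes are not doubled again
    out = value.replace(escape, escape * 2)
    out = out.replace(delimiter, escape + delimiter)
    out = out.replace("\n", escape + "n")
    return out

fields = ["backslash = \\", "vertical bar = |", "exclamation point = !"]
row = " | ".join(escape_text_field(f) for f in fields)
print(row)  # backslash = \\ | vertical bar = \| | exclamation point = !
```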
+
+You can use the escape character to escape octal and hexadecimal sequences. The escaped value is converted to the equivalent character when loaded into HAWQ. For example, to load the ampersand character (`&`), use the escape character to escape its equivalent hexadecimal (`\0x26`) or octal (`\046`) representation.
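As a quick arithmetic check of the two representations (both denote code point 38, the ampersand):

``` python
# Hex 0x26 and octal 046 name the same code point: '&'
print(chr(0x26), chr(0o46), int("46", 8))
```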
+
+You can disable escaping in `TEXT`-formatted files using the `ESCAPE` clause 
of `COPY`, `CREATE EXTERNAL TABLE` or the `hawq load` control file as follows:
+
+``` pre
+ESCAPE 'OFF'
+```
+
+This is useful for input data that contains many backslash characters, such as 
web log data.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-escaping.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-escaping.html.md.erb 
b/markdown/datamgmt/load/g-escaping.html.md.erb
new file mode 100644
index 0000000..0a1e62a
--- /dev/null
+++ b/markdown/datamgmt/load/g-escaping.html.md.erb
@@ -0,0 +1,16 @@
+---
+title: Escaping
+---
+
+There are two reserved characters that have special meaning to HAWQ:
+
+-   The designated delimiter character separates columns or fields in the data 
file.
+-   The newline character designates a new row in the data file.
+
+If your data contains either of these characters, you must escape the character so that HAWQ treats it as data and not as a field separator or new row. By default, the escape character is a \\ (backslash) for text-formatted files and a double quote (") for CSV-formatted files.
+
+-   **[Escaping in Text Formatted 
Files](../../datamgmt/load/g-escaping-in-text-formatted-files.html)**
+
+-   **[Escaping in CSV Formatted 
Files](../../datamgmt/load/g-escaping-in-csv-formatted-files.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
 
b/markdown/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
new file mode 100644
index 0000000..4f61396
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-example-1-dblp-database-publications-in-demo-directory.html.md.erb
@@ -0,0 +1,29 @@
+---
+title: Command-based Web External Tables
+---
+
+The output of a shell command or script defines command-based web table data. Specify the command in the `EXECUTE` clause of `CREATE EXTERNAL WEB TABLE`. The data is current as of the time the command runs. The `EXECUTE` clause runs the shell command or script on the specified master and/or segment host(s). The command or script must reside on the hosts corresponding to the host(s) defined in the `EXECUTE` clause.
+
+By default, the command is run on segment hosts when active segments have 
output rows to process. For example, if each segment host runs four primary 
segment instances that have output rows to process, the command runs four times 
per segment host. You can optionally limit the number of segment instances that 
execute the web table command. All segments included in the web table 
definition in the `ON` clause run the command in parallel.
+
+The command that you specify in the external table definition executes from 
the database and cannot access environment variables from `.bashrc` or 
`.profile`. Set environment variables in the `EXECUTE` clause. For example:
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE output (output text)
+EXECUTE 'PATH=/home/gpadmin/programs; export PATH; myprogram.sh'
+    ON MASTER
+FORMAT 'TEXT';
+```
+
+Scripts must be executable by the `gpadmin` user and reside in the same 
location on the master or segment hosts.
+
+The following command defines a web table that runs a script. The script runs 
on five virtual segments selected by the resource manager at runtime.
+
+``` sql
+=# CREATE EXTERNAL WEB TABLE log_output
+(linenum int, message text)
+EXECUTE '/var/load_scripts/get_log_data.sh' ON 5
+FORMAT 'TEXT' (DELIMITER '|');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-example-hawq-file-server-gpfdist.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-example-hawq-file-server-gpfdist.html.md.erb 
b/markdown/datamgmt/load/g-example-hawq-file-server-gpfdist.html.md.erb
new file mode 100644
index 0000000..a0bf669
--- /dev/null
+++ b/markdown/datamgmt/load/g-example-hawq-file-server-gpfdist.html.md.erb
@@ -0,0 +1,13 @@
+---
+title: Example - HAWQ file server (gpfdist)
+---
+
+``` sql
+=# CREATE WRITABLE EXTERNAL TABLE unload_expenses
+( LIKE expenses )
+LOCATION ('gpfdist://etlhost-1:8081/expenses1.out',
+'gpfdist://etlhost-2:8081/expenses2.out')
+FORMAT 'TEXT' (DELIMITER ',');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
 
b/markdown/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
new file mode 100644
index 0000000..6f5b9e3
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-example-irs-mef-xml-files-in-demo-directory.html.md.erb
@@ -0,0 +1,54 @@
+---
+title: Example using IRS MeF XML Files (In demo Directory)
+---
+
+This example demonstrates loading a sample IRS Modernized eFile tax return 
using a Joost STX transformation. The data is in the form of a complex XML file.
+
+The U.S. Internal Revenue Service (IRS) made a significant commitment to XML 
and specifies its use in its Modernized e-File (MeF) system. In MeF, each tax 
return is an XML document with a deep hierarchical structure that closely 
reflects the particular form of the underlying tax code.
+
+XML, XML Schema, and stylesheets play a role in MeF data representation and business workflow. The actual XML data is extracted from a ZIP file attached to a MIME "transmission file" message. For more information about MeF, see [Modernized e-File (Overview)](http://www.irs.gov/uac/Modernized-e-File-Overview) on the IRS web site.
+
+The sample XML document, *RET990EZ\_2006.xml*, is about 350KB in size with two 
elements:
+
+-   ReturnHeader
+-   ReturnData
+
+The &lt;ReturnHeader&gt; element contains general details about the tax return 
such as the taxpayer's name, the tax year of the return, and the preparer. The 
&lt;ReturnData&gt; element contains multiple sections with specific details 
about the tax return and associated schedules.
+
+The following is an abridged sample of the XML file.
+
+``` xml
+<?xml version="1.0" encoding="UTF-8"?> 
+<Return returnVersion="2006v2.0"
+   xmlns="http://www.irs.gov/efile"
+   xmlns:efile="http://www.irs.gov/efile"
+   xsi:schemaLocation="http://www.irs.gov/efile"
+   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+   <ReturnHeader binaryAttachmentCount="1">
+     <ReturnId>AAAAAAAAAAAAAAAAAAAA</ReturnId>
+     <Timestamp>1999-05-30T12:01:01+05:01</Timestamp>
+     <ReturnType>990EZ</ReturnType>
+     <TaxPeriodBeginDate>2005-01-01</TaxPeriodBeginDate>
+     <TaxPeriodEndDate>2005-12-31</TaxPeriodEndDate>
+     <Filer>
+       <EIN>011248772</EIN>
+       ... more data ...
+     </Filer>
+     <Preparer>
+       <Name>Percy Polar</Name>
+       ... more data ...
+     </Preparer>
+     <TaxYear>2005</TaxYear>
+   </ReturnHeader>
+   ... more data ..
+```
+
+The goal is to import all the data into a HAWQ database. First, convert the XML document into text with newlines "escaped", producing two columns: the `ReturnId` and a single column containing the entire MeF tax return. For example:
+
+``` pre
+AAAAAAAAAAAAAAAAAAAA|<Return returnVersion="2006v2.0"... 
+```
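As an illustrative alternative to the Joost STX transformation this example actually uses, the flattening step can be sketched with Python's standard-library `ElementTree` (the tiny inline document is a stand-in for the real 350KB file):

``` python
import xml.etree.ElementTree as ET

# Miniature stand-in for RET990EZ_2006.xml; real files use this namespace
xml_doc = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader><ReturnId>AAAAAAAAAAAAAAAAAAAA</ReturnId></ReturnHeader>
</Return>"""

ns = {"efile": "http://www.irs.gov/efile"}
return_id = ET.fromstring(xml_doc).find(".//efile:ReturnId", ns).text
# Escape embedded newlines so the whole document fits in one text-format row
flattened = xml_doc.replace("\n", "\\n")
print(f"{return_id}|{flattened}")
```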
+
+Load the data into HAWQ.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb 
b/markdown/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb
new file mode 100644
index 0000000..0484523
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-example-witsml-files-in-demo-directory.html.md.erb
@@ -0,0 +1,54 @@
+---
+title: Example using WITSML™ Files (In demo Directory)
+---
+
+This example demonstrates loading sample data describing an oil rig using a 
Joost STX transformation. The data is in the form of a complex XML file 
downloaded from energistics.org.
+
+The Wellsite Information Transfer Standard Markup Language (WITSML™) is an 
oil industry initiative to provide open, non-proprietary, standard interfaces 
for technology and software to share information among oil companies, service 
companies, drilling contractors, application vendors, and regulatory agencies. 
For more information about WITSML™, see 
[http://www.witsml.org](http://www.witsml.org).
+
+The oil rig information consists of a top-level `<rigs>` element with multiple child elements such as `<documentInfo>`, `<rig>`, and so on. The following excerpt from the file shows the type of information in the `<rig>` tag.
+
+``` xml
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet href="../stylesheets/rig.xsl" type="text/xsl" media="screen"?>
+<rigs 
+ xmlns="http://www.witsml.org/schemas/131"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.witsml.org/schemas/131 ../obj_rig.xsd" 
+ version="1.3.1.1">
+ <documentInfo>
+ ... misc data ...
+ </documentInfo>
+ <rig uidWell="W-12" uidWellbore="B-01" uid="xr31">
+     <nameWell>6507/7-A-42</nameWell>
+     <nameWellbore>A-42</nameWellbore>
+     <name>Deep Drill #5</name>
+     <owner>Deep Drilling Co.</owner>
+     <typeRig>floater</typeRig>
+     <manufacturer>Fitsui Engineering</manufacturer>
+     <yearEntService>1980</yearEntService>
+     <classRig>ABS Class A1 M CSDU AMS ACCU</classRig>
+     <approvals>DNV</approvals>
+ ... more data ...
+```
+
+The goal is to import the information for this rig into HAWQ.
+
+The sample document, *rig.xml*, is about 11KB in size. The input does not contain tabs, so the relevant information can be converted into records delimited with a pipe (|).
+
+`W-12|6507/7-A-42|xr31|Deep Drill #5|Deep Drilling Co.|John Doe|[email protected]|`
+
+With the columns:
+
+-   `well_uid text`, -- e.g. W-12
+-   `well_name text`, -- e.g. 6507/7-A-42
+-   `rig_uid text`, -- e.g. xr31
+-   `rig_name text`, -- e.g. Deep Drill \#5
+-   `rig_owner text`, -- e.g. Deep Drilling Co.
+-   `rig_contact text`, -- e.g. John Doe
+-   `rig_email text`, -- e.g. [email protected]
+-   `doc xml`
+
+Then, load the data into HAWQ.
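The extraction above can be sketched in Python with `ElementTree` (an illustrative stand-in for the Joost STX transformation; the inline document abbreviates *rig.xml* but keeps its namespace):

``` python
import xml.etree.ElementTree as ET

ns = {"w": "http://www.witsml.org/schemas/131"}
doc = """<rigs xmlns="http://www.witsml.org/schemas/131">
  <rig uidWell="W-12" uidWellbore="B-01" uid="xr31">
    <nameWell>6507/7-A-42</nameWell>
    <name>Deep Drill #5</name>
    <owner>Deep Drilling Co.</owner>
  </rig>
</rigs>"""

rig = ET.fromstring(doc).find("w:rig", ns)
fields = [
    rig.get("uidWell"),                         # well_uid
    rig.findtext("w:nameWell", namespaces=ns),  # well_name
    rig.get("uid"),                             # rig_uid
    rig.findtext("w:name", namespaces=ns),      # rig_name
    rig.findtext("w:owner", namespaces=ns),     # rig_owner
]
print("|".join(fields) + "|")
```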
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb 
b/markdown/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb
new file mode 100644
index 0000000..174529a
--- /dev/null
+++ b/markdown/datamgmt/load/g-examples-read-fixed-width-data.html.md.erb
@@ -0,0 +1,37 @@
+---
+title: Examples - Read Fixed-Width Data
+---
+
+The following examples show how to read fixed-width data.
+
+## Example 1 – Loading a table with PRESERVED\_BLANKS on
+
+``` sql
+CREATE READABLE EXTERNAL TABLE students (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://host:port/file/path/')
+FORMAT 'CUSTOM' (formatter=fixedwidth_in, name=20, address=30, age=4,
+        preserve_blanks='on',null='NULL');
+```
+
+## Example 2 – Loading data with no line delimiter
+
+``` sql
+CREATE READABLE EXTERNAL TABLE students (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://host:port/file/path/')
+FORMAT 'CUSTOM' (formatter=fixedwidth_in, name='20', address='30', age='4', 
+        line_delim='?@');
+```
+
+## Example 3 – Create a writable external table with a \\r\\n line delimiter
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE students_out (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://host:port/file/path/filename')     
+FORMAT 'CUSTOM' (formatter=fixedwidth_out, 
+   name=20, address=30, age=4, line_delim=E'\r\n');
+```
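The fixed-width layout these examples declare can be illustrated with a small Python sketch of the read side (a hypothetical helper; `fixedwidth_in` itself is implemented inside HAWQ):

``` python
def parse_fixed_width(line: str, widths: list[int]) -> list[str]:
    # Slice the row into fields of the declared widths (name=20, address=30, age=4)
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields

line = "Jane Doe".ljust(20) + "123 Main St".ljust(30) + "  34"
print(parse_fixed_width(line, [20, 30, 4]))
```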
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-external-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-external-tables.html.md.erb 
b/markdown/datamgmt/load/g-external-tables.html.md.erb
new file mode 100644
index 0000000..4142a07
--- /dev/null
+++ b/markdown/datamgmt/load/g-external-tables.html.md.erb
@@ -0,0 +1,44 @@
+---
+title: Accessing File-Based External Tables
+---
+
+External tables enable accessing external files as if they are regular 
database tables. They are often used to move data into and out of a HAWQ 
database.
+
+To create an external table definition, you specify the format of your input files and the location of your external data sources. For information about input file formats, see [Formatting Data Files](g-formatting-data-files.html#topic95).
+
+Use one of the following protocols to access external table data sources. You 
cannot mix protocols in `CREATE EXTERNAL TABLE` statements:
+
+-   `gpfdist://` points to a directory on the file host and serves external 
data files to all HAWQ segments in parallel. See [gpfdist 
Protocol](g-gpfdist-protocol.html#topic_sny_yph_kr).
+-   `gpfdists://` is the secure version of `gpfdist`. See [gpfdists 
Protocol](g-gpfdists-protocol.html#topic_sny_yph_kr).
+-   `pxf://` specifies data accessed through the HAWQ Extensions Framework 
(PXF). PXF is a service that uses plug-in Java classes to read and write data 
in external data sources. PXF includes plug-ins to access data in HDFS, HBase, 
and Hive. Custom plug-ins can be written to access other external data sources.
+
+External tables allow you to access external files from within the database as 
if they are regular database tables. Used with `gpfdist`, the HAWQ parallel 
file distribution program, or HAWQ Extensions Framework (PXF), external tables 
provide full parallelism by using the resources of all HAWQ segments to load or 
unload data.
+
+You can query external table data directly and in parallel using SQL commands such as `SELECT` and `JOIN`, and you can create views for external tables.
+
+The steps for using external tables are:
+
+1.  Define the external table.
+2.  Start the gpfdist file server(s) if you plan to use the `gpfdist` or `gpfdists` protocols.
+3.  Place the data files in the correct locations.
+4.  Query the external table with SQL commands.
+
+HAWQ provides readable and writable external tables:
+
+-   Readable external tables for data loading. Readable external tables 
support basic extraction, transformation, and loading (ETL) tasks common in 
data warehousing. HAWQ segment instances read external table data in parallel 
to optimize large load operations. You cannot modify readable external tables.
+-   Writable external tables for data unloading. Writable external tables 
support:
+
+    -   Selecting data from database tables to insert into the writable 
external table.
+    -   Sending data to an application as a stream of data. For example, 
unload data from HAWQ and send it to an application that connects to another 
database or ETL tool to load the data elsewhere.
+    -   Receiving output from HAWQ parallel MapReduce calculations.
+
+    Writable external tables allow only `INSERT` operations.
+
+External tables can be file-based or web-based.
+
+-   Regular (file-based) external tables access static flat files. Regular 
external tables are rescannable: the data is static while the query runs.
+-   Web (web-based) external tables access dynamic data sources, either on a 
web server with the `http://` protocol or by executing OS commands or scripts. 
Web external tables are not rescannable: the data can change while the query 
runs.
+
+Dump and restore operate only on external and web external table 
*definitions*, not on the data sources.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-formatting-columns.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-formatting-columns.html.md.erb 
b/markdown/datamgmt/load/g-formatting-columns.html.md.erb
new file mode 100644
index 0000000..b828212
--- /dev/null
+++ b/markdown/datamgmt/load/g-formatting-columns.html.md.erb
@@ -0,0 +1,19 @@
+---
+title: Formatting Columns
+---
+
+The default column or field delimiter is the horizontal `TAB` character (`0x09`) for text files and the comma character (`0x2C`) for CSV files. You can declare a single-character delimiter using the `DELIMITER` clause of `COPY`, `CREATE EXTERNAL TABLE`, or the `hawq load` control file when you define your data format. The delimiter character must appear between any two data value fields. Do not place a delimiter at the beginning or end of a row. For example, if the pipe character ( | ) is your delimiter:
+
+``` pre
+data value 1|data value 2|data value 3
+```
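A quick way to validate a row against these rules, sketched in Python (a hypothetical helper, not a HAWQ utility):

``` python
def check_row(line: str, delimiter: str = "|", ncols: int = 3) -> list[str]:
    fields = line.split(delimiter)
    if len(fields) != ncols:
        raise ValueError(f"expected {ncols} fields, got {len(fields)}")
    # A delimiter at the start or end of the row produces an empty edge field
    if fields[0] == "" or fields[-1] == "":
        raise ValueError("row must not begin or end with the delimiter")
    return fields

print(check_row("data value 1|data value 2|data value 3"))
```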
+
+The following command shows the use of the pipe character as a column 
delimiter:
+
+``` sql
+=# CREATE EXTERNAL TABLE ext_table (name text, date date)
+LOCATION ('gpfdist://host:port/filename.txt')
+FORMAT 'TEXT' (DELIMITER '|');
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-formatting-data-files.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-formatting-data-files.html.md.erb 
b/markdown/datamgmt/load/g-formatting-data-files.html.md.erb
new file mode 100644
index 0000000..6c929ad
--- /dev/null
+++ b/markdown/datamgmt/load/g-formatting-data-files.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Formatting Data Files
+---
+
+When you use the HAWQ tools for loading and unloading data, you must specify how your data is formatted. `COPY`, `CREATE EXTERNAL TABLE`, and `hawq load` have clauses that allow you to specify how your data is formatted. Data can be in delimited text (`TEXT`) or comma-separated values (`CSV`) format. External data must be formatted correctly to be read by HAWQ. This topic explains the format of data files expected by HAWQ.
+
+-   **[Formatting Rows](../../datamgmt/load/g-formatting-rows.html)**
+
+-   **[Formatting Columns](../../datamgmt/load/g-formatting-columns.html)**
+
+-   **[Representing NULL 
Values](../../datamgmt/load/g-representing-null-values.html)**
+
+-   **[Escaping](../../datamgmt/load/g-escaping.html)**
+
+-   **[Character Encoding](../../datamgmt/load/g-character-encoding.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-formatting-rows.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-formatting-rows.html.md.erb 
b/markdown/datamgmt/load/g-formatting-rows.html.md.erb
new file mode 100644
index 0000000..ea9b416
--- /dev/null
+++ b/markdown/datamgmt/load/g-formatting-rows.html.md.erb
@@ -0,0 +1,7 @@
+---
+title: Formatting Rows
+---
+
+HAWQ expects rows of data to be separated by the `LF` character (Line feed, 
`0x0A`), `CR` (Carriage return, `0x0D`), or `CR` followed by `LF` (`CR+LF`, 
`0x0D 0x0A`). `LF` is the standard newline representation on UNIX or UNIX-like 
operating systems. Operating systems such as Windows or Mac OS X use `CR` or 
`CR+LF`. All of these representations of a newline are supported by HAWQ as a 
row delimiter. For more information, see [Importing and Exporting Fixed Width 
Data](g-importing-and-exporting-fixed-width-data.html#topic37).
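Python's `str.splitlines` accepts the same three row-delimiter conventions, which makes a quick illustration:

``` python
data = "row1\nrow2\r\nrow3\r"   # LF, CR+LF, and CR delimiters mixed
print(data.splitlines())
```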
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-gpfdist-protocol.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-gpfdist-protocol.html.md.erb 
b/markdown/datamgmt/load/g-gpfdist-protocol.html.md.erb
new file mode 100644
index 0000000..ee98609
--- /dev/null
+++ b/markdown/datamgmt/load/g-gpfdist-protocol.html.md.erb
@@ -0,0 +1,15 @@
+---
+title: gpfdist Protocol
+---
+
+The `gpfdist://` protocol is used in a URI to reference a running `gpfdist` 
instance. The `gpfdist` utility serves external data files from a directory on 
a file host to all HAWQ segments in parallel.
+
+`gpfdist` is located in the `$GPHOME/bin` directory on your HAWQ master host 
and on each segment host.
+
+Run `gpfdist` on the host where the external data files reside. `gpfdist` 
uncompresses `gzip` (`.gz`) and `bzip2` (.`bz2`) files automatically. You can 
use the wildcard character (\*) or other C-style pattern matching to denote 
multiple files to read. The files specified are assumed to be relative to the 
directory that you specified when you started the `gpfdist` instance.
+
+All virtual segments access the external file(s) in parallel, subject to the 
number of segments set in the `gp_external_max_segments` parameter, the length 
of the `gpfdist` location list, and the limits specified by the 
`hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` 
parameters. Use multiple `gpfdist` data sources in a `CREATE EXTERNAL TABLE` 
statement to scale the external table's scan performance. For more information 
about configuring `gpfdist`, see [Using the HAWQ File Server 
(gpfdist)](g-using-the-hawq-file-server--gpfdist-.html#topic13).
+
+See the `gpfdist` reference documentation for more information about using 
`gpfdist` with external tables.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-gpfdists-protocol.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-gpfdists-protocol.html.md.erb 
b/markdown/datamgmt/load/g-gpfdists-protocol.html.md.erb
new file mode 100644
index 0000000..2f5641d
--- /dev/null
+++ b/markdown/datamgmt/load/g-gpfdists-protocol.html.md.erb
@@ -0,0 +1,37 @@
+---
+title: gpfdists Protocol
+---
+
+The `gpfdists://` protocol is a secure version of the `gpfdist://` protocol. To use it, run the `gpfdist` utility with the `--ssl` option. When specified in a URI, the `gpfdists://` protocol enables encrypted communication and secure identification of the file server and HAWQ to protect against attacks such as eavesdropping and man-in-the-middle attacks.
+
+`gpfdists` implements SSL security in a client/server scheme with the 
following attributes and limitations:
+
+-   Client certificates are required.
+-   Multilingual certificates are not supported.
+-   A Certificate Revocation List (CRL) is not supported.
+-   The `TLSv1` protocol is used with the `TLS_RSA_WITH_AES_128_CBC_SHA` 
encryption algorithm.
+-   SSL parameters cannot be changed.
+-   SSL renegotiation is supported.
+-   The SSL ignore host mismatch parameter is set to `false`.
+-   Private keys containing a passphrase are not supported for the `gpfdist` file server (`server.key`) or for HAWQ (`client.key`).
+-   Issuing certificates that are appropriate for the operating system in use 
is the user's responsibility. Generally, converting certificates as shown in 
[https://www.sslshopper.com/ssl-converter.html](https://www.sslshopper.com/ssl-converter.html)
 is supported.
+
+    **Note:** A server started with the `gpfdist --ssl` option can only 
communicate with the `gpfdists` protocol. A server that was started with 
`gpfdist` without the `--ssl` option can only communicate with the `gpfdist` 
protocol.
+
+
+Use one of the following methods to invoke the `gpfdists` protocol.
+
+-   Run `gpfdist` with the `--ssl` option and then use the `gpfdists` protocol 
in the `LOCATION` clause of a `CREATE EXTERNAL TABLE` statement.
+-   Use a `hawq load` YAML control file with the `SSL` option set to true.
+
+Using `gpfdists` requires that the following client certificates reside in the 
`$PGDATA/gpfdists` directory on each segment.
+
+-   The client certificate file, `client.crt`
+-   The client private key file, `client.key`
+-   The trusted certificate authorities, `root.crt`
+
+For an example of securely loading external table data, see [Example 3 - Multiple gpfdists instances](creating-external-tables-examples.html#topic47).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb 
b/markdown/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb
new file mode 100644
index 0000000..2b8dc78
--- /dev/null
+++ b/markdown/datamgmt/load/g-handling-errors-ext-table-data.html.md.erb
@@ -0,0 +1,9 @@
+---
+title: Handling Errors in External Table Data
+---
+
+By default, if external table data contains an error, the command fails and no 
data loads into the target database table. Define the external table with 
single row error handling to enable loading correctly formatted rows and to 
isolate data errors in external table data. See [Handling Load 
Errors](g-handling-load-errors.html#topic55).
+
+The `gpfdist` file server uses the `HTTP` protocol. External table queries that use `LIMIT` end the connection after retrieving the rows, causing an HTTP socket error. If you use `LIMIT` in queries of external tables that use the `gpfdist://` or `http://` protocols, ignore these errors; data is returned to the database as expected.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-handling-load-errors.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-handling-load-errors.html.md.erb 
b/markdown/datamgmt/load/g-handling-load-errors.html.md.erb
new file mode 100644
index 0000000..6faf7a5
--- /dev/null
+++ b/markdown/datamgmt/load/g-handling-load-errors.html.md.erb
@@ -0,0 +1,28 @@
+---
+title: Handling Load Errors
+---
+
+Readable external tables are most commonly used to select data to load into regular database tables. You use the `CREATE TABLE AS SELECT` or `INSERT INTO` commands to query the external table data. By default, if the data contains an error, the entire command fails and the data is not loaded into the target database table.
+
+The `SEGMENT REJECT LIMIT` clause allows you to isolate format errors in external table data and to continue loading correctly formatted rows. Use `SEGMENT REJECT LIMIT` to set an error threshold, specifying the reject limit `count` as a number of `ROWS` (the default) or as a `PERCENT` of total rows (1-100).
+
+If the number of error rows reaches the `SEGMENT REJECT LIMIT`, the entire external table operation is aborted and no rows are processed. The limit applies per segment, not to the operation as a whole. If the number of error rows does not reach the `SEGMENT REJECT LIMIT`, the operation processes all good rows, and it discards and optionally logs formatting errors for erroneous rows.
+
+The `LOG ERRORS` clause allows you to keep error rows for further examination. 
For information about the `LOG ERRORS` clause, see the `CREATE EXTERNAL TABLE` 
command.
+
+When you set `SEGMENT REJECT LIMIT`, HAWQ scans the external data in single 
row error isolation mode. Single row error isolation mode applies to external 
data rows with format errors such as extra or missing attributes, attributes of 
a wrong data type, or invalid client encoding sequences. HAWQ does not check 
constraint errors, but you can filter constraint errors by limiting the 
`SELECT` from an external table at runtime. For example, to eliminate duplicate 
key errors:
+
+``` sql
+=# INSERT INTO table_with_pkeys 
+SELECT DISTINCT * FROM external_table;
+```
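The per-segment reject-limit behavior described above can be sketched in Python (an illustrative model, not HAWQ code; the row format and limit are invented for the example):

``` python
def parse(raw: str):
    # A well-formed row here is "int|text"; anything else is a format error
    left, sep, right = raw.partition("|")
    if not sep:
        raise ValueError("missing delimiter")
    return int(left), right

def load_with_reject_limit(rows, reject_limit=10):
    """Keep well-formed rows, set bad ones aside, abort at the limit."""
    good, rejected = [], []
    for raw in rows:
        try:
            good.append(parse(raw))
        except ValueError as exc:
            rejected.append((raw, str(exc)))
            if len(rejected) >= reject_limit:
                raise RuntimeError("segment reject limit reached; aborting load")
    return good, rejected

good, rejected = load_with_reject_limit(["1|a", "2|b", "oops", "3|c"], reject_limit=2)
print(good, rejected)
```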
+
+-   **[Define an External Table with Single Row Error 
Isolation](../../datamgmt/load/g-define-an-external-table-with-single-row-error-isolation.html)**
+
+-   **[Capture Row Formatting Errors and Declare a Reject 
Limit](../../datamgmt/load/g-create-an-error-table-and-declare-a-reject-limit.html)**
+
+-   **[Identifying Invalid CSV Files in Error Table 
Data](../../datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html)**
+
+-   **[Moving Data between 
Tables](../../datamgmt/load/g-moving-data-between-tables.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
 
b/markdown/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
new file mode 100644
index 0000000..534d530
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-identifying-invalid-csv-files-in-error-table-data.html.md.erb
@@ -0,0 +1,7 @@
+---
+title: Identifying Invalid CSV Files in Error Table Data
+---
+
+If a CSV file contains invalid formatting, the *rawdata* field in the error table can contain several combined rows. For example, if a closing quote for a specific field is missing, all the following newlines are treated as embedded newlines. When this happens, HAWQ stops parsing a row when it reaches 64K, puts that 64K of data into the error table as a single row, resets the quote flag, and continues. If this happens three times during load processing, the load file is considered invalid and the entire load fails with the message `rejected N or more rows`. See [Escaping in CSV Formatted Files](g-escaping-in-csv-formatted-files.html#topic101) for more information on the correct use of quotes in CSV files.
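+
+As a hypothetical illustration, the first row below is missing its closing quote, so every following line is read as part of that quoted field until the 64K limit is reached:
+
+``` pre
+101,"Acme Corp,2023-01-05
+102,"Widget Inc",2023-01-06
+103,"Gadget LLC",2023-01-07
+```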
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-importing-and-exporting-fixed-width-data.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-importing-and-exporting-fixed-width-data.html.md.erb 
b/markdown/datamgmt/load/g-importing-and-exporting-fixed-width-data.html.md.erb
new file mode 100644
index 0000000..f49cae0
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-importing-and-exporting-fixed-width-data.html.md.erb
@@ -0,0 +1,38 @@
+---
+title: Importing and Exporting Fixed Width Data
+---
+
+Specify custom formats for fixed-width data with the HAWQ functions `fixedwidth_in` and `fixedwidth_out`. These functions already exist in the file `$GPHOME/share/postgresql/cdb_external_extensions.sql`. The following example declares a custom format, then calls the `fixedwidth_in` function to format the data.
+
+``` sql
+CREATE READABLE EXTERNAL TABLE students (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://mdw:8081/students.txt')
+FORMAT 'CUSTOM' (formatter=fixedwidth_in, name='20', address='30', age='4');
+```
+
+The following options specify how to import fixed width data.
+
+-   Read all the data.
+
+    To load all the fields on a line of fixed-width data, you must load them in their physical order. You must specify the field length, but cannot specify a starting and ending position. The field names in the fixed width arguments must match the order in the field list at the beginning of the `CREATE EXTERNAL TABLE` command.
+
+-   Set options for blank and null characters.
+
+    Trailing blanks are trimmed by default. To keep trailing blanks, use the `preserve_blanks=on` option. You can reset the trailing blanks option to the default with the `preserve_blanks=off` option.
+
+    Use the null=`'null_string_value'` option to specify a value for null 
characters.
+
+-   If you specify `preserve_blanks=on`, you must also define a value for null 
characters.
+-   If you specify `preserve_blanks=off` and null is not defined, and a field contains only blanks, HAWQ writes a null to the table. If null is defined, HAWQ writes an empty string to the table.
+
+    Use the `line_delim='line_ending'` parameter to specify the line ending 
character. The following examples cover most cases. The `E` specifies an escape 
string constant.
+
+    ``` pre
+    line_delim=E'\n'
+    line_delim=E'\r'
+    line_delim=E'\r\n'
+    line_delim='abc'
+    ```
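+
+The same formatter family supports export. As a sketch (the writable-table name and output file are illustrative), the following uses `fixedwidth_out` to write the `students` data back out as fixed-width text:
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE students_out (
+  name varchar(20), address varchar(30), age int)
+LOCATION ('gpfdist://mdw:8081/students_out.txt')
+FORMAT 'CUSTOM' (formatter=fixedwidth_out, name='20', address='30', age='4');
+```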
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-installing-gpfdist.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-installing-gpfdist.html.md.erb 
b/markdown/datamgmt/load/g-installing-gpfdist.html.md.erb
new file mode 100644
index 0000000..85549df
--- /dev/null
+++ b/markdown/datamgmt/load/g-installing-gpfdist.html.md.erb
@@ -0,0 +1,7 @@
+---
+title: Installing gpfdist
+---
+
+You may choose to run `gpfdist` from a machine other than the HAWQ master, 
such as on a machine devoted to ETL processing. To install `gpfdist` on your 
ETL server, refer to [Client-Based HAWQ Load Tools](client-loadtools.html) for 
information related to Linux and Windows load tools installation and 
configuration.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-load-the-data.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-load-the-data.html.md.erb 
b/markdown/datamgmt/load/g-load-the-data.html.md.erb
new file mode 100644
index 0000000..4c88c9f
--- /dev/null
+++ b/markdown/datamgmt/load/g-load-the-data.html.md.erb
@@ -0,0 +1,17 @@
+---
+title: Load the Data
+---
+
+Create the tables with SQL statements based on the appropriate schema.
+
+There are no special requirements for the HAWQ tables that hold loaded data. 
In the prices example, the following command creates the appropriate table.
+
+``` sql
+CREATE TABLE prices (
+  itemnumber integer,       
+  price       decimal        
+) 
+DISTRIBUTED BY (itemnumber);
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-loading-and-unloading-data.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-loading-and-unloading-data.html.md.erb 
b/markdown/datamgmt/load/g-loading-and-unloading-data.html.md.erb
new file mode 100644
index 0000000..8ea43d5
--- /dev/null
+++ b/markdown/datamgmt/load/g-loading-and-unloading-data.html.md.erb
@@ -0,0 +1,55 @@
+---
+title: Loading and Unloading Data
+---
+
+The topics in this section describe methods for loading and writing data into and out of HAWQ, and how to format data files. This section also covers registering HDFS files and folders directly into HAWQ internal tables.
+
+HAWQ supports high-performance parallel data loading and unloading, and for 
smaller amounts of data, single file, non-parallel data import and export.
+
+HAWQ can read from and write to several types of external data sources, 
including text files, Hadoop file systems, and web servers.
+
+-   The `COPY` SQL command transfers data between an external text file on the 
master host and a HAWQ database table.
+-   External tables allow you to query data outside of the database directly and in parallel using SQL commands such as `SELECT` and `JOIN`, and you can create views for external tables. External tables are often used to load external data into a regular database table using a command such as `CREATE TABLE table AS SELECT * FROM ext_table`.
+-   External web tables provide access to dynamic data. They can be backed 
with data from URLs accessed using the HTTP protocol or by the output of an OS 
script running on one or more segments.
+-   The `gpfdist` utility is the HAWQ parallel file distribution program. It is an HTTP server that is used with external tables to allow HAWQ segments to load external data in parallel from multiple file systems. You can run multiple instances of `gpfdist` on different hosts and network interfaces and access them in parallel.
+-   The `hawq load` utility automates the steps of a load task using a 
YAML-formatted control file.
+
+The method you choose to load data depends on the characteristics of the 
source data—its location, size, format, and any transformations required.
+
+In the simplest case, the `COPY` SQL command loads data into a table from a 
text file that is accessible to the HAWQ master instance. This requires no 
setup and provides good performance for smaller amounts of data. With the 
`COPY` command, the data copied into or out of the database passes between a 
single file on the master host and the database. This limits the total size of 
the dataset to the capacity of the file system where the external file resides 
and limits the data transfer to a single file write stream.
+
+More efficient data loading options for large datasets take advantage of the 
HAWQ MPP architecture, using the HAWQ segments to load data in parallel. These 
methods allow data to load simultaneously from multiple file systems, through 
multiple NICs, on multiple hosts, achieving very high data transfer rates. 
External tables allow you to access external files from within the database as 
if they are regular database tables. When used with `gpfdist`, the HAWQ 
parallel file distribution program, external tables provide full parallelism by 
using the resources of all HAWQ segments to load or unload data.
+
+HAWQ leverages the parallel architecture of the Hadoop Distributed File System 
to access files on that system.
+
+-   **[Working with File-Based External 
Tables](../../datamgmt/load/g-working-with-file-based-ext-tables.html)**
+
+-   **[Using the HAWQ File Server 
(gpfdist)](../../datamgmt/load/g-using-the-hawq-file-server--gpfdist-.html)**
+
+-   **[Creating and Using Web External 
Tables](../../datamgmt/load/g-creating-and-using-web-external-tables.html)**
+
+-   **[Loading Data Using an External 
Table](../../datamgmt/load/g-loading-data-using-an-external-table.html)**
+
+-   **[Registering Files into HAWQ Internal 
Tables](../../datamgmt/load/g-register_files.html)**
+
+-   **[Loading and Writing Non-HDFS Custom 
Data](../../datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html)**
+
+-   **[Creating External Tables - 
Examples](../../datamgmt/load/creating-external-tables-examples.html#topic44)**
+
+-   **[Handling Load Errors](../../datamgmt/load/g-handling-load-errors.html)**
+
+-   **[Loading Data with hawq 
load](../../datamgmt/load/g-loading-data-with-hawqload.html)**
+
+-   **[Loading Data with 
COPY](../../datamgmt/load/g-loading-data-with-copy.html)**
+
+-   **[Running COPY in Single Row Error Isolation 
Mode](../../datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html)**
+
+-   **[Optimizing Data Load and Query 
Performance](../../datamgmt/load/g-optimizing-data-load-and-query-performance.html)**
+
+-   **[Unloading Data from 
HAWQ](../../datamgmt/load/g-unloading-data-from-hawq-database.html)**
+
+-   **[Transforming XML 
Data](../../datamgmt/load/g-transforming-xml-data.html)**
+
+-   **[Formatting Data 
Files](../../datamgmt/load/g-formatting-data-files.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html.md.erb 
b/markdown/datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html.md.erb
new file mode 100644
index 0000000..e826963
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html.md.erb
@@ -0,0 +1,9 @@
+---
+title: Loading and Writing Non-HDFS Custom Data
+---
+
+HAWQ supports `TEXT` and `CSV` formats for importing and exporting data. You 
can load and write the data in other formats by defining and using a custom 
format or custom protocol.
+
+-   **[Using a Custom 
Format](../../datamgmt/load/g-using-a-custom-format.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-loading-data-using-an-external-table.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-loading-data-using-an-external-table.html.md.erb 
b/markdown/datamgmt/load/g-loading-data-using-an-external-table.html.md.erb
new file mode 100644
index 0000000..32a741a
--- /dev/null
+++ b/markdown/datamgmt/load/g-loading-data-using-an-external-table.html.md.erb
@@ -0,0 +1,18 @@
+---
+title: Loading Data Using an External Table
+---
+
+Use SQL commands such as `INSERT` and `SELECT` to query a readable external table, the same way that you query a regular database table. For example, to load travel expense data from an external table, `ext_expenses`, into a database table, `expenses_travel`:
+
+``` sql
+=# INSERT INTO expenses_travel 
+SELECT * FROM ext_expenses WHERE category='travel';
+```
+
+To load all data into a new database table:
+
+``` sql
+=# CREATE TABLE expenses AS SELECT * FROM ext_expenses;
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-loading-data-with-copy.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-loading-data-with-copy.html.md.erb 
b/markdown/datamgmt/load/g-loading-data-with-copy.html.md.erb
new file mode 100644
index 0000000..72e5ac6
--- /dev/null
+++ b/markdown/datamgmt/load/g-loading-data-with-copy.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Loading Data with COPY
+---
+
+`COPY FROM` copies data from a file or standard input into a table and appends 
the data to the table contents. `COPY` is non-parallel: data is loaded in a 
single process using the HAWQ master instance. Using `COPY` is only recommended 
for very small data files.
+
+The `COPY` source file must be accessible to the master host. Specify the 
`COPY` source file name relative to the master host location.
+
+HAWQ copies data from `STDIN` or to `STDOUT` using the connection between the client and the master server.
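+
+As a minimal sketch (the table name and file path are illustrative), the following loads a pipe-delimited file that resides on the master host:
+
+``` sql
+=# COPY expenses FROM '/data/expenses.txt' WITH DELIMITER '|';
+```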
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-loading-data-with-hawqload.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-loading-data-with-hawqload.html.md.erb 
b/markdown/datamgmt/load/g-loading-data-with-hawqload.html.md.erb
new file mode 100644
index 0000000..68e4459
--- /dev/null
+++ b/markdown/datamgmt/load/g-loading-data-with-hawqload.html.md.erb
@@ -0,0 +1,56 @@
+---
+title: Loading Data with hawq load
+---
+
+The HAWQ `hawq load` utility loads data using readable external tables and the 
HAWQ parallel file server ( `gpfdist` or `gpfdists`). It handles parallel 
file-based external table setup and allows users to configure their data 
format, external table definition, and `gpfdist` or `gpfdists` setup in a 
single configuration file.
+
+## <a id="topic62__du168147"></a>To use hawq load
+
+1.  Ensure that your environment is set up to run `hawq load`. Some dependent files from your HAWQ installation are required, such as `gpfdist` and Python, as well as network access to the HAWQ segment hosts.
+2.  Create your load control file. This is a YAML-formatted file that 
specifies the HAWQ connection information, `gpfdist` configuration information, 
external table options, and data format.
+
+    For example:
+
+    ``` pre
+    ---
+    VERSION: 1.0.0.1
+    DATABASE: ops
+    USER: gpadmin
+    HOST: mdw-1
+    PORT: 5432
+    GPLOAD:
+       INPUT:
+        - SOURCE:
+             LOCAL_HOSTNAME:
+               - etl1-1
+               - etl1-2
+               - etl1-3
+               - etl1-4
+             PORT: 8081
+             FILE: 
+               - /var/load/data/*
+        - COLUMNS:
+               - name: text
+               - amount: float4
+               - category: text
+               - description: text
+               - date: date
+        - FORMAT: text
+        - DELIMITER: '|'
+        - ERROR_LIMIT: 25
+        - ERROR_TABLE: payables.err_expenses
+       OUTPUT:
+        - TABLE: payables.expenses
+        - MODE: INSERT
+    SQL:
+       - BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)"
+       - AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"
+    ```
+
+3.  Run `hawq load`, passing in the load control file. For example:
+
+    ``` shell
+    $ hawq load -f my_load.yml
+    ```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-moving-data-between-tables.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/datamgmt/load/g-moving-data-between-tables.html.md.erb 
b/markdown/datamgmt/load/g-moving-data-between-tables.html.md.erb
new file mode 100644
index 0000000..2603ae4
--- /dev/null
+++ b/markdown/datamgmt/load/g-moving-data-between-tables.html.md.erb
@@ -0,0 +1,12 @@
+---
+title: Moving Data between Tables
+---
+
+You can use `CREATE TABLE AS` or `INSERT...SELECT` to load external and web 
external table data into another (non-external) database table, and the data 
will be loaded in parallel according to the external or web external table 
definition.
+
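+For example, either of the following statements loads the external table data in parallel (the target table name is illustrative):
+
+``` sql
+=# CREATE TABLE expenses AS SELECT * FROM ext_expenses;
+=# INSERT INTO expenses SELECT * FROM ext_expenses;
+```
+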
+If an external table file or web external table data source has an error, one 
of the following will happen, depending on the isolation mode used:
+
+-   **Tables without error isolation mode**: Any operation that reads from that table fails. Loading from external and web external tables without error isolation mode is an all-or-nothing operation.
+-   **Tables with error isolation mode**: The entire file is loaded, except for the problematic rows (subject to the configured `SEGMENT REJECT LIMIT`).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/datamgmt/load/g-optimizing-data-load-and-query-performance.html.md.erb
----------------------------------------------------------------------
diff --git 
a/markdown/datamgmt/load/g-optimizing-data-load-and-query-performance.html.md.erb
 
b/markdown/datamgmt/load/g-optimizing-data-load-and-query-performance.html.md.erb
new file mode 100644
index 0000000..ff1c230
--- /dev/null
+++ 
b/markdown/datamgmt/load/g-optimizing-data-load-and-query-performance.html.md.erb
@@ -0,0 +1,10 @@
+---
+title: Optimizing Data Load and Query Performance
+---
+
+Use the following tip to help optimize your data load and subsequent query 
performance.
+
+-   Run `ANALYZE` after loading data. If you significantly altered the data in a table, run `ANALYZE` or `VACUUM ANALYZE` (system catalog tables only) to update table statistics for the query optimizer. Current statistics ensure that the optimizer makes the best decisions during query planning and avoids poor performance due to inaccurate or nonexistent statistics.
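+
+    For example, after loading into a table named `expenses` (illustrative):
+
+    ``` sql
+    =# ANALYZE expenses;
+    ```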
+
+
+
