http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/overview/TableDistributionStorage.html.md.erb
----------------------------------------------------------------------
diff --git a/overview/TableDistributionStorage.html.md.erb 
b/overview/TableDistributionStorage.html.md.erb
new file mode 100755
index 0000000..8bf6542
--- /dev/null
+++ b/overview/TableDistributionStorage.html.md.erb
@@ -0,0 +1,41 @@
+---
+title: Table Distribution and Storage
+---
+
+HAWQ stores all table data, except system tables, in HDFS. When a user creates a table, the metadata is stored on the master's local file system and the table content is stored in HDFS.
+
+To simplify table data management, all the data of one relation is saved under one HDFS folder.
+
+For both HAWQ table storage formats, AO \(Append-Only\) and Parquet, the data files are splittable, so HAWQ can assign multiple virtual segments to consume one data file concurrently. This increases the degree of query parallelism.
+
+## Table Distribution Policy
+
+The default table distribution policy in HAWQ is random.
+
+Randomly distributed tables have some benefits over hash-distributed tables. For example, after a cluster expansion, HAWQ can use the additional resources automatically, without redistributing the data; for huge tables, such a redistribution is very expensive. Data locality of randomly distributed tables is also better preserved when the underlying HDFS redistributes its data during a rebalance or after DataNode failures, which is common in large clusters.
+
+On the other hand, hash-distributed tables are faster than randomly distributed tables for some queries; for example, they show performance benefits on some TPC-H queries. Choose the distribution policy that is best suited to your application.
+
+See [Choosing the Table Distribution Policy](/20/ddl/ddl-table.html) for more 
details.
+
+## Data Locality
+
+Data is distributed across HDFS DataNodes. Since remote read involves network 
I/O, a data locality algorithm improves the local read ratio. HAWQ considers 
three aspects when allocating data blocks to virtual segments:
+
+-   Ratio of local read
+-   Continuity of file read
+-   Data balance among virtual segments
+
+## External Data Access
+
+HAWQ can access data in external files using the HAWQ Extension Framework 
(PXF).
+PXF is an extensible framework that allows HAWQ to access data in external
+sources as readable or writable HAWQ tables. PXF has built-in connectors for
+accessing data inside HDFS files, Hive tables, and HBase tables. PXF also
+integrates with HCatalog to query Hive tables directly. See [Working with PXF
+and External Data](/20/pxf/HawqExtensionFrameworkPXF.html) for more
+details.
+
+Users can create custom PXF connectors to access other parallel data stores or
+processing engines. Connectors are Java plug-ins that use the PXF API. For more
+information see [PXF External Tables and 
API](/20/pxf/PXFExternalTableandAPIReference.html).

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/overview/system-overview.html.md.erb
----------------------------------------------------------------------
diff --git a/overview/system-overview.html.md.erb 
b/overview/system-overview.html.md.erb
new file mode 100644
index 0000000..9fc1c53
--- /dev/null
+++ b/overview/system-overview.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Apache HAWQ (Incubating) System Overview
+---
+* <a href="./HAWQOverview.html" class="subnav">What is HAWQ?</a>
+* <a href="./HAWQArchitecture.html" class="subnav">HAWQ Architecture</a>
+* <a href="./TableDistributionStorage.html" class="subnav">Table Distribution 
and Storage</a>
+* <a href="./ElasticSegments.html" class="subnav">Elastic Virtual Segment 
Allocation</a>
+* <a href="./ResourceManagement.html" class="subnav">Resource Management</a>
+* <a href="./HDFSCatalogCache.html" class="subnav">HDFS Catalog Cache</a>
+* <a href="./ManagementTools.html" class="subnav">Management Tools</a>
+* <a href="./RedundancyFailover.html" class="subnav">Redundancy and Fault 
Tolerance</a>

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/UsingProceduralLanguages.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/UsingProceduralLanguages.html.md.erb 
b/plext/UsingProceduralLanguages.html.md.erb
new file mode 100644
index 0000000..3ffba2c
--- /dev/null
+++ b/plext/UsingProceduralLanguages.html.md.erb
@@ -0,0 +1,20 @@
+---
+title: Using Procedural Languages and Extensions in HAWQ
+---
+
+HAWQ allows user-defined functions to be written in other languages besides 
SQL and C. These other languages are generically called *procedural languages* 
(PLs).
+
+For a function written in a procedural language, the database server has no 
built-in knowledge about how to interpret the function's source text. Instead, 
the task is passed to a special handler that knows the details of the language. 
The handler could either do all the work of parsing, syntax analysis, 
execution, and so on itself, or it could serve as "glue" between HAWQ and an 
existing implementation of a programming language. The handler itself is a C 
language function compiled into a shared object and loaded on demand, just like 
any other C function.
+
+This chapter describes the following:
+
+-   <a href="using_pljava.html">Using PL/Java</a>
+-   <a href="using_plperl.html">Using PL/Perl</a>
+-   <a href="using_plpgsql.html">Using PL/pgSQL</a>
+-   <a href="using_plpython.html">Using PL/Python</a>
+-   <a href="using_plr.html">Using PL/R</a>
+-   <a href="using_pgcrypto.html">Using pgcrypto</a>
+
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_pgcrypto.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_pgcrypto.html.md.erb b/plext/using_pgcrypto.html.md.erb
new file mode 100644
index 0000000..e3e9225
--- /dev/null
+++ b/plext/using_pgcrypto.html.md.erb
@@ -0,0 +1,32 @@
+---
+title: Enabling Cryptographic Functions for PostgreSQL (pgcrypto)
+---
+
+`pgcrypto` is a package extension included in your HAWQ distribution. You must 
explicitly enable the cryptographic functions to use this extension.
+
+## <a id="pgcryptoprereq"></a>Prerequisites 
+
+
+Before you enable the `pgcrypto` software package, make sure that your HAWQ database is running, that you have sourced `greenplum_path.sh`, and that the `$GPHOME` environment variable is set.
+
+## <a id="enablepgcrypto"></a>Enable pgcrypto 
+
+On every database in which you want to enable `pgcrypto`, run the following 
command:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/pgcrypto.sql
+```
+       
+Replace \<dbname\> with the name of the target database.
+       
+## <a id="uninstallpgcrypto"></a>Disable pgcrypto 
+
+The `uninstall_pgcrypto.sql` script removes `pgcrypto` objects from your 
database.  On each database in which you enabled `pgcrypto` support, execute 
the following:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/uninstall_pgcrypto.sql
+```
+
+Replace \<dbname\> with the name of the target database.
+       
+**Note:**  This script does not remove dependent user-created objects.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_pljava.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_pljava.html.md.erb b/plext/using_pljava.html.md.erb
new file mode 100644
index 0000000..3cce857
--- /dev/null
+++ b/plext/using_pljava.html.md.erb
@@ -0,0 +1,666 @@
+---
+title: Using PL/Java
+---
+
+This section contains an overview of the HAWQ PL/Java language. 
+
+
+## <a id="aboutpljava"></a>About PL/Java 
+
+With the HAWQ PL/Java extension, you can write Java methods using your 
favorite Java IDE and install the JAR files that implement the methods in your 
HAWQ cluster.
+
+**Note**: If building HAWQ from source, you must specify PL/Java as a build 
option when compiling HAWQ. To use PL/Java in a HAWQ deployment, you must 
explicitly enable the PL/Java extension in all desired databases.  
+
+The HAWQ PL/Java package is based on the open source PL/Java 1.4.0. HAWQ 
PL/Java provides the following features.
+
+- Ability to execute PL/Java functions with Java 1.6 or 1.7.
+- Standardized utilities (modeled after the SQL 2003 proposal) to install and 
maintain Java code in the database.
+- Standardized mappings of parameters and results. Complex types as well as sets are supported.
+- An embedded, high performance, JDBC driver utilizing the internal HAWQ 
Database SPI routines.
+- Metadata support for the JDBC driver. Both `DatabaseMetaData` and 
`ResultSetMetaData` are included.
+- The ability to return a `ResultSet` from a query as an alternative to 
building a ResultSet row by row.
+- Full support for savepoints and exception handling.
+- The ability to use IN, INOUT, and OUT parameters.
+- Two separate HAWQ languages:
+       - pljava, TRUSTED PL/Java language
+       - pljavau, UNTRUSTED PL/Java language
+- Transaction and Savepoint listeners enabling code execution when a 
transaction or savepoint is committed or rolled back.
+- Integration with GNU GCJ on selected platforms.
+
+A SQL function declaration appoints a static method in a Java class. In order for the function to execute, the appointed class must be available on the class path specified by the HAWQ server configuration parameter `pljava_classpath`. The PL/Java extension adds a set of functions that help to install and maintain the Java classes. Classes are stored in normal Java archives (JAR files). A JAR file can optionally contain a deployment descriptor that in turn contains SQL commands to be executed when the JAR is deployed or undeployed. The functions are modeled after the standards proposed for SQL 2003.
+
+PL/Java implements a standard way of passing parameters and return values. 
Complex types and sets are passed using the standard JDBC ResultSet class.
+
+A JDBC driver is included in PL/Java. This driver calls HAWQ internal SPI 
routines. The driver is essential since it is common for functions to make 
calls back to the database to fetch data. When PL/Java functions fetch data, 
they must use the same transactional boundaries that are used by the main 
function that entered PL/Java execution context.
+
+PL/Java is optimized for performance. The Java virtual machine executes within the same process as the backend to minimize call overhead. PL/Java is designed to bring the power of Java into the database itself, so that database-intensive business logic can execute as close to the actual data as possible.
+
+The standard Java Native Interface (JNI) is used when bridging calls between 
the backend and the Java VM.
+
+
+## <a id="abouthawqpljava"></a>About HAWQ PL/Java 
+
+There are a few key differences between the implementation of PL/Java in 
standard PostgreSQL and HAWQ.
+
+### <a id="pljavafunctions"></a>Functions 
+
+The following functions are not supported in HAWQ. The classpath is handled 
differently in a distributed HAWQ environment than in the PostgreSQL 
environment.
+
+- sqlj.install_jar
+- sqlj.replace_jar
+- sqlj.remove_jar
+- sqlj.get_classpath
+- sqlj.set_classpath
+
+HAWQ uses the `pljava_classpath` server configuration parameter in place of 
the `sqlj.set_classpath` function.
+
+### <a id="serverconfigparams"></a>Server Configuration Parameters 
+
+The following server configuration parameters are used by PL/Java in HAWQ. 
These parameters replace the `pljava.*` parameters that are used in the 
standard PostgreSQL PL/Java implementation.
+
+<p class="note"><b>Note:</b> See the <a 
href="/20/reference/hawq-reference.html">HAWQ Reference</a> for information 
about HAWQ server configuration parameters.</p>
+
+#### pljava\_classpath
+
+A colon (:) separated list of the jar files containing the Java classes used 
in any PL/Java functions. The jar files must be installed in the same locations 
on all HAWQ hosts. With the trusted PL/Java language handler, jar file paths 
must be relative to the `$GPHOME/lib/postgresql/java/` directory. With the 
untrusted language handler (javaU language tag), paths may be relative to 
`$GPHOME/lib/postgresql/java/` or absolute.
+
+#### pljava\_statement\_cache\_size
+
+Sets the size in KB of the Most Recently Used (MRU) cache for prepared 
statements.
+
+#### pljava\_release\_lingering\_savepoints
+
+If TRUE, lingering savepoints will be released on function exit. If FALSE, 
they will be rolled back.
+
+#### pljava\_vmoptions
+
+Defines the startup options for the Java VM.
+
+
+## <a id="enablepljava"></a>Enabling and Removing PL/Java Support 
+
+The PL/Java extension must be explicitly enabled on each database in which it 
will be used.
+
+
+### <a id="pljavaprereq"></a>Prerequisites 
+
+Before you enable PL/Java:
+
+1. Ensure that you have installed a supported Java runtime environment and 
that the `$JAVA_HOME` variable is set to the same path on the master and all 
segment nodes.
+
+2. Perform the following step on all machines to set up `ldconfig` for JDK:
+
+       ``` shell
+       $ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf
+       $ ldconfig
+       ```
+3. Make sure that your HAWQ cluster is running, that you have sourced `greenplum_path.sh`, and that your `$GPHOME` environment variable is set.
+
+
+### <a id="enablepljavainstall"></a>Enable PL/Java and Install JAR Files 
+
+To use PL/Java:
+
+1. Enable the language for each database.
+1. Install user-created JAR files containing Java methods on all HAWQ hosts.
+1. Add the name of the JAR file to the HAWQ `pljava_classpath` server 
configuration parameter in `hawq-site.xml`. This parameter value should contain 
a list of the installed JAR files.
+
+#### <a id="enablepljavaprocedure"></a>Procedure 
+
+Perform the following steps as the `gpadmin` user:
+
+1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` SQL script in the databases that will use PL/Java. The `install.sql` script registers both the trusted and untrusted PL/Java languages. For example, the following command enables PL/Java on a database named `testdb`:
+
+       ``` shell
+       $ psql -d testdb -f $GPHOME/share/postgresql/pljava/install.sql
+       ```
+       
+       To enable the PL/Java extension in all new HAWQ databases, run the 
script on the `template1` database: 
+
+    ``` shell
+    $ psql -d template1 -f $GPHOME/share/postgresql/pljava/install.sql
+    ```
+
+    Use this option *only* if you are certain you want to enable PL/Java in 
all new databases.
+       
+2. Copy your Java archives (JAR files) to `$GPHOME/lib/postgresql/java/` on 
all the HAWQ hosts. This example uses the `hawq scp` utility to copy the 
`myclasses.jar` file:
+
+       ``` shell
+       $ hawq scp -f hawq_hosts myclasses.jar =:$GPHOME/lib/postgresql/java/
+       ```
+       The `hawq_hosts` file contains a list of the HAWQ hosts.
+
+3. Add the JAR files to the `pljava_classpath` configuration parameter. This parameter can be set at either the session or the global level.
+
+    To affect only the *current* database session, set the `pljava_classpath` 
configuration parameter at the `psql` prompt:
+       
+    ``` sql
+    SET pljava_classpath='myclasses.jar';
+    ```
+
+    To affect *all* sessions, set the `pljava_classpath` server configuration 
parameter and restart the HAWQ cluster:
+
+    ``` shell
+    $ hawq config -c pljava_classpath -v \'examples.jar:myclasses.jar\'
+    $ hawq restart cluster
+    ```
+
+4. (Optional) Your HAWQ installation includes an `examples.sql` file. This script contains sample PL/Java functions that you can use for testing. Run the commands in this file to create and run test functions that use the Java classes in `examples.jar`:
+
+       ``` shell
+       $ psql -f $GPHOME/share/postgresql/pljava/examples.sql
+       ```
+
+#### Configuring PL/Java VM Options
+
+PL/Java JVM options can be configured via the `pljava_vmoptions` parameter in `hawq-site.xml`. For example, `pljava_vmoptions=-Xmx512M` sets the maximum heap size of the JVM to 512 MB. The default is `-Xmx64M`.
+
+       
+### <a id="uninstallpljava"></a>Disable PL/Java 
+
+To disable PL/Java, you should:
+
+1. Remove PL/Java support from each database in which it was added.
+2. Uninstall the Java JAR files.
+
+#### <a id="uninstallpljavasupport"></a>Remove PL/Java Support from Databases 
+
+For a database that no longer requires the PL/Java language, remove support for PL/Java by running the `uninstall.sql` script as the `gpadmin` user. For example, the following command disables the PL/Java language in the specified database:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/pljava/uninstall.sql
+```
+
+Replace \<dbname\> with the name of the target database.
+
+
+#### <a id="uninstallpljavapackage"></a>Uninstall the Java JAR files 
+
+When no databases have PL/Java as a registered language, remove the Java JAR 
files:
+
+1. Remove the `pljava_classpath` server configuration parameter from the `hawq-site.xml` file.
+
+1. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of 
each HAWQ host.
+
+1. Restart the HAWQ cluster:
+
+       ``` shell
+       $ hawq restart cluster
+       ```
+
+
+## <a id="writingpljavafunc"></a>Writing PL/Java Functions 
+
+This section provides information about writing functions with PL/Java.
+
+- [SQL Declaration](#sqldeclaration)
+- [Type Mapping](#typemapping)
+- [NULL Handling](#nullhandling)
+- [Complex Types](#complextypes)
+- [Returning Complex Types](#returningcomplextypes)
+- [Functions That Return Sets](#functionreturnsets)
+- [Returning a SETOF \<scalar type\>](#returnsetofscalar)
+- [Returning a SETOF \<complex type\>](#returnsetofcomplex)
+
+
+### <a id="sqldeclaration"></a>SQL Declaration 
+
+A Java function is declared with the name of a class and a static method on 
that class. The class will be resolved using the classpath that has been 
defined for the schema where the function is declared. If no classpath has been 
defined for that schema, the public schema is used. If no classpath is found 
there either, the class is resolved using the system classloader.
+
+The following function can be declared to access the static method `getProperty` on the `java.lang.System` class:
+
+```sql
+CREATE FUNCTION getsysprop(VARCHAR)
+  RETURNS VARCHAR
+  AS 'java.lang.System.getProperty'
+  LANGUAGE java;
+```
+
+Run the following command to return the Java `user.home` property:
+
+```sql
+SELECT getsysprop('user.home');
+```
+
+### <a id="typemapping"></a>Type Mapping 
+
+Scalar types are mapped in a straightforward way. This table lists the current 
mappings.
+
+***Table 1: PL/Java data type mappings***
+
+| PostgreSQL | Java |
+|------------|------|
+| bool | boolean |
+| char | byte |
+| int2 | short |
+| int4 | int |
+| int8 | long |
+| varchar | java.lang.String |
+| text | java.lang.String |
+| bytea | byte[ ] |
+| date | java.sql.Date |
+| time | java.sql.Time (stored value treated as local time) |
+| timetz | java.sql.Time |
+| timestamp | java.sql.Timestamp (stored value treated as local time) |
+| timestamptz | java.sql.Timestamp |
+| complex | java.sql.ResultSet |
+| setof complex | java.sql.ResultSet |
+
+All other types are mapped to `java.lang.String` and will utilize the standard textin/textout routines registered for the respective type.
+
+### <a id="nullhandling"></a>NULL Handling 
+
+The scalar types that map to Java primitives cannot be passed as NULL values. To pass NULL values, those types can have an alternative mapping. You enable this mapping by explicitly denoting it in the method reference.
+
+```sql
+CREATE FUNCTION trueIfEvenOrNull(integer)
+  RETURNS bool
+  AS 'foo.fee.Fum.trueIfEvenOrNull(java.lang.Integer)'
+  LANGUAGE java;
+```
+
+The Java code would be similar to this:
+
+```java
+package foo.fee;
+public class Fum
+{
+  static boolean trueIfEvenOrNull(Integer value)
+  {
+    return (value == null)
+      ? true
+      : (value.intValue() % 2) == 0;
+  }
+}
+```
+
+The following two statements both yield true:
+
+```sql
+SELECT trueIfEvenOrNull(NULL);
+SELECT trueIfEvenOrNull(4);
+```
+
+In order to return NULL values from a Java method, you use the object type 
that corresponds to the primitive (for example, you return `java.lang.Integer` 
instead of `int`). The PL/Java resolve mechanism finds the method regardless. 
Since Java cannot have different return types for methods with the same name, 
this does not introduce any ambiguity.
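
As a minimal sketch of this boxed-type pattern (the class and function names here are hypothetical, not part of the PL/Java distribution), a method that can both accept and return NULL might look like this:

```java
public class FumNullable
{
  // Hypothetical example: returns null for a NULL input, otherwise the
  // doubled value. Using java.lang.Integer (rather than int) in both the
  // parameter and the return type lets NULL cross the SQL/Java boundary
  // in both directions.
  public static Integer doubleOrNull(Integer value)
  {
    return (value == null) ? null : Integer.valueOf(value.intValue() * 2);
  }
}
```

A corresponding SQL declaration would reference `FumNullable.doubleOrNull(java.lang.Integer)` in the method reference, just as in the `trueIfEvenOrNull` example above.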
+
+### <a id="complextypes"></a>Complex Types 
+
+A complex type will always be passed as a read-only `java.sql.ResultSet` with 
exactly one row. The `ResultSet` is positioned on its row so a call to `next()` 
should not be made. The values of the complex type are retrieved using the 
standard getter methods of the `ResultSet`.
+
+Example:
+
+```sql
+CREATE TYPE complexTest
+  AS(base integer, incbase integer, ctime timestamptz);
+CREATE FUNCTION useComplexTest(complexTest)
+  RETURNS VARCHAR
+  AS 'foo.fee.Fum.useComplexTest'
+  IMMUTABLE LANGUAGE java;
+```
+
+In the Java class `Fum`, we add the following static method:
+
+```java
+public static String useComplexTest(ResultSet complexTest)
+throws SQLException
+{
+  int base = complexTest.getInt(1);
+  int incbase = complexTest.getInt(2);
+  Timestamp ctime = complexTest.getTimestamp(3);
+  return "Base = \"" + base +
+    "\", incbase = \"" + incbase +
+    "\", ctime = \"" + ctime + "\"";
+}
+```
+
+### <a id="returningcomplextypes"></a>Returning Complex Types 
+
+Java does not stipulate any way to create a `ResultSet`. Hence, returning a `ResultSet` is not an option. The SQL-2003 draft suggests that a complex return value should be handled as an IN/OUT parameter, and PL/Java implements complex return types that way. If you declare a function that returns a complex type, you will need to use a Java method with a boolean return type and a last parameter of type `java.sql.ResultSet`. The parameter will be initialized to an empty updateable `ResultSet` that contains exactly one row.
+
+Assume that the `complexTest` type from the previous section has been created.
+
+```sql
+CREATE FUNCTION createComplexTest(int, int)
+  RETURNS complexTest
+  AS 'foo.fee.Fum.createComplexTest'
+  IMMUTABLE LANGUAGE java;
+```
+
+The PL/Java method resolver will now find the following method in the `Fum` class:
+
+```java
+public static boolean createComplexTest(int base, int increment,
+  ResultSet receiver)
+throws SQLException
+{
+  receiver.updateInt(1, base);
+  receiver.updateInt(2, base + increment);
+  receiver.updateTimestamp(3, new Timestamp(System.currentTimeMillis()));
+  return true;
+}
+```
+
+The return value denotes whether the receiver should be considered a valid tuple (true) or NULL (false).
+
+### <a id="functionreturnsets"></a>Functions that Return Sets 
+
+When returning a result set, you should not build the entire set before returning it, because materializing a large result set would consume a large amount of resources. It is better to produce one row at a time, which is what the HAWQ backend expects a function with a SETOF return type to do. You can return a SETOF of a scalar type such as `int`, `float`, or `varchar`, or a SETOF of a complex type.
+
+### <a id="returnsetofscalar"></a>Returning a SETOF \<scalar type\> 
+
+In order to return a set of a scalar type, you need to create a Java method that returns something that implements the `java.util.Iterator` interface. Here is an example of a method that returns a SETOF varchar:
+
+```sql
+CREATE FUNCTION javatest.getSystemProperties()
+  RETURNS SETOF varchar
+  AS 'foo.fee.Bar.getNames'
+  IMMUTABLE LANGUAGE java;
+```
+
+This simple Java method returns an iterator:
+
+```java
+package foo.fee;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+
+public class Bar
+{
+    public static Iterator getNames()
+    {
+        ArrayList names = new ArrayList();
+        names.add("Lisa");
+        names.add("Bob");
+        names.add("Bill");
+        names.add("Sally");
+        return names.iterator();
+    }
+}
+```
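
The `ArrayList` in this example materializes all rows up front, which is fine for small sets. As noted earlier, producing one row at a time is preferable for large sets; a hypothetical sketch (not part of the PL/Java distribution) of a lazily produced SETOF source could implement `Iterator` directly:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: produces the values 0 .. limit-1 one at a time
// instead of building the whole collection first.
public class CountingIterator implements Iterator
{
  private final int limit;
  private int next = 0;

  public CountingIterator(int limit)
  {
    this.limit = limit;
  }

  public boolean hasNext()
  {
    return next < limit;
  }

  public Object next()
  {
    if (!hasNext())
      throw new NoSuchElementException();
    return Integer.valueOf(next++);
  }

  public void remove()
  {
    throw new UnsupportedOperationException();
  }
}
```

A SETOF function declared against a static method that returns such an iterator consumes constant memory regardless of how many rows are produced.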
+
+### <a id="returnsetofcomplex"></a>Returning a SETOF \<complex type\> 
+
+A method returning a SETOF \<complex type\> must use either the interface `org.postgresql.pljava.ResultSetProvider` or `org.postgresql.pljava.ResultSetHandle`. The reason for having two interfaces is that they cater to two distinct use cases. The former is for cases where you want to dynamically create each row that is to be returned from the SETOF function. The latter is for cases where you want to return the result of an executed query.
+
+#### Using the ResultSetProvider Interface
+
+This interface has two methods: `boolean assignRowValues(java.sql.ResultSet tupleBuilder, int rowNumber)` and `void close()`. The HAWQ query evaluator calls `assignRowValues` repeatedly until it returns false or until the evaluator decides that it does not need any more rows. Then it calls `close()`.
+
+You can use this interface in the following way:
+
+```sql
+CREATE FUNCTION javatest.listComplexTests(int, int)
+  RETURNS SETOF complexTest
+  AS 'foo.fee.Fum.listComplexTests'
+  IMMUTABLE LANGUAGE java;
+```
+
+The function maps to a static Java method that returns an instance implementing the `ResultSetProvider` interface.
+
+```java
+public class Fum implements ResultSetProvider
+{
+  private final int m_base;
+  private final int m_increment;
+  public Fum(int base, int increment)
+  {
+    m_base = base;
+    m_increment = increment;
+  }
+  public boolean assignRowValues(ResultSet receiver, int currentRow)
+  throws SQLException
+  {
+    // Stop when we reach 12 rows.
+    //
+    if(currentRow >= 12)
+      return false;
+    receiver.updateInt(1, m_base);
+    receiver.updateInt(2, m_base + m_increment * currentRow);
+    receiver.updateTimestamp(3, new Timestamp(System.currentTimeMillis()));
+    return true;
+  }
+  public void close()
+  {
+   // Nothing needed in this example
+  }
+  public static ResultSetProvider listComplexTests(int base, int increment)
+  throws SQLException
+  {
+    return new Fum(base, increment);
+  }
+}
+```
+
+The `listComplexTests` method is called once. It may return NULL if no results are available, or an instance of `ResultSetProvider`. Here the Java class `Fum` implements this interface, so it returns an instance of itself. The method `assignRowValues` will then be called repeatedly until it returns false. At that time, `close()` will be called.
+
+#### Using the ResultSetHandle Interface
+
+This interface is similar to the `ResultSetProvider` interface in that it has a `close()` method that will be called at the end. But instead of having the evaluator call a method that builds one row at a time, this interface has a method that returns a `ResultSet`. The query evaluator will iterate over this set and deliver the `ResultSet` contents, one tuple at a time, to the caller until a call to `next()` returns false or the evaluator decides that no more rows are needed.
+
+Here is an example that executes a query using a statement that it obtained 
using the default connection. The SQL suitable for the deployment descriptor 
looks like this:
+
+```sql
+CREATE FUNCTION javatest.listSupers()
+  RETURNS SETOF pg_user
+  AS 'org.postgresql.pljava.example.Users.listSupers'
+  LANGUAGE java;
+CREATE FUNCTION javatest.listNonSupers()
+  RETURNS SETOF pg_user
+  AS 'org.postgresql.pljava.example.Users.listNonSupers'
+  LANGUAGE java;
+```
+
+And in the Java package `org.postgresql.pljava.example` a class `Users` is 
added:
+
+```java
+public class Users implements ResultSetHandle
+{
+  private final String m_filter;
+  private Statement m_statement;
+  public Users(String filter)
+  {
+    m_filter = filter;
+  }
+  public ResultSet getResultSet()
+  throws SQLException
+  {
+    m_statement =
+      DriverManager.getConnection("jdbc:default:connection").createStatement();
+    return m_statement.executeQuery("SELECT * FROM pg_user WHERE " + m_filter);
+  }
+
+  public void close()
+  throws SQLException
+  {
+    m_statement.close();
+  }
+
+  public static ResultSetHandle listSupers()
+  {
+    return new Users("usesuper = true");
+  }
+
+  public static ResultSetHandle listNonSupers()
+  {
+    return new Users("usesuper = false");
+  }
+}
+```
+
+## <a id="usingjdbc"></a>Using JDBC 
+
+PL/Java contains a JDBC driver that maps to the PostgreSQL SPI functions. A 
connection that maps to the current transaction can be obtained using the 
following statement:
+
+```java
+Connection conn = 
+  DriverManager.getConnection("jdbc:default:connection"); 
+```
+
+After obtaining a connection, you can prepare and execute statements as with other JDBC connections. The PL/Java JDBC driver has the following limitations:
+
+- The transaction cannot be managed in any way. Thus, you cannot use methods 
on the connection such as:
+   - `commit()`
+   - `rollback()`
+   - `setAutoCommit()`
+   - `setTransactionIsolation()`
+- Savepoints are available with some restrictions. A savepoint cannot outlive 
the function in which it was set and it must be rolled back or released by that 
same function.
+- A `ResultSet` returned from `executeQuery()` is always `FETCH_FORWARD` and `CONCUR_READ_ONLY`.
+- Meta-data is only available in PL/Java 1.1 or higher.
+- `CallableStatement` (for stored procedures) is not implemented.
+- The types `Clob` and `Blob` are not completely implemented; they need more work. The types `byte[]` and `String` can be used for `bytea` and `text`, respectively.
+
+## <a id="exceptionhandling"></a>Exception Handling 
+
+You can catch and handle an exception in the HAWQ backend just like any other 
exception. The backend `ErrorData` structure is exposed as a property in a 
class called `org.postgresql.pljava.ServerException` (derived from 
`java.sql.SQLException`) and the Java try/catch mechanism is synchronized with 
the backend mechanism.
+
+**Important:** When the backend has generated an exception, you will not be able to continue executing backend functions until your function has returned and the error has been propagated, unless you have used a savepoint. When a savepoint is rolled back, the exceptional condition is reset and you can continue your execution.
+
+## <a id="savepoints"></a>Savepoints 
+
+HAWQ savepoints are exposed using the `java.sql.Connection` interface. Two 
restrictions apply.
+
+- A savepoint must be rolled back or released in the function where it was set.
+- A savepoint must not outlive the function where it was set.
+
+## <a id="logging"></a>Logging 
+
+PL/Java uses the standard Java Logger. Hence, you can write things like:
+
+```java
+Logger.getAnonymousLogger().info("Time is " + new Date(System.currentTimeMillis()));
+```
+
+At present, the logger uses a handler that maps the current state of the HAWQ 
configuration setting `log_min_messages` to a valid Logger level and that 
outputs all messages using the HAWQ backend function `elog()`.
+
+**Note:** The `log_min_messages` setting is read from the database the first 
time a PL/Java function in a session is executed. On the Java side, the setting 
does not change after the first PL/Java function execution in a specific 
session until the HAWQ session that is working with PL/Java is restarted.
+
+The following mappings apply between the Logger levels and the HAWQ backend levels.
+
+***Table 2: PL/Java Logging Levels Mappings***
+
+| java.util.logging.Level | HAWQ Level |
+|-------------------------|------------|
+| SEVERE | ERROR |
+| WARNING | WARNING |
+| CONFIG | LOG |
+| INFO | INFO |
+| FINE | DEBUG1 |
+| FINER | DEBUG2 |
+| FINEST | DEBUG3 |
+
+## <a id="security"></a>Security 
+
+This section describes security aspects of using PL/Java.
+
+### <a id="installation"></a>Installation 
+
+Only a database superuser can install PL/Java. The PL/Java utility functions 
are installed using SECURITY DEFINER so that they execute with the access 
permissions that were granted to the creator of the functions.
+
+### <a id="trustedlang"></a>Trusted Language 
+
+PL/Java is a trusted language. The trusted PL/Java language has no access to 
the file system, as stipulated by the PostgreSQL definition of a trusted 
language. Any database user can create and access functions in a trusted 
language.
+
+PL/Java also installs a language handler for the language `javau`. This 
version is not trusted and only a superuser can create new functions that use 
it. Any user can call the functions.
+
+
+## <a id="pljavaexample"></a>Example 
+
+The following simple Java example creates a JAR file that contains a single 
method and runs the method.
+
+<p class="note"><b>Note:</b> The example requires a Java SDK to compile the 
Java file.</p>
+
+The following method returns a substring.
+
+```java
+public class Example
+{
+    public static String substring(String text, int beginIndex,
+        int endIndex)
+    {
+        return text.substring(beginIndex, endIndex);
+    }
+}
+```
+
+Enter the Java code in a text file named `Example.java`.
+
+Contents of the file `manifest.txt`:
+
+```plaintext
+Manifest-Version: 1.0
+Main-Class: Example
+Specification-Title: "Example"
+Specification-Version: "1.0"
+Created-By: 1.6.0_35-b10-428-11M3811
+Build-Date: 01/20/2013 10:09 AM
+```
+
+Compile the Java code:
+
+```shell
+$ javac *.java
+```
+
+Create a JAR archive named `analytics.jar` that contains the class file and 
the manifest file in the JAR.
+
+```shell
+$ jar cfm analytics.jar manifest.txt *.class
+```
+
+Upload the JAR file to the HAWQ master host.
+
+Run the `hawq scp` utility to copy the JAR file to the HAWQ Java directory. 
Use the `-f` option to specify the file that contains a list of the master and 
segment hosts.
+
+```shell
+$ hawq scp -f hawq_hosts analytics.jar =:/usr/local/hawq/lib/postgresql/java/
+```
+
+Use the `hawq config` utility to set the HAWQ `pljava_classpath` server 
configuration parameter. The parameter lists the installed JAR files.
+
+```shell
+$ hawq config -c pljava_classpath -v \'analytics.jar\'
+```
+
+Run the `hawq restart` utility to reload the configuration files.
+
+```shell
+$ hawq restart cluster
+```
+
+From the `psql` command line, run the following command to show the installed 
JAR files.
+
+```shell
+psql# show pljava_classpath;
+```
+
+The following SQL commands create a table and define a Java function to test 
the method in the JAR file:
+
+```sql
+CREATE TABLE temp (a varchar) DISTRIBUTED RANDOMLY;
+INSERT INTO temp values ('my string'); 
+--Example function 
+CREATE OR REPLACE FUNCTION java_substring(varchar, int, int) 
+RETURNS varchar AS 'Example.substring' LANGUAGE java; 
+--Example execution 
+SELECT java_substring(a, 1, 5) FROM temp;
+```
+
+You can place the contents in a file named `mysample.sql` and run it from the 
psql command line:
+
+```shell
+psql# \i mysample.sql 
+```
+
+The output is similar to this:
+
+```shell
+java_substring
+----------------
+ y st
+(1 row)
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plperl.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plperl.html.md.erb b/plext/using_plperl.html.md.erb
new file mode 100644
index 0000000..d6ffa04
--- /dev/null
+++ b/plext/using_plperl.html.md.erb
@@ -0,0 +1,27 @@
+---
+title: Using PL/Perl
+---
+
+This section contains an overview of the HAWQ PL/Perl language extension.
+
+## <a id="enableplperl"></a>Enabling PL/Perl
+
+If PL/Perl is enabled during HAWQ build time, HAWQ installs the PL/Perl 
language extension automatically. To use PL/Perl, you must enable it on 
specific databases.
+
+On every database where you want to enable PL/Perl, connect to the database 
using the psql client.
+
+``` shell
+$ psql -d <dbname>
+```
+
+Replace \<dbname\> with the name of the target database.
+
+Then, run the following SQL command:
+
+``` shell
+psql# CREATE LANGUAGE plperl;
+```
+
+## <a id="references"></a>References 
+
+For more information on using PL/Perl, see the PostgreSQL PL/Perl 
documentation at 
[https://www.postgresql.org/docs/8.2/static/plperl.html](https://www.postgresql.org/docs/8.2/static/plperl.html).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plpgsql.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plpgsql.html.md.erb b/plext/using_plpgsql.html.md.erb
new file mode 100644
index 0000000..3661e9b
--- /dev/null
+++ b/plext/using_plpgsql.html.md.erb
@@ -0,0 +1,142 @@
+---
+title: Using PL/pgSQL in HAWQ
+---
+
+SQL is the language that most relational databases use as a query language. It 
is portable and easy to learn, but every SQL statement must be executed 
individually by the database server. 
+
+PL/pgSQL is a loadable procedural language. PL/pgSQL can do the following:
+
+-   create functions
+-   add control structures to the SQL language
+-   perform complex computations
+-   inherit all user-defined types, functions, and operators
+-   be trusted by the server
+
+You can use functions created with PL/pgSQL with any database that supports 
built-in functions. For example, it is possible to create complex conditional 
computation functions and later use them to define operators or use them in 
index expressions.
+
+Every SQL statement must be executed individually by the database server. Your 
client application must send each query to the database server, wait for it to 
be processed, receive and process the results, do some computation, then send 
further queries to the server. This requires interprocess communication and 
incurs network overhead if your client is on a different machine than the 
database server.
+
+With PL/pgSQL, you can group a block of computation and a series of queries 
inside the database server, thus having the power of a procedural language and 
the ease of use of SQL, but with considerable savings of client/server 
communication overhead.
+
+-   Extra round trips between client and server are eliminated
+-   Intermediate results that the client does not need do not have to be 
marshaled or transferred between server and client
+-   Multiple rounds of query parsing can be avoided
+
+This can result in a considerable performance increase as compared to an 
application that does not use stored functions.
+
+PL/pgSQL supports all the data types, operators, and functions of SQL.
+
+**Note:**  PL/pgSQL is automatically installed and registered in all HAWQ 
databases.
+
+## <a id="supportedargumentandresultdatatypes"></a>Supported Data Types for 
Arguments and Results 
+
+Functions written in PL/pgSQL accept as arguments any scalar or array data 
type supported by the server, and they can return a result containing this data 
type. They can also accept or return any composite type (row type) specified by 
name. It is also possible to declare a PL/pgSQL function as returning record, 
which means that the result is a row type whose columns are determined by 
specification in the calling query. See <a href="#tablefunctions" 
class="xref">Table Functions</a>.
+
+PL/pgSQL functions can be declared to accept a variable number of arguments by 
using the VARIADIC marker. This works exactly the same way as for SQL 
functions. See <a href="#sqlfunctionswithvariablenumbersofarguments" 
class="xref">SQL Functions with Variable Numbers of Arguments</a>.
+
+PL/pgSQL functions can also be declared to accept and return the polymorphic 
types anyelement, anyarray, anynonarray, and anyenum. The actual data types 
handled by a polymorphic function can vary from call to call, as discussed in 
<a href="http://www.postgresql.org/docs/8.4/static/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC" class="xref">Section 34.2.5</a> of the PostgreSQL documentation. An example is shown in 
<a href="http://www.postgresql.org/docs/8.4/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-ALIASES" class="xref">Section 38.3.1</a>.
+
+PL/pgSQL functions can also be declared to return a "set" (or table) of any 
data type that can be returned as a single instance. Such a function generates 
its output by executing RETURN NEXT for each desired element of the result set, 
or by using RETURN QUERY to output the result of evaluating a query.
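+
+For example, a set-returning function might look like the following sketch 
+(the function name is illustrative, and a table `foo(fooid int)` is assumed 
+to exist):
+
+```sql
+CREATE FUNCTION list_low_ids() RETURNS SETOF integer AS $$
+BEGIN
+    RETURN NEXT 1;                                        -- emit one element
+    RETURN NEXT 2;                                        -- emit another
+    RETURN QUERY SELECT fooid FROM foo WHERE fooid > 10;  -- emit a query result
+    RETURN;                                               -- end of result set
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT * FROM list_low_ids();
+```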
+
+Finally, a PL/pgSQL function can be declared to return void if it has no 
useful return value.
+
+PL/pgSQL functions can also be declared with output parameters in place of an 
explicit specification of the return type. This does not add any fundamental 
capability to the language, but it is often convenient, especially for 
returning multiple values. The RETURNS TABLE notation can also be used in 
place of RETURNS SETOF.
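+
+As a sketch of the output-parameter form (the function and parameter names are 
+illustrative):
+
+```sql
+CREATE FUNCTION sum_n_product(x int, y int, OUT sum int, OUT prod int) AS $$
+BEGIN
+    sum := x + y;   -- each OUT parameter becomes a result column
+    prod := x * y;
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT * FROM sum_n_product(2, 3);   -- one row: sum = 5, prod = 6
+```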
+
+This topic describes the following PL/pgSQL concepts:
+
+-   [Table Functions](#tablefunctions)
+-   [SQL Functions with Variable Numbers of 
Arguments](#sqlfunctionswithvariablenumbersofarguments)
+-   [Polymorphic Types](#polymorphictypes)
+
+
+## <a id="tablefunctions"></a>Table Functions 
+
+
+Table functions are functions that produce a set of rows, made up of either 
base data types (scalar types) or composite data types (table rows). They are 
used like a table, view, or subquery in the FROM clause of a query. Columns 
returned by table functions can be included in SELECT, JOIN, or WHERE clauses 
in the same manner as a table, view, or subquery column.
+
+If a table function returns a base data type, the single result column name 
matches the function name. If the function returns a composite type, the result 
columns get the same names as the individual attributes of the type.
+
+A table function can be aliased in the FROM clause, but it also can be left 
unaliased. If a function is used in the FROM clause with no alias, the function 
name is used as the resulting table name.
+
+Some examples:
+
+```sql
+CREATE TABLE foo (fooid int, foosubid int, fooname text);
+
+CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
+    SELECT * FROM foo WHERE fooid = $1;
+$$ LANGUAGE SQL;
+
+SELECT * FROM getfoo(1) AS t1;
+
+SELECT * FROM foo
+    WHERE foosubid IN (
+                        SELECT foosubid
+                        FROM getfoo(foo.fooid) z
+                        WHERE z.fooid = foo.fooid
+                      );
+
+CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
+
+SELECT * FROM vw_getfoo;
+```
+
+In some cases, it is useful to define table functions that can return 
different column sets depending on how they are invoked. To support this, the 
table function can be declared as returning the pseudotype record. When such a 
function is used in a query, the expected row structure must be specified in 
the query itself, so that the system can know how to parse and plan the query. 
Consider this example:
+
+```sql
+SELECT *
+    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
+      AS t1(proname name, prosrc text)
+    WHERE proname LIKE 'bytea%';
+```
+
+The `dblink` function executes a remote query (see `contrib/dblink`). It is 
declared to return `record` since it might be used for any kind of query. The 
actual column set must be specified in the calling query so that the parser 
knows, for example, what `*` should expand to.
+
+
+## <a id="sqlfunctionswithvariablenumbersofarguments"></a>SQL Functions with 
Variable Numbers of Arguments 
+
+SQL functions can be declared to accept variable numbers of arguments, so long 
as all the "optional" arguments are of the same data type. The optional 
arguments will be passed to the function as an array. The function is declared 
by marking the last parameter as VARIADIC; this parameter must be declared as 
being of an array type. For example:
+
+```sql
+CREATE FUNCTION mleast(VARIADIC numeric[]) RETURNS numeric AS $$
+    SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
+$$ LANGUAGE SQL;
+
+SELECT mleast(10, -1, 5, 4.4);
+ mleast 
+--------
+     -1
+(1 row)
+```
+
+Effectively, all the actual arguments at or beyond the VARIADIC position are 
gathered up into a one-dimensional array, as if you had written
+
+```sql
+SELECT mleast(ARRAY[10, -1, 5, 4.4]);    -- doesn't work
+```
+
+You can't actually write that, though; or at least, it will not match this 
function definition. A parameter marked VARIADIC matches one or more 
occurrences of its element type, not of its own type.
+
+Sometimes it is useful to be able to pass an already-constructed array to a 
variadic function; this is particularly handy when one variadic function wants 
to pass on its array parameter to another one. You can do that by specifying 
VARIADIC in the call:
+
+```sql
+SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
+```
+
+This prevents expansion of the function's variadic parameter into its element 
type, thereby allowing the array argument value to match normally. VARIADIC can 
only be attached to the last actual argument of a function call.
+
+
+
+## <a id="polymorphictypes"></a>Polymorphic Types 
+
+Four pseudo-types of special interest are anyelement, anyarray, anynonarray, 
and anyenum, which are collectively called *polymorphic types*. Any function 
declared using these types is said to be a *polymorphic function*. A 
polymorphic function can operate on many different data types, with the 
specific data type(s) being determined by the data types actually passed to it 
in a particular call.
+
+Polymorphic arguments and results are tied to each other and are resolved to a 
specific data type when a query calling a polymorphic function is parsed. Each 
position (either argument or return value) declared as anyelement is allowed to 
have any specific actual data type, but in any given call they must all be the 
same actual type. Each position declared as anyarray can have any array data 
type, but similarly they must all be the same type. If there are positions 
declared anyarray and others declared anyelement, the actual array type in the 
anyarray positions must be an array whose elements are the same type appearing 
in the anyelement positions. anynonarray is treated exactly the same as 
anyelement, but adds the additional constraint that the actual type must not be 
an array type. anyenum is treated exactly the same as anyelement, but adds the 
additional constraint that the actual type must be an enum type.
+
+Thus, when more than one argument position is declared with a polymorphic 
type, the net effect is that only certain combinations of actual argument types 
are allowed. For example, a function declared as equal(anyelement, anyelement) 
will take any two input values, so long as they are of the same data type.
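+
+A minimal sketch of such a function (the name `is_equal` is illustrative, and 
+an equality operator must exist for the argument type):
+
+```sql
+CREATE FUNCTION is_equal(anyelement, anyelement) RETURNS boolean AS $$
+    SELECT $1 = $2;
+$$ LANGUAGE SQL;
+
+SELECT is_equal(1, 2);                    -- both integer: allowed
+SELECT is_equal('a'::text, 'b'::text);    -- both text: allowed
+-- SELECT is_equal(1, 'a'::text);         -- rejected: argument types differ
+```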
+
+When the return value of a function is declared as a polymorphic type, there 
must be at least one argument position that is also polymorphic, and the actual 
data type supplied as the argument determines the actual result type for that 
call. For example, if there were not already an array subscripting mechanism, 
one could define a function that implements subscripting as 
`subscript(anyarray, integer) returns anyelement`. This declaration constrains 
the actual first argument to be an array type, and allows the parser to infer 
the correct result type from the actual first argument's type. Another example 
is that a function declared as `f(anyarray) returns anyenum` will only accept 
arrays of enum types.
+
+Note that `anynonarray` and `anyenum` do not represent separate type 
variables; they are the same type as `anyelement`, just with an additional 
constraint. For example, declaring a function as `f(anyelement, anyenum)` is 
equivalent to declaring it as `f(anyenum, anyenum)`; both actual arguments have 
to be the same enum type.
+
+Variadic functions described in <a 
href="#sqlfunctionswithvariablenumbersofarguments" class="xref">SQL Functions 
with Variable Numbers of Arguments</a> can be polymorphic: this is accomplished 
by declaring the last parameter as `VARIADIC anyarray`. For purposes of 
argument matching and determining the actual result type, such a function 
behaves the same as if you had written the appropriate number of `anynonarray` 
parameters.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plpython.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plpython.html.md.erb b/plext/using_plpython.html.md.erb
new file mode 100644
index 0000000..5a9123c
--- /dev/null
+++ b/plext/using_plpython.html.md.erb
@@ -0,0 +1,595 @@
+---
+title: Using PL/Python in HAWQ
+---
+
+This section contains an overview of the HAWQ PL/Python language extension.
+
+## <a id="abouthawqplpython"></a>About HAWQ PL/Python 
+
+PL/Python is a loadable procedural language. With the HAWQ PL/Python 
extension, you can write HAWQ user-defined functions in Python that take 
advantage of Python features and modules to quickly build robust database 
applications.
+
+If PL/Python is enabled during HAWQ build time, HAWQ includes both a version 
of Python and PL/Python when deployed. HAWQ uses the following Python 
installation:
+
+```shell
+$GPHOME/ext/python/
+```
+
+### <a id="hawqlimitations"></a>HAWQ PL/Python Limitations 
+
+- HAWQ does not support PL/Python triggers.
+- PL/Python is available only as a HAWQ untrusted language.
+ 
+## <a id="enableplpython"></a>Enabling and Removing PL/Python Support 
+
+If enabled as an option during HAWQ compilation, the PL/Python language is 
installed with HAWQ.
+
+**Note**: To use PL/Python in HAWQ, you must either use a pre-compiled version 
of HAWQ that includes PL/Python or specify PL/Python as a build option when 
compiling HAWQ.
+
+To create and run a PL/Python user-defined function (UDF) in a database, you 
must register the PL/Python language with the database. On every database where 
you want to install and enable PL/Python, connect to the database using the 
psql client.
+
+```shell
+$ psql -d <dbname>
+```
+
+Replace \<dbname\> with the name of the target database.
+
+Then, run the following SQL command:
+
+```shell
+psql# CREATE LANGUAGE plpythonu;
+```
+
+Note that `plpythonu` is installed as an “untrusted” language, meaning it 
does not offer any way of restricting what users can do in it.
+
+To remove support for plpythonu from a database, run the following SQL command:
+
+```shell
+psql# DROP LANGUAGE plpythonu;
+```
+
+## <a id="developfunctions"></a>Developing Functions with PL/Python 
+
+The body of a PL/Python user-defined function is a Python script. When the 
function is called, its arguments are passed as elements of the array `args[]`. 
Named arguments are also passed as ordinary variables to the Python script. The 
result is returned from the PL/Python function with a `return` statement, or a 
`yield` statement in the case of a set-returning function.
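+
+As an illustrative sketch (the function name `pymax` is not part of HAWQ), 
+named arguments and the `return` statement work as follows:
+
+```sql
+CREATE FUNCTION pymax(a integer, b integer) RETURNS integer AS $$
+    # "a" and "b" arrive as ordinary Python variables;
+    # args[0] and args[1] hold the same values
+    if a > b:
+        return a
+    return b
+$$ LANGUAGE plpythonu;
+
+SELECT pymax(3, 7);   -- returns 7
+```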
+
+The HAWQ PL/Python language module imports the Python module `plpy`. The 
module plpy implements these functions:
+
+- Functions to execute SQL queries and prepare execution plans for queries.
+   - `plpy.execute`
+   - `plpy.prepare`
+   
+- Functions to manage errors and messages.
+   - `plpy.debug`
+   - `plpy.log`
+   - `plpy.info`
+   - `plpy.notice`
+   - `plpy.warning`
+   - `plpy.error`
+   - `plpy.fatal`
+   
+## <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
+
+The PL/Python `plpy` module provides two Python functions to execute an SQL 
query and prepare an execution plan for a query, `plpy.execute` and 
`plpy.prepare`. Preparing the execution plan for a query is useful if you run 
the query from multiple Python functions.
+
+### <a id="plpyexecute"></a>plpy.execute 
+
+Calling `plpy.execute` with a query string and an optional limit argument 
causes the query to be run and the result to be returned in a Python result 
object. The result object emulates a list or dictionary object. The rows 
returned in the result object can be accessed by row number and column name. 
The result set row numbering starts with 0 (zero). The result object can be 
modified. The result object has these additional methods:
+
+- `nrows()`, which returns the number of rows returned by the query.
+- `status()`, which returns the `SPI_execute()` return value.
+
+For example, this Python statement in a PL/Python user-defined function 
executes a query.
+
+```python
+rv = plpy.execute("SELECT * FROM my_table", 5)
+```
+
+The `plpy.execute` function returns up to 5 rows from `my_table`. The result 
set is stored in the `rv` object. If `my_table` has a column `my_column`, it 
would be accessed as:
+
+```python
+my_col_data = rv[i]["my_column"]
+```
+
+Since the function returns a maximum of 5 rows, the index `i` can be an 
integer between 0 and 4.
+
+### <a id="plpyprepare"></a>plpy.prepare 
+
+The function `plpy.prepare` prepares the execution plan for a query. It is 
called with a query string and a list of parameter types, if you have parameter 
references in the query. For example, this statement can be in a PL/Python 
user-defined function:
+
+```python
+plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1",
+                    ["text"])
+```
+
+The string `text` specifies the data type of the value that is passed for the 
placeholder `$1`. After preparing a statement, you use the function 
`plpy.execute` to run it:
+
+```python
+rv = plpy.execute(plan, [ "Fred" ], 5)
+```
+
+The third argument is the limit for the number of rows returned and is 
optional.
+
+When you prepare an execution plan using the PL/Python module, the plan is 
automatically saved. See the PostgreSQL Server Programming Interface (SPI) 
documentation for information about execution plans: 
[http://www.postgresql.org/docs/8.2/static/spi.html](http://www.postgresql.org/docs/8.2/static/spi.html).
+
+To make effective use of saved plans across function calls you use one of the 
Python persistent storage dictionaries SD or GD.
+
+The global dictionary SD is available to store data between function calls. 
This variable is private static data. The global dictionary GD is public data, 
available to all Python functions within a session. Use GD with care.
+
+Each function gets its own execution environment in the Python interpreter, so 
that global data and function arguments from `myfunc` are not available to 
`myfunc2`. The exception is the data in the GD dictionary, as mentioned 
previously.
+
+This example uses the SD dictionary:
+
+```sql
+CREATE FUNCTION usesavedplan() RETURNS text AS $$
+  if SD.has_key("plan"):
+    plan = SD["plan"]
+  else:
+    plan = plpy.prepare("SELECT 1")
+    SD["plan"] = plan
+
+  # rest of function
+
+$$ LANGUAGE plpythonu;
+```
+
+## <a id="pythonerrors"></a>Handling Python Errors and Messages 
+
+The message functions `plpy.error` and `plpy.fatal` raise a Python exception 
which, if uncaught, propagates out to the calling query, causing the current 
transaction or subtransaction to be aborted. The functions raise 
`plpy.ERROR(msg)` and raise `plpy.FATAL(msg)` are equivalent to calling 
`plpy.error` and `plpy.fatal`, respectively. The other message functions only 
generate messages of different priority levels.
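+
+For example, a function might validate its input with these calls (a sketch; 
+the function name is illustrative):
+
+```sql
+CREATE FUNCTION safe_divide(a float8, b float8) RETURNS float8 AS $$
+    if b == 0:
+        plpy.error("division by zero")   # raises, aborting the calling query
+    plpy.notice("dividing %s by %s" % (a, b))  # informational message only
+    return a / b
+$$ LANGUAGE plpythonu;
+```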
+
+Whether messages of a particular priority are reported to the client, written 
to the server log, or both is controlled by the HAWQ server configuration 
parameters `log_min_messages` and `client_min_messages`. For information about 
the parameters, see the [Server Configuration Parameter 
Reference](../reference/HAWQSiteConfig.html).
+
+## <a id="dictionarygd"></a>Using the Dictionary GD to Improve PL/Python 
Performance 
+
+In terms of performance, importing a Python module is an expensive operation 
and can affect performance. If you are importing the same module frequently, 
you can use Python global variables to load the module on the first invocation 
and not require importing the module on subsequent calls. The following 
PL/Python function uses the GD persistent storage dictionary to avoid importing 
a module if it has already been imported and is in the GD.
+
+```sql
+CREATE FUNCTION pytest() RETURNS text AS $$
+  if 'mymodule' not in GD:
+    import mymodule
+    GD['mymodule'] = mymodule
+  return GD['mymodule'].sumd([1,2,3])
+$$ LANGUAGE plpythonu;
+```
+
+## <a id="installpythonmodules"></a>Installing Python Modules 
+
+When you install a Python module on HAWQ, the HAWQ Python environment must 
have the module added to it across all segment hosts in the cluster. When 
expanding HAWQ, you must add the Python modules to the new segment hosts. You 
can use the HAWQ utilities `hawq ssh` and `hawq scp` to run commands on HAWQ 
hosts and to copy files to the hosts. For information about the utilities, see 
the [HAWQ Management Tools Reference](../reference/cli/management_tools.html).
+
+As part of the HAWQ installation, the `gpadmin` user environment is configured 
to use the Python that is installed with HAWQ.
+
+To check which Python is being used in your environment, use the `which` 
command:
+
+```bash
+$ which python
+```
+
+The command returns the location of the Python installation. The Python 
installed with HAWQ is in the HAWQ `ext/python` directory.
+
+```bash
+$GPHOME/ext/python/bin/python
+```
+
+If you are building a Python module, you must ensure that the build creates 
the correct executable. For example on a Linux system, the build should create 
a 64-bit executable.
+
+Before building a Python module prior to installation, ensure that the 
appropriate software to build the module is installed and properly configured. 
The build environment is required only on the host where you build the module.
+
+These are examples of installing and testing Python modules:
+
+- Simple Python Module Installation Example (setuptools)
+- Complex Python Installation Example (NumPy)
+- Testing Installed Python Modules
+
+### <a id="simpleinstall"></a>Simple Python Module Installation Example 
(setuptools) 
+
+This example manually installs the Python `setuptools` module from the Python 
Package Index repository. The module lets you easily download, build, install, 
upgrade, and uninstall Python packages.
+
+This example first builds the module from a package and installs the module on 
a single host. Then the module is built and installed on segment hosts.
+
+Get the module package from the Python Package Index site. For example, run 
this `wget` command on a HAWQ host as the gpadmin user to get the tar.gz file.
+
+```bash
+$ wget --no-check-certificate 
https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
+```
+
+Extract the files from the tar.gz file.
+
+```bash
+$ tar -xzvf setuptools-18.4.tar.gz
+```
+
+Go to the directory that contains the package files, and run the Python 
scripts to build and install the Python package.
+
+```bash
+$ cd setuptools-18.4
+$ python setup.py build && python setup.py install
+```
+
+The following Python command returns no errors if the module is available to 
Python.
+
+```bash
+$ python -c "import setuptools"
+```
+
+Copy the package to the HAWQ hosts with the `hawq scp` utility. For example, 
this command copies the tar.gz file from the current host to the host systems 
listed in the file `hawq-hosts`.
+
+```bash
+$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz =:/home/gpadmin
+```
+
+Run the commands to build, install, and test the package with `hawq ssh` 
utility on the hosts listed in the file `hawq-hosts`. The file `hawq-hosts` 
lists all the remote HAWQ segment hosts:
+
+```bash
+$ hawq ssh -f hawq-hosts
+>>> tar -xzvf setuptools-18.4.tar.gz
+>>> cd setuptools-18.4
+>>> python setup.py build && python setup.py install
+>>> python -c "import setuptools"
+>>> exit
+```
+
+The `setuptools` package installs the `easy_install` utility that lets you 
install Python packages from the Python Package Index repository. For example, 
this command installs Python PIP utility from the Python Package Index site.
+
+```shell
+$ cd setuptools-18.4
+$ easy_install pip
+```
+
+You can use the `hawq ssh` utility to run the `easy_install` command on all 
the HAWQ segment hosts.
+
+### <a id="complexinstall"></a>Complex Python Installation Example (NumPy) 
+
+This example builds and installs the Python module NumPy. NumPy is a module 
for scientific computing with Python. For information about NumPy, see 
[http://www.numpy.org/](http://www.numpy.org/).
+
+Building the NumPy package requires this software:
+
+- OpenBLAS libraries, an open source implementation of BLAS (Basic Linear 
Algebra Subprograms).
+- The gcc compilers: gcc, gcc-gfortran, and gcc-c++. The compilers are 
required to build the OpenBLAS libraries. See [OpenBLAS 
Prerequisites](#openblasprereq).
+
+This example process assumes `yum` is installed on all HAWQ segment hosts and 
the `gpadmin` user is a member of `sudoers` with `root` privileges on the hosts.
+
+Download the OpenBLAS and NumPy source files. For example, these `wget` 
commands download tar.gz files into the directory packages:
+
+```bash
+$ wget --directory-prefix=packages 
http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
+$ wget --directory-prefix=packages 
http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
+```
+
+Distribute the software to the HAWQ hosts. For example, if you download the 
software to `/home/gpadmin/packages`, these commands create the directory on 
the hosts and copy the software to the hosts listed in the `hawq-hosts` file.
+
+```bash
+$ hawq ssh -f hawq-hosts mkdir packages 
+$ hawq scp -f hawq-hosts packages/* =:/home/gpadmin/packages
+```
+
+#### <a id="openblasprereq"></a>OpenBLAS Prerequisites 
+
+1. If needed, use `yum` to install gcc compilers from system repositories. The 
compilers are required on all hosts where you compile OpenBLAS:
+
+       ```bash
+       $ sudo yum -y install gcc gcc-gfortran gcc-c++
+       ```
+
+       **Note:** If you cannot install the correct compiler versions with 
`yum`, you can download the gcc compilers, including gfortran, from source and 
install them.
+
+       These two commands download and install the compilers:
+
+       ```bash
+       $ wget http://gfortran.com/download/x86_64/snapshots/gcc-4.4.tar.xz
+       $ tar xf gcc-4.4.tar.xz -C /usr/local/
+       ```
+
+       If you installed `gcc` manually from a tar file, add the new `gcc` 
binaries to `PATH` and `LD_LIBRARY_PATH`:
+
+       ```bash
+       $ export PATH=$PATH:/usr/local/gcc-4.4/bin
+       $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/gcc-4.4/lib
+       ```
+
+2. Create a symbolic link to `g++` and call it `gxx`:
+
+       ```bash
+       $ sudo ln -s /usr/bin/g++ /usr/bin/gxx
+       ```
+
+3. You might also need to create symbolic links to any libraries that have 
different versions available; for example, `libppl_c.so.4` to `libppl_c.so.2`.
+
+4. If needed, you can use the `hawq scp` utility to copy files to HAWQ hosts 
and the `hawq ssh` utility to run commands on the hosts.
+
+#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
+
+Before building and installing the NumPy module, you must install the OpenBLAS 
libraries. This section describes how to build and install the libraries on a 
single host.
+
+1. Extract the OpenBLAS files from the file. These commands extract the files 
from the OpenBLAS tar file and simplify the directory name that contains the 
OpenBLAS files.
+
+       ```bash
+       $ tar -xzf packages/v0.2.8 -C /home/gpadmin/packages
+       $ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf /home/gpadmin/packages/OpenBLAS
+       ```
+
+2. Compile OpenBLAS. These commands set the LIBRARY_PATH environment variable 
and run the make command to build OpenBLAS libraries.
+
+       ```bash
+       $ cd /home/gpadmin/packages/OpenBLAS
+       $ export LIBRARY_PATH=$LD_LIBRARY_PATH
+       $ make FC=gfortran USE_THREAD=0
+       ```
+
+3. Use these commands to install the OpenBLAS libraries in `/usr/local` as 
`root`, and then change the owner of the files to `gpadmin`.
+
+       ```bash
+       $ cd /home/gpadmin/packages/OpenBLAS/
+       $ sudo make PREFIX=/usr/local install
+       $ sudo ldconfig
+       $ sudo chown -R gpadmin /usr/local/lib
+       ```
+
+       The following libraries are installed, along with symbolic links:
+
+       ```bash
+       libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
+       libopenblas_sandybridge-r0.2.8.a
+       libopenblas_sandybridge-r0.2.8.so
+       libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
+       libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+       ```
+
+4. You can use the `hawq ssh` utility to build and install the OpenBLAS libraries on multiple hosts.
+
+       All HAWQ hosts (master and segment hosts) have identical configurations. Instead of building the OpenBLAS libraries on every host, you can copy them from the system where they were built. For example, these `hawq ssh` and `hawq scp` commands copy and install the OpenBLAS libraries on the hosts listed in the `hawq-hosts` file.
+
+```bash
+$ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++'
+$ hawq ssh -f hawq-hosts -e 'ln -s /usr/bin/g++ /usr/bin/gxx'
+$ hawq ssh -f hawq-hosts -e 'sudo chown gpadmin /usr/local/lib'
+$ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
+```
+```bash
+$ hawq ssh -f hawq-hosts
+>>> cd /usr/local/lib
+>>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
+>>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
+>>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
+>>> sudo ldconfig
+```
+
+#### <a id="buildinstallnumpy"></a>Build and Install NumPy
+
+After you have installed the OpenBLAS libraries, you can build and install the NumPy module. These steps install the NumPy module on a single host. You can use the `hawq ssh` utility to build and install the NumPy module on multiple hosts.
+
+1. Go to the `packages` subdirectory and extract the NumPy module source files.
+
+       ```bash
+       $ cd /home/gpadmin/packages
+       $ tar -xzf numpy-1.8.0.tar.gz
+       ```
+
+2. Set up the environment for building and installing NumPy.
+
+       ```bash
+       $ export BLAS=/usr/local/lib/libopenblas.a
+       $ export LAPACK=/usr/local/lib/libopenblas.a
+       $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
+       $ export LIBRARY_PATH=$LD_LIBRARY_PATH
+       ```
+
+3. Go to the NumPy directory and build and install NumPy. Building the NumPy 
package might take some time.
+
+       ```bash
+       $ cd numpy-1.8.0
+       $ python setup.py build
+       $ python setup.py install
+       ```
+
+       **Note:** If the NumPy module does not build successfully, the build process might need a `site.cfg` file that specifies the location of the OpenBLAS libraries. Create the file `site.cfg` in the NumPy package directory:
+
+       ```bash
+       $ cd ~/packages/numpy-1.8.0
+       $ touch site.cfg
+       ```
+
+       Add the following to the `site.cfg` file and run the NumPy build 
command again:
+
+       <pre>
+       [default]
+       library_dirs = /usr/local/lib
+
+       [atlas]
+       atlas_libs = openblas
+       library_dirs = /usr/local/lib
+
+       [lapack]
+       lapack_libs = openblas
+       library_dirs = /usr/local/lib
+
+       # added for scikit-learn 
+       [openblas]
+       libraries = openblas
+       library_dirs = /usr/local/lib
+       include_dirs = /usr/local/include
+       </pre>
+
+4. Run the following Python command to verify that the module is available for import by Python on the host system.
+
+       ```bash
+       $ python -c "import numpy"
+       ```
+
+5. Similar to the simple module installation, use the `hawq ssh` utility to 
build, install, and test the module on HAWQ segment hosts.
+
+6. The environment variables that are required to build the NumPy module are also required in the gpadmin user environment when running Python NumPy functions. You can use the `hawq ssh` utility with the `echo` command to add the environment variables to the `.bashrc` file. For example, these `echo` commands add the environment variables to the `.bashrc` file in the user home directory.
+
+       ```bash
+       $ echo -e '\n#Needed for NumPy' >> ~/.bashrc
+       $ echo -e 'export BLAS=/usr/local/lib/libopenblas.a' >> ~/.bashrc
+       $ echo -e 'export LAPACK=/usr/local/lib/libopenblas.a' >> ~/.bashrc
+       $ echo -e 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib' >> ~/.bashrc
+       $ echo -e 'export LIBRARY_PATH=$LD_LIBRARY_PATH' >> ~/.bashrc
+       ```
+
+## <a id="testingpythonmodules"></a>Testing Installed Python Modules 
+
+You can create a simple PL/Python user-defined function (UDF) to validate that a Python module is available in HAWQ. This example tests the NumPy module.
+
+This PL/Python UDF imports the NumPy module. The function returns SUCCESS if 
the module is imported, and FAILURE if an import error occurs.
+
+```sql
+CREATE OR REPLACE FUNCTION plpy_test(x int)
+returns text
+as $$
+  try:
+      from numpy import *
+      return 'SUCCESS'
+  except ImportError, e:
+      return 'FAILURE'
+$$ language plpythonu;
+```
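The UDF's import check can also be run as plain Python on any host, outside the database. This is a minimal stand-alone sketch of the same logic; the `check_module` helper name is illustrative, not part of HAWQ.

```python
import importlib

def check_module(name):
    """Return 'SUCCESS' if the named Python module can be imported,
    'FAILURE' otherwise -- the same logic the PL/Python UDF uses."""
    try:
        importlib.import_module(name)
        return 'SUCCESS'
    except ImportError:
        return 'FAILURE'

# On a segment host, this reports whether NumPy is importable.
print(check_module('numpy'))
```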
+
+Create a table that contains data on each HAWQ segment instance. Depending on 
the size of your HAWQ installation, you might need to generate more data to 
ensure data is distributed to all segment instances.
+
+```sql
+CREATE TABLE DIST AS (SELECT x FROM generate_series(1,50) x ) DISTRIBUTED 
RANDOMLY ;
+```
+
+This SELECT command runs the UDF on the segment hosts where data is stored in 
the primary segment instances.
+
+```sql
+SELECT gp_segment_id, plpy_test(x) AS status
+  FROM dist
+  GROUP BY gp_segment_id, status
+  ORDER BY gp_segment_id, status;
+```
+
+The SELECT command returns SUCCESS if the UDF imported the Python module on the HAWQ segment instance. If the SELECT command returns FAILURE, you can determine which segment host the failing segment instance runs on. The HAWQ system table `gp_segment_configuration` contains information about segment configuration. This command returns the host name for a segment ID.
+
+```sql
+SELECT hostname, content AS seg_id FROM gp_segment_configuration
+  WHERE content = <seg_id>;
+```
+
+If FAILURE is returned, these are some possible causes:
+
+- A problem accessing required libraries. For the NumPy example, HAWQ might 
have a problem accessing the OpenBLAS libraries or the Python libraries on a 
segment host.
+
+       Make sure you get no errors when running the command on the segment host as the gpadmin user. This `hawq ssh` command tests importing the NumPy module on the segment host mdw1.
+
+       ```shell
+       $ hawq ssh -h mdw1 python -c "import numpy"
+       ```
+
+- If the Python import command does not return an error, environment variables 
might not be configured in the HAWQ environment. For example, the variables are 
not in the `.bashrc` file, or HAWQ might not have been restarted after adding 
the environment variables to the `.bashrc` file.
+
+       Ensure that the environment variables are properly set and then restart HAWQ. For the NumPy example, ensure the environment variables listed at the end of the section [Build and Install NumPy](#buildinstallnumpy) are defined in the `.bashrc` file for the gpadmin user on the master and segment hosts.
+
+       **Note:** On HAWQ master and segment hosts, the `.bashrc` file for the 
gpadmin user must source the file `$GPHOME/greenplum_path.sh`.
+
+## <a id="examples"></a>Examples 
+
+This PL/Python UDF returns the maximum of two integers:
+
+```sql
+CREATE FUNCTION pymax (a integer, b integer)
+  RETURNS integer
+AS $$
+  if (a is None) or (b is None):
+      return None
+  if a > b:
+     return a
+  return b
+$$ LANGUAGE plpythonu;
+```
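The UDF body above is ordinary Python, so for illustration it can be exercised directly outside the database. This stand-alone sketch shows the same null (None) handling:

```python
def pymax(a, b):
    """Plain-Python version of the UDF body: propagate None (SQL NULL),
    otherwise return the larger integer."""
    if (a is None) or (b is None):
        return None
    if a > b:
        return a
    return b

print(pymax(123, 43))  # -> 123
print(pymax(None, 43)) # -> None (a NULL argument yields NULL)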
+
+You can use the STRICT property to perform the null handling instead of the two conditional statements.
+
+```sql
+CREATE FUNCTION pymax (a integer, b integer) 
+  RETURNS integer AS $$ 
+return max(a,b) 
+$$ LANGUAGE plpythonu STRICT ;
+```
+
+You can run the user-defined function `pymax` with a SELECT command. This example runs the UDF and shows the output.
+
+```sql
+SELECT ( pymax(123, 43));
+column1
+---------
+     123
+(1 row)
+```
+
+This example returns data from an SQL query that is run against a table. These two commands create a simple table and add data to it.
+
+```sql
+CREATE TABLE sales (id int, year int, qtr int, day int, region text)
+  DISTRIBUTED BY (id) ;
+
+INSERT INTO sales VALUES
+ (1, 2014, 1,1, 'usa'),
+ (2, 2002, 2,2, 'europe'),
+ (3, 2014, 3,3, 'asia'),
+ (4, 2014, 4,4, 'usa'),
+ (5, 2014, 1,5, 'europe'),
+ (6, 2014, 2,6, 'asia'),
+ (7, 2002, 3,7, 'usa') ;
+```
+
+This PL/Python UDF executes a SELECT command that returns 5 rows from the 
table. The Python function returns the REGION value from the row specified by 
the input value. In the Python function, the row numbering starts from 0. Valid 
input for the function is an integer between 0 and 4.
+
+```sql
+CREATE OR REPLACE FUNCTION mypytest(a integer) 
+  RETURNS text 
+AS $$ 
+  rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+  region = rv[a]["region"]
+  return region
+$$ language plpythonu;
+```
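In PL/Python, `plpy.execute` returns a result object whose rows can be indexed like dictionaries keyed by column name. As an illustration, this stand-alone sketch models the result set as a plain list of dictionaries (the sample rows mirror the first five rows of the `sales` table above):

```python
# Stand-in for the first 5 rows returned by
# plpy.execute("SELECT * FROM sales ORDER BY id", 5).
rows = [
    {"id": 1, "year": 2014, "qtr": 1, "day": 1, "region": "usa"},
    {"id": 2, "year": 2002, "qtr": 2, "day": 2, "region": "europe"},
    {"id": 3, "year": 2014, "qtr": 3, "day": 3, "region": "asia"},
    {"id": 4, "year": 2014, "qtr": 4, "day": 4, "region": "usa"},
    {"id": 5, "year": 2014, "qtr": 1, "day": 5, "region": "europe"},
]

def mypytest(a, rv=rows):
    """Same logic as the UDF: return the region of row a (0-based)."""
    return rv[a]["region"]

print(mypytest(2))  # -> asia (the third row, as in the SELECT example)
```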
+
+Running this SELECT statement returns the REGION column value from the third 
row of the result set.
+
+```sql
+SELECT mypytest(2) ;
+```
+
+This command deletes the UDF from the database.
+
+```sql
+DROP FUNCTION mypytest(integer) ;
+```
+
+## <a id="references"></a>References 
+
+This section lists references for using PL/Python.
+
+### <a id="technicalreferences"></a>Technical References 
+
+For information about PL/Python see the PostgreSQL documentation at 
[http://www.postgresql.org/docs/8.2/static/plpython.html](http://www.postgresql.org/docs/8.2/static/plpython.html).
+
+For information about Python Package Index (PyPI), see 
[https://pypi.python.org/pypi](https://pypi.python.org/pypi).
+
+These are some Python modules that can be downloaded:
+
+- The SciPy library provides user-friendly and efficient numerical routines, such as routines for numerical integration and optimization ([http://www.scipy.org/scipylib/index.html](http://www.scipy.org/scipylib/index.html)). This `wget` command downloads the SciPy package tar file.
+
+```shell
+$ wget http://sourceforge.net/projects/scipy/files/scipy/0.10.1/scipy-0.10.1.tar.gz/download
+```
+
+- The Natural Language Toolkit (nltk) is a platform for building Python programs to work with human language data ([http://www.nltk.org/](http://www.nltk.org/)). This `wget` command downloads the nltk package tar file.
+
+```shell
+$ wget http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
+```
+
+ **Note:** The Python package Distribute ([https://pypi.python.org/pypi/distribute](https://pypi.python.org/pypi/distribute)) is required for `nltk`. Install the Distribute module before the `nltk` package. This `wget` command downloads the Distribute package tar file.
+
+```shell
+$ wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
+```
+
+### <a id="usefulreading"></a>Useful Reading 
+
+For information about the Python language, see 
[http://www.python.org/](http://www.python.org/).
+
+A set of slides used in a talk about how the Pivotal Data Science team uses the PyData stack in the Pivotal MPP databases and on Pivotal Cloud Foundry is available at [http://www.slideshare.net/SrivatsanRamanujam/all-thingspythonpivotal](http://www.slideshare.net/SrivatsanRamanujam/all-thingspythonpivotal).
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/plext/using_plr.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plr.html.md.erb b/plext/using_plr.html.md.erb
new file mode 100644
index 0000000..49d207f
--- /dev/null
+++ b/plext/using_plr.html.md.erb
@@ -0,0 +1,229 @@
+---
+title: Using PL/R in HAWQ
+---
+
+PL/R is a procedural language. With the HAWQ PL/R extension, you can write 
database functions in the R programming language and use R packages that 
contain R functions and data sets.
+
+**Note**: To use PL/R in HAWQ, R must be installed on each node in your HAWQ 
cluster. Additionally, you must install the PL/R package on an existing HAWQ 
deployment or have specified PL/R as a build option when compiling HAWQ.
+
+## <a id="plrexamples"></a>PL/R Examples 
+
+This section contains simple PL/R examples.
+
+### <a id="example1"></a>Example 1: Using PL/R for Single Row Operators 
+
+This function generates an array of numbers with a normal distribution using 
the R function `rnorm()`.
+
+```sql
+CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, 
+  std_dev float8) RETURNS float8[ ] AS
+$$
+  x<-rnorm(n,mean,std_dev)
+  return(x)
+$$
+LANGUAGE 'plr';
+```
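For comparison only, the same sampling logic can be sketched in plain Python with the standard library's `random.gauss`; the PL/R function itself runs R's `rnorm` inside the database.

```python
import random

def r_norm(n, mean, std_dev):
    """Plain-Python analogue of the PL/R r_norm function:
    return a list of n normally distributed samples."""
    return [random.gauss(mean, std_dev) for _ in range(n)]

sample = r_norm(10, 0, 1)
print(len(sample))  # -> 10, like the float8[] array the UDF returns
```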
+
+The following `CREATE TABLE` command uses the `r_norm` function to populate 
the table. The `r_norm` function creates an array of 10 numbers.
+
+```sql
+CREATE TABLE test_norm_var
+  AS SELECT id, r_norm(10,0,1) as x
+  FROM (SELECT generate_series(1,30:: bigint) AS ID) foo
+  DISTRIBUTED BY (id);
+```
+
+### <a id="example2"></a>Example 2: Returning PL/R data.frames in Tabular Form 
+
+If your PL/R function returns an R `data.frame` as its output \(rather than arrays of arrays\), some work is required in order for HAWQ to see your PL/R `data.frame` as a simple SQL table:
+
+Create a TYPE in HAWQ with the same dimensions as your R `data.frame`:
+
+```sql
+CREATE TYPE t1 AS ...
+```
+
+Use this TYPE when defining your PL/R function:
+
+```sql
+... RETURNS SET OF t1 AS ...
+```
+
+Sample SQL for this situation is provided in the next example.
+
+### <a id="example3"></a>Example 3: Process Employee Information Using PL/R 
+
+The SQL below defines a TYPE and a function to process employee information 
with `data.frame` using PL/R:
+
+```sql
+-- Create type to store employee information
+DROP TYPE IF EXISTS emp_type CASCADE;
+CREATE TYPE emp_type AS (name text, age int, salary numeric(10,2));
+
+-- Create function to process employee information and return data.frame
+DROP FUNCTION IF EXISTS get_emps();
+CREATE OR REPLACE FUNCTION get_emps() RETURNS SETOF emp_type AS '
+    names <- c("Joe","Jim","Jon")
+    ages <- c(41,25,35)
+    salaries <- c(250000,120000,50000)
+    df <- data.frame(name = names, age = ages, salary = salaries)
+
+    return(df)
+' LANGUAGE 'plr';
+
+-- Call the function
+SELECT * FROM get_emps();
+```
+
+
+## <a id="downloadinstallplrlibraries"></a>Downloading and Installing R 
Packages 
+
+R packages are modules that contain R functions and data sets. You can install 
R packages to extend R and PL/R functionality in HAWQ.
+
+**Note**: If you expand HAWQ and add segment hosts, you must install the R packages in the R installation of *each* of the new hosts.
+
+1. For an R package, identify all dependent R packages and each package's web URL. You can find this information by selecting the package from the following navigation page:
+
+       
[http://cran.r-project.org/web/packages/available_packages_by_name.html](http://cran.r-project.org/web/packages/available_packages_by_name.html)
+
+       As an example, the page for the R package `arm` indicates that the 
package requires the following R libraries: `Matrix`, `lattice`, `lme4`, 
`R2WinBUGS`, `coda`, `abind`, `foreign`, and `MASS`.
+       
+       You can also try installing the package with the `R CMD INSTALL` command to determine the dependent packages.
+       
+       For the R installation included with the HAWQ PL/R extension, the 
required R packages are installed with the PL/R extension. However, the Matrix 
package requires a newer version.
+       
+1. From the command line, use the `wget` utility to download the tar.gz files 
for the `arm` package to the HAWQ master host:
+
+       ```shell
+       $ wget http://cran.r-project.org/src/contrib/Archive/arm/arm_1.5-03.tar.gz
+       $ wget http://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_0.9996875-1.tar.gz
+       ```
+
+1. Use the `hawq scp` utility and the `hawq_hosts` file to copy the tar.gz 
files to the same directory on all nodes of the HAWQ cluster. The `hawq_hosts` 
file contains a list of all of the HAWQ segment hosts. You might require root 
access to do this.
+
+       ```shell
+       $ hawq scp -f hawq_hosts Matrix_0.9996875-1.tar.gz =:/home/gpadmin
+       $ hawq scp -f hawq_hosts arm_1.5-03.tar.gz =:/home/gpadmin
+       ```
+
+1. Use the `hawq ssh` utility in interactive mode to log in to each HAWQ segment host (`hawq ssh -f hawq_hosts`). Install the packages from the command prompt using the `R CMD INSTALL` command. Note that this might require root access. For example, this R install command installs the packages required for the `arm` package.
+
+       ```shell
+       $ R CMD INSTALL Matrix_0.9996875-1.tar.gz arm_1.5-03.tar.gz
+       ```
+       **Note**: Some packages require compilation. Refer to the package 
documentation for possible build requirements.
+
+1. Ensure that the R package was installed in the `/usr/lib64/R/library` directory on all the segment hosts (you can use `hawq ssh` to check all hosts at once). For example, this `hawq ssh` command lists the contents of the R library directory.
+
+       ```shell
+       $ hawq ssh -f hawq_hosts "ls /usr/lib64/R/library"
+       ```
+       
+1. Verify that the R package can be loaded.
+
+       This function performs a simple test to determine if an R package can 
be loaded:
+       
+       ```sql
+       CREATE OR REPLACE FUNCTION R_test_require(fname text)
+       RETURNS boolean AS
+       $BODY$
+       return(require(fname,character.only=T))
+       $BODY$
+       LANGUAGE 'plr';
+       ```
+
+       This SQL command calls the previous function to determine if the R 
package `arm` can be loaded:
+       
+       ```sql
+       SELECT R_test_require('arm');
+       ```
+
+## <a id="rlibrarydisplay"></a>Displaying R Library Information 
+
+You can use the R command line to display information about the libraries and functions installed on a HAWQ host. You can also add and remove libraries from the R installation. To start the R command line on the host, log in to the host as the `gpadmin` user and run `R`.
+
+``` shell
+$ R
+```
+
+This R function lists the available R packages from the R command line:
+
+```r
+> library()
+```
+
+Display the documentation for a particular R package:
+
+```r
+> library(help="package_name")
+> help(package="package_name")
+```
+
+Display the help file for an R function:
+
+```r
+> help("function_name")
+> ?function_name
+```
+
+To see which packages are installed, use the R command `installed.packages()`. This returns a matrix with a row for each installed package.
+
+```r
+> installed.packages()
+```
+
+Any package that does not appear in the installed packages matrix must be 
installed and loaded before its functions can be used.
+
+An R package can be installed with `install.packages()`:
+
+```r
+> install.packages("package_name") 
+> install.packages("mypkg", dependencies = TRUE, type="source")
+```
+
+Load a package from the R command line.
+
+```r
+> library("package_name")
+```
+
+An R package can be removed with `remove.packages()`:
+
+```r
+> remove.packages("package_name")
+```
+
+You can use the R `-e` command-line option to run functions from the command line. For example, this command displays help on the R package named `MASS`.
+
+```shell
+$ R -e 'help("MASS")'
+```
+
+## <a id="plrreferences"></a>References 
+
+[http://www.r-project.org/](http://www.r-project.org/) - The R Project home 
page
+
+[https://github.com/pivotalsoftware/gp-r](https://github.com/pivotalsoftware/gp-r)
 - GitHub repository that contains information about using R.
+
+[https://github.com/pivotalsoftware/PivotalR](https://github.com/pivotalsoftware/PivotalR)
 - GitHub repository for PivotalR, a package that provides an R interface to 
operate on HAWQ tables and views that is similar to the R `data.frame`. 
PivotalR also supports using the machine learning package MADlib directly from 
R.
+
+R documentation is installed with the R package:
+
+```shell
+/usr/share/doc/R-N.N.N
+```
+
+where N.N.N corresponds to the version of R installed.
+
+### <a id="rfunctions"></a>R Functions and Arguments 
+
+See 
[http://www.joeconway.com/plr/doc/plr-funcs.html](http://www.joeconway.com/plr/doc/plr-funcs.html).
+
+### <a id="passdatavalues"></a>Passing Data Values in R 
+
+See 
[http://www.joeconway.com/plr/doc/plr-data.html](http://www.joeconway.com/plr/doc/plr-data.html).
+
+### <a id="aggregatefunctions"></a>Aggregate Functions in R 
+
+See 
[http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html](http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/pxf/ConfigurePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/ConfigurePXF.html.md.erb b/pxf/ConfigurePXF.html.md.erb
new file mode 100644
index 0000000..087a89a
--- /dev/null
+++ b/pxf/ConfigurePXF.html.md.erb
@@ -0,0 +1,67 @@
+---
+title: Configuring PXF
+---
+
+This topic describes how to configure the PXF service.
+
+**Note:** After you make any changes to a PXF configuration file (such as 
`pxf-profiles.xml` for adding custom profiles), propagate the changes to all 
nodes with PXF installed, and then restart the PXF service on all nodes.
+
+## <a id="settingupthejavaclasspath"></a>Setting up the Java Classpath
+
+The classpath for the PXF service is set during the plug-in installation 
process. Administrators should only modify it when adding new PXF connectors. 
The classpath is defined in two files:
+
+1.  `/etc/pxf/conf/pxf-private.classpath` – contains all the required 
resources to run the PXF service, including pxf-hdfs, pxf-hbase, and pxf-hive 
plug-ins. This file must not be edited or removed.
+2.  `/etc/pxf/conf/pxf-public.classpath` – plug-in jar files and any 
dependent jar files for custom plug-ins and custom profiles should be added 
here. The classpath resources should be defined one per line. Wildcard 
characters can be used in the name of the resource, but not in the full path. 
See [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) 
for information on adding custom profiles.
+
+After changing the classpath files, the PXF service must be restarted. 
+
+## <a id="settingupthejvmcommandlineoptionsforpxfservice"></a>Setting up the 
JVM Command Line Options for the PXF Service
+
+The PXF service JVM command line options can be added or modified for each PXF service instance in the `/var/pxf/pxf-service/bin/setenv.sh` file.
+
+Currently the `JVM_OPTS` parameter is set with the following values for 
maximum Java heap size and thread stack size:
+
+``` shell
+JVM_OPTS="-Xmx512M -Xss256K"
+```
+
+After adding or modifying the JVM command line options, the PXF service must 
be restarted.
+
+## <a id="topic_i3f_hvm_ss"></a>Using PXF on a Secure HDFS Cluster
+
+You can use PXF on a secure HDFS cluster. Read, write, and analyze operations are enabled for PXF tables on HDFS files. No changes are required to preexisting PXF tables from a previous version.
+
+### <a id="requirements"></a>Requirements
+
+-   Both HDFS and YARN principals are created and are properly configured.
+-   HAWQ is correctly configured to work in secure mode.
+
+Please refer to [Troubleshooting PXF](TroubleshootingPXF.html) for common 
errors related to PXF security and their meaning.
+
+## <a id="credentialsforremoteservices"></a>Credentials for Remote Services
+
+The credentials parameters allow a PXF plug-in to access a remote service that requires credentials.
+
+### <a id="inhawq"></a>In HAWQ
+
+Two parameters for credentials are implemented in HAWQ:
+
+-   `pxf_remote_service_login` – a string containing login information (for example, a user name).
+-   `pxf_remote_service_secret` – a string containing information that is considered secret (for example, a password).
+
+Currently, the contents of the two parameters are stored in memory, without any security, for the duration of the session. When the session ends, the contents of the parameters are dropped.
+
+**Important:** These parameters are temporary and could soon be deprecated, in 
favor of a complete solution for managing credentials for remote services in 
PXF.
+
+### <a id="inapxfplugin"></a>In a PXF Plug-in
+
+In a PXF plug-in, the contents of the two credentials parameters are available through the following InputData API functions:
+
+``` java
+string getLogin()
+string getSecret()
+```
+
+Both functions return `null` if the corresponding HAWQ parameter was set to an empty string or was not set at all.
+
+
