Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 87ff8368c -> c9cb72d6a


HAWQ-1216 - clean up plpython docs (Closes #77)


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/c9cb72d6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/c9cb72d6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/c9cb72d6

Branch: refs/heads/develop
Commit: c9cb72d6a14aefbae9a94e26783d8da6e2a7c1e6
Parents: 87ff836
Author: Lisa Owen <[email protected]>
Authored: Thu Jan 5 15:38:30 2017 -0800
Committer: David Yozie <[email protected]>
Committed: Thu Jan 5 15:38:30 2017 -0800

----------------------------------------------------------------------
 plext/using_plpython.html.md.erb | 914 +++++++++++++++++++++-------------
 1 file changed, 564 insertions(+), 350 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/c9cb72d6/plext/using_plpython.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_plpython.html.md.erb b/plext/using_plpython.html.md.erb
index ff7b527..063509a 100644
--- a/plext/using_plpython.html.md.erb
+++ b/plext/using_plpython.html.md.erb
@@ -2,374 +2,608 @@
 title: Using PL/Python in HAWQ
 ---
 
-This section contains an overview of the HAWQ PL/Python language extension.
+This section provides an overview of the HAWQ PL/Python procedural language 
extension.
 
 ## <a id="abouthawqplpython"></a>About HAWQ PL/Python 
 
-PL/Python is a loadable procedural language. With the HAWQ PL/Python 
extension, you can write HAWQ user-defined functions in Python that take 
advantage of Python features and modules to quickly build robust database 
applications.
+PL/Python is embedded in your HAWQ product distribution or within your HAWQ 
build if you chose to enable it as a build option. 
+
+With the HAWQ PL/Python extension, you can write user-defined functions in 
Python that take advantage of Python features and modules, enabling you to 
quickly build robust HAWQ database applications.
 
 HAWQ uses the system Python installation.
 
 ### <a id="hawqlimitations"></a>HAWQ PL/Python Limitations 
 
-- HAWQ does not support PL/Python triggers.
+- HAWQ does not support PL/Python trigger functions.
 - PL/Python is available only as a HAWQ untrusted language.
  
 ## <a id="enableplpython"></a>Enabling and Removing PL/Python Support 
 
-To use PL/Python in HAWQ, you must either use a pre-compiled version of HAWQ 
that includes PL/Python or specify PL/Python as a build option when compiling 
HAWQ.
+To use PL/Python in HAWQ, you must either install a binary version of HAWQ 
that includes PL/Python or specify PL/Python as a build option when you compile 
HAWQ from source.
+
+You must register the PL/Python language with a database before you can create 
and execute a PL/Python UDF on that database. You must be a database superuser 
to register and remove new languages in HAWQ databases.
+
+On every database to which you want to install and enable PL/Python:
+
+1. Connect to the database using the `psql` client:
+
+    ``` shell
+    gpadmin@hawq-node$ psql -d <dbname>
+    ```
+
+    Replace \<dbname\> with the name of the target database.
+
+2. Run the following SQL command to register the PL/Python procedural language:
+
+    ``` sql
+    dbname=# CREATE LANGUAGE plpythonu;
+    ```
 
-To create and run a PL/Python user-defined function (UDF) in a database, you 
must register the PL/Python language with the database. On every database where 
you want to install and enable PL/Python, connect to the database using the 
`psql` client.
+    **Note**: `plpythonu` is installed as an *untrusted* language; it offers 
no way of restricting what you can program in UDFs created with the language. 
Creating and executing PL/Python UDFs is permitted only by database superusers 
and other database users explicitly `GRANT`ed the permissions.
 
-```shell
-$ psql -d <dbname>
+To remove support for `plpythonu` from a database, run the following SQL 
command; you must be a database superuser to remove a registered procedural 
language:
+
+``` sql
+dbname=# DROP LANGUAGE plpythonu;
 ```
 
-Replace \<dbname\> with the name of the target database.
+## <a id="developfunctions"></a>Developing Functions with PL/Python 
+
+PL/Python functions are defined using the standard SQL [CREATE 
FUNCTION](../reference/sql/CREATE-FUNCTION.html) syntax.
+
+The body of a PL/Python user-defined function is a Python script. When the 
function is called, its arguments are passed as elements of the array `args[]`. 
You can also pass named arguments as ordinary variables to the Python script. 
 
-Then, run the following SQL command:
+PL/Python function results are returned with a `return` statement, or a 
`yield` statement in the case of a result-set statement.
 
-```shell
-psql# CREATE LANGUAGE plpythonu;
+The following PL/Python function computes and returns the maximum of two 
integers:
+
+``` sql
+=# CREATE FUNCTION mypymax (a integer, b integer)
+     RETURNS integer
+   AS $$
+     if (a is None) or (b is None):
+       return None
+     if a > b:
+       return a
+     return b
+   $$ LANGUAGE plpythonu;
 ```
 
-Note that `plpythonu` is installed as an “untrusted” language, meaning it 
does not offer any way of restricting what users can do in it.
+To execute the `mypymax` function:
 
-To remove support for `plpythonu` from a database, run the following SQL 
command:
+``` sql
+=# SELECT mypymax(5, 7);
+ mypymax 
+---------
+       7
+(1 row)
+```
+
+Adding the `STRICT` keyword to the `LANGUAGE` subclause instructs HAWQ to 
return null when any of the input arguments are null. When created as `STRICT`, 
the function itself need not perform null checks.
 
-```shell
-psql# DROP LANGUAGE plpythonu;
+The following example uses an unnamed argument, the built-in Python `max()` 
function, and the `STRICT` keyword to create a UDF named `mypymax2`:
+
+``` sql
+=# CREATE FUNCTION mypymax2 (a integer, integer)
+     RETURNS integer AS $$ 
+   return max(a, args[0]) 
+   $$ LANGUAGE plpythonu STRICT;
+=# SELECT mypymax(5, 3);
+ mypymax2
+----------
+        5
+(1 row)
+=# SELECT mypymax(5, null);
+ mypymax2
+----------
+       
+(1 row)
 ```
 
-## <a id="developfunctions"></a>Developing Functions with PL/Python 
+## <a id="example_createtbl"></a>Creating the Sample Data
 
-The body of a PL/Python user-defined function is a Python script. When the 
function is called, its arguments are passed as elements of the array `args[]`. 
Named arguments are also passed as ordinary variables to the Python script. The 
result is returned from the PL/Python function with return statement, or yield 
statement in case of a result-set statement.
+Perform the following steps to create, and insert data into, a simple table. 
This table will be used in later exercises.
 
-The HAWQ PL/Python language module imports the Python module `plpy`. The 
module `plpy` implements these functions:
+1. Create a database named `testdb`:
 
-- Functions to execute SQL queries and prepare execution plans for queries.
-   - `plpy.execute`
-   - `plpy.prepare`
-   
-- Functions to manage errors and messages.
-   - `plpy.debug`
-   - `plpy.log`
-   - `plpy.info`
-   - `plpy.notice`
-   - `plpy.warning`
-   - `plpy.error`
-   - `plpy.fatal`
-   - `plpy.debug`
+    ``` shell
+    gpadmin@hawq-node$ createdb testdb
+    ```
+
+1. Create a table named `sales`:
+
+    ``` shell
+    gpadmin@hawq-node$ psql -d testdb
+    ```
+    ``` sql
+    testdb=> CREATE TABLE sales (id int, year int, qtr int, day int, region 
text)
+               DISTRIBUTED BY (id);
+    ```
+
+2. Insert data into the table:
+
+    ``` sql
+    testdb=> INSERT INTO sales VALUES
+     (1, 2014, 1,1, 'usa'),
+     (2, 2002, 2,2, 'europe'),
+     (3, 2014, 3,3, 'asia'),
+     (4, 2014, 4,4, 'usa'),
+     (5, 2014, 1,5, 'europe'),
+     (6, 2014, 2,6, 'asia'),
+     (7, 2002, 3,7, 'usa') ;
+    ```
+
+## <a id="pymod_intro"></a>Python Modules 
+A Python module is a text file containing Python statements and definitions. 
Python modules are named, with the file name for a module following the 
`<python-module-name>.py` naming convention.
+
+Should you need to build a Python module, ensure that the appropriate software 
is installed on the build system. Also be sure that you are building for the 
correct deployment architecture, i.e. 64-bit.
+
+### <a id="pymod_intro_hawq"></a>HAWQ Considerations 
+
+When installing a Python module in HAWQ, you must add the module to all 
segment nodes in the cluster. You must also add all Python modules to any new 
segment hosts when you expand your HAWQ cluster.
+
+PL/Python supports the built-in HAWQ Python module named `plpy`.  You can also 
install 3rd party Python modules.
+
+
+## <a id="modules_plpy"></a>plpy Module 
+
+The HAWQ PL/Python procedural language extension automatically imports the 
Python module `plpy`. `plpy` implements functions to execute SQL queries and 
prepare execution plans for queries.  The `plpy` module also includes functions 
to manage errors and messages.
    
-## <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
+### <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
 
-The PL/Python `plpy` module provides two Python functions to execute an SQL 
query and prepare an execution plan for a query, `plpy.execute` and 
`plpy.prepare`. Preparing the execution plan for a query is useful if you run 
the query from multiple Python functions.
+Use the PL/Python `plpy` module `plpy.execute()` function to execute a SQL 
query. Use the `plpy.prepare()` function to prepare an execution plan for a 
query. Preparing the execution plan for a query is useful if you want to run 
the query from multiple Python functions.
 
-### <a id="plpyexecute"></a>plpy.execute 
+#### <a id="plpyexecute"></a>plpy.execute() 
 
-Calling `plpy.execute` with a query string and an optional limit argument 
causes the query to be run and the result to be returned in a Python result 
object. The result object emulates a list or dictionary object. The rows 
returned in the result object can be accessed by row number and column name. 
The result set row numbering starts with 0 (zero). The result object can be 
modified. The result object has these additional methods:
+Invoking `plpy.execute()` with a query string and an optional limit argument 
runs the query, returning the result in a Python result object. This result 
object:
 
-- `nrows` that returns the number of rows returned by the query.
-- `status` which is the `SPI_execute()` return value.
+- emulates a list or dictionary object
+- returns rows that can be accessed by row number and column name; row 
numbering starts with 0 (zero)
+- can be modified
+- includes an `nrows()` method that returns the number of rows returned by the 
query
+- includes a `status()` method that returns the `SPI_execute()` return value
 
-For example, this Python statement in a PL/Python user-defined function 
executes a query.
+For example, the following Python statement when present in a PL/Python 
user-defined function will execute a `SELECT * FROM mytable` query:
 
-```python
-rv = plpy.execute("SELECT * FROM my_table", 5)
+``` python
+rv = plpy.execute("SELECT * FROM my_table", 3)
 ```
 
-The `plpy.execute` function returns up to 5 rows from `my_table`. The result 
set is stored in the `rv` object. If `my_table` has a column `my_column`, it 
would be accessed as:
+As instructed by the limit argument `3`, the `plpy.execute` function will 
return up to 3 rows from `my_table`. The result set is stored in the `rv` 
object.
+
+Access specific columns in the table by name. For example, if `my_table` has a 
column named `my_column`:
 
-```python
+``` python
 my_col_data = rv[i]["my_column"]
 ```
 
-Since the function returns a maximum of 5 rows, the index `i` can be an 
integer between 0 and 4.
+You specified that the function return a maximum of 3 rows in the 
`plpy.execute()` command above. As such, the index `i` used to access the 
result value `rv` must specify an integer between 0 and 2, inclusive.
+
+##### <a id="plpyexecute_example"></a>Example: plpy.execute()
+
+Example: Use `plpy.execute()` to run a similar query on the `sales` table you 
created in an earlier section:
+
+1. Define a PL/Python UDF that executes a query to return at most 5 rows from 
the `sales` table:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypytest(a integer) 
+         RETURNS text 
+       AS $$ 
+         rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+         region = rv[a-1]["region"]
+         return region
+       $$ LANGUAGE plpythonu;
+    ```
+
+    When executed, this UDF returns the `region` value from the `id` 
identified by the input value `a`. Since row numbering of the result set starts 
at 0, you must access the result set with index `a - 1`. 
+    
+    Specifying the `ORDER BY id` clause in the `SELECT` statement ensures that 
subsequent invocations of `mypytest` with the same input argument will return 
identical result sets.
+
+3. Run `mypytest` with an argument identifying `id` `3`:
+
+    ```sql
+    =# SELECT mypytest(3);
+     mypytest 
+    ----------
+     asia
+    (1 row)
+    ```
+    
+    Recall that the row numbering starts from 0 in a Python returned result 
set. The valid input argument for the `mypytest2` function is an integer 
between 0 and 4, inclusive.
+
+    The query returns the `region` from the row with `id = 3`, `asia`.
+    
+Note: This example demonstrates some of the concepts discussed previously. It 
may not be the ideal way to return a specific column value.
 
-### <a id="plpyprepare"></a>plpy.prepare 
+#### <a id="plpyprepare"></a>plpy.prepare() 
 
-The function `plpy.prepare` prepares the execution plan for a query. It is 
called with a query string and a list of parameter types, if you have parameter 
references in the query. For example, this statement can be in a PL/Python 
user-defined function:
+The function `plpy.prepare()` prepares the execution plan for a query. 
Preparing the execution plan for a query is useful if you plan to run the query 
from multiple Python functions.
 
-```python
-plan = plpy.prepare("SELECT last_name FROM my_users WHERE 
-  first_name = $1", [ "text" ])
+You invoke `plpy.prepare()` with a query string. Also include a list of 
parameter types if you are using parameter references in the query. For 
example, the following statement in a PL/Python user-defined function returns 
the execution plan for a query:
+
+``` python
+plan = plpy.prepare("SELECT * FROM sales ORDER BY id WHERE 
+  region = $1", [ "text" ])
 ```
 
-The string text is the data type of the variable that is passed for the 
variable `$1`. After preparing a statement, you use the function `plpy.execute` 
to run it:
+The string `text` identifies the data type of the variable `$1`. 
+
+After preparing an execution plan, you use the function `plpy.execute()` to 
run it.  For example:
 
-```python
-rv = plpy.execute(plan, [ "Fred" ], 5)
+``` python
+rv = plpy.execute(plan, [ "usa" ])
 ```
 
-The third argument is the limit for the number of rows returned and is 
optional.
+When executed, `rv` will include all rows in the `sales` table where `region = 
usa`.
+
+Read on for a description of how one passes data between PL/Python function 
calls.
+
+##### <a id="plpyprepare_dictionaries"></a>Saving Execution Plans
 
-When you prepare an execution plan using the PL/Python module the plan is 
automatically saved. See the Postgres Server Programming Interface (SPI) 
documentation for information about the execution plans 
[http://www.postgresql.org/docs/8.2/static/spi.html](http://www.postgresql.org/docs/8.2/static/spi.html).
+When you prepare an execution plan using the PL/Python module, the plan is 
automatically saved. See the [Postgres Server Programming Interface 
(SPI)](http://www.postgresql.org/docs/8.2/static/spi.html) documentation for 
information about execution plans.
 
-To make effective use of saved plans across function calls you use one of the 
Python persistent storage dictionaries SD or GD.
+To make effective use of saved plans across function calls, you use one of the 
Python persistent storage dictionaries, SD or GD.
 
-The global dictionary SD is available to store data between function calls. 
This variable is private static data. The global dictionary GD is public data, 
available to all Python functions within a session. Use GD with care.
+The global dictionary SD is available to store data between function calls. 
This variable is private static data. The global dictionary GD is public data, 
and is available to all Python functions within a session. *Use GD with care*.
 
-Each function gets its own execution environment in the Python interpreter, so 
that global data and function arguments from myfunc are not available to 
`myfunc2`. The exception is the data in the GD dictionary, as mentioned 
previously.
+Each function gets its own execution environment in the Python interpreter, so 
that global data and function arguments from `myfunc1` are not available to 
`myfunc2`. The exception is the data in the GD dictionary, as mentioned 
previously.
 
-This example uses the SD dictionary:
+This example saves an execution plan to the SD dictionary and then executes 
the plan:
 
 ```sql
-CREATE FUNCTION usesavedplan() RETURNS trigger AS $$
-  if SD.has_key("plan"):
-    plan = SD["plan"]
-  else:
-    plan = plpy.prepare("SELECT 1")
-    SD["plan"] = plan
+=# CREATE FUNCTION usesavedplan() RETURNS text AS $$
+     select1plan = plpy.prepare("SELECT region FROM sales WHERE id=1")
+     SD["s1plan"] = select1plan
+     # other function processing
+     # execute the saved plan
+     rv = plpy.execute(SD["s1plan"])
+     return rv[0]["region"]
+   $$ LANGUAGE plpythonu;
+=# SELECT usesavedplan();
+```
+
+##### <a id="plpyprepare_example"></a>Example: plpy.prepare()
+
+Example: Use `plpy.prepare()` and `plpy.execute()` to prepare and run an 
execution plan using the GD dictionary:
+
+1. Define a PL/Python UDF to prepare and save an execution plan to the GD. 
Also  return the name of the plan:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypy_prepplan() 
+         RETURNS text 
+       AS $$ 
+         plan = plpy.prepare("SELECT * FROM sales WHERE region = $1 ORDER BY 
id", [ "text" ])
+         GD["getregionplan"] = plan
+         return "getregionplan"
+       $$ LANGUAGE plpythonu;
+    ```
+
+    This UDF, when run, will return the name (key) of the execution plan 
generated from the `plpy.prepare()` call.
+
+1. Define a PL/Python UDF to run the execution plan; this function will take 
the plan name and `region` name as an input:
 
-  # rest of function
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypy_execplan(planname text, regionname text)
+         RETURNS integer 
+       AS $$ 
+         rv = plpy.execute(GD[planname], [ regionname ], 5)
+         year = rv[0]["year"]
+         return year
+       $$ LANGUAGE plpythonu STRICT;
+    ```
 
-$$ LANGUAGE plpythonu;
-```
+    This UDF executes the `planname` plan that was previously saved to the GD. 
You will call `mypy_execplan()` with the `planname` returned from the 
`plpy.prepare()` call.
 
-## <a id="pythonerrors"></a>Handling Python Errors and Messages 
+3. Execute the `mypy_prepplan()` and `mypy_execplan()` UDFs, passing `region` 
`usa`:
 
-The message functions `plpy.error` and `plpy.fatal` raise a Python exception 
which, if uncaught, propagates out to the calling query, causing the current 
transaction or subtransaction to be aborted. The functions raise 
`plpy.ERROR(msg)` and raise `plpy.FATAL(msg)` are equivalent to calling 
`plpy.error` and `plpy.fatal`, respectively. The other message functions only 
generate messages of different priority levels.
+    ``` sql
+    =# SELECT mypy_execplan( mypy_prepplan(), 'usa' );
+     mypy_execplan
+    ---------------
+         2014
+    (1 row)
+    ```
 
-Whether messages of a particular priority are reported to the client, written 
to the server log, or both is controlled by the HAWQ server configuration 
parameters `log_min_messages` and `client_min_messages`. For information about 
the parameters, see the [Server Configuration Parameter 
Reference](../reference/HAWQSiteConfig.html).
+### <a id="pythonerrors"></a>Handling Python Errors and Messages 
 
-## <a id="dictionarygd"></a>Using the Dictionary GD to Improve PL/Python 
Performance 
+The `plpy` module implements the following message- and error-related 
functions, each of which takes a message string as an argument:
 
-In terms of performance, importing a Python module is an expensive operation 
and can affect performance. If you are importing the same module frequently, 
you can use Python global variables to load the module on the first invocation 
and not require importing the module on subsequent calls. The following 
PL/Python function uses the GD persistent storage dictionary to avoid importing 
a module if it has already been imported and is in the GD.
+- `plpy.debug(msg)`
+- `plpy.log(msg)`
+- `plpy.info(msg)`
+- `plpy.notice(msg)`
+- `plpy.warning(msg)`
+- `plpy.error(msg)`
+- `plpy.fatal(msg)`
 
-```sql
-psql=#
-   CREATE FUNCTION pytest() RETURNS text AS $$ 
-      if 'mymodule' not in GD:
-        import mymodule
-        GD['mymodule'] = mymodule
-    return GD['mymodule'].sumd([1,2,3])
-$$ LANGUAGE plpythonu;
-```
+`plpy.error()` and `plpy.fatal()` raise a Python exception which, if uncaught, 
propagates out to the calling query, possibly aborting the current transaction 
or subtransaction. `raise plpy.ERROR(msg)` and `raise plpy.FATAL(msg)` are 
equivalent to calling `plpy.error()` and `plpy.fatal()`, respectively. Use the 
other message functions to generate messages of different priority levels.
 
-## <a id="installpythonmodules"></a>Installing Python Modules 
+Messages may be reported to the client and/or written to the HAWQ server log 
file.  The HAWQ server configuration parameters 
[`log_min_messages`](../reference/guc/parameter_definitions.html#log_min_messages)
 and 
[`client_min_messages`](../reference/guc/parameter_definitions.html#client_min_messages)
 control where messages are reported.
 
-HAWQ is configured to use the system Python. When you install a Python module 
on HAWQ, you must add the module to all segment hosts in the cluster. When 
expanding HAWQ, you must add Python modules to the new segment hosts. You can 
use the HAWQ utilities `hawq ssh` and `hawq scp` to run commands on HAWQ hosts 
and copy files to the hosts. For information about the utilities, see the [HAWQ 
Management Tools Reference](../reference/cli/management_tools.html).
- 
-If you are building a Python module, you must ensure that the build creates 
the correct executable. For example on a Linux system, the build should create 
a 64-bit executable.
+#### <a id="plpymessages_example"></a>Example: Generating Messages
 
-Before building a Python module prior to installation, ensure that the 
appropriate software to build the module is installed and properly configured. 
The build environment is required only on the host where you build the module.
+In this example, you will create a PL/Python UDF that includes some debug log 
messages. You will also configure your `psql` session to enable debug-level 
client logging.
 
-These are examples of installing and testing Python modules:
+1. Define a PL/Python UDF that executes a query that will return at most 5 
rows from the `sales` table. Invoke the `plpy.debug()` method to display some 
additional information:
 
-- Simple Python Module Installation Example (setuptools)
-- Complex Python Installation Example (NumPy)
-- Testing Installed Python Modules
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypytest_debug(a integer) 
+         RETURNS text 
+       AS $$ 
+         plpy.debug('mypytest_debug executing query:  SELECT * FROM sales 
ORDER BY id')
+         rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+         plpy.debug('mypytest_debug: query returned ' + str(rv.nrows()) + ' 
rows')
+         region = rv[a]["region"]
+         return region
+       $$ LANGUAGE plpythonu;
+    ```
 
-### <a id="simpleinstall"></a>Simple Python Module Installation Example 
(setuptools) 
+2. Execute the `mypytest_debug()` UDF, passing the integer `2` as an argument:
 
-This example manually installs the Python `setuptools` module from the Python 
Package Index repository. The module lets you easily download, build, install, 
upgrade, and uninstall Python packages.
+    ```sql
+    =# SELECT mypytest_debug(2);
+     mypytest_debug 
+    ----------------
+     asia
+    (1 row)
+    ```
 
-This example first builds the module from a package and installs the module on 
a single host. Then the module is built and installed on segment hosts.
+3. Enable `DEBUG2` level client logging:
 
-Get the module package from the Python Package Index site. For example, run 
this `wget` command on a HAWQ host as the gpadmin user to get the tar.gz file.
+    ``` sql
+    =# SET client_min_messages=DEBUG2;
+    ```
+    
+2. Execute the `mypytest_debug()` UDF again:
 
-```bash
-$ wget --no-check-certificate 
https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
-```
+    ```sql
+    =# SELECT mypytest_debug(2);
+    ...
+    DEBUG2:  mypytest_debug executing query:  SELECT * FROM sales ORDER BY id
+    ...
+    DEBUG2:  mypytest_debug: query returned 5 rows
+    ...
+    ```
 
-Extract the files from the tar.gz file.
+    Debug output is very verbose. You will parse a lot of output to find the 
`mypytest_debug` messages. *Hint*: look both near the start and end of the 
output.
+    
+6. Turn off client-level debug logging:
 
-```bash
-$ tar -xzvf setuptools-18.4.tar.gz
-```
+    ```sql
+    =# SET client_min_messages=NOTICE;
+    ```
 
-Go to the directory that contains the package files, and run the Python 
scripts to build and install the Python package.
+## <a id="pythonmodules-3rdparty"></a>3rd-Party Python Modules 
 
-```bash
-$ cd setuptools-18.4
-$ python setup.py build && python setup.py install
-```
+PL/Python supports installation and use of 3rd-party Python Modules. This 
section includes examples for installing the `setuptools` and NumPy Python 
modules.
 
-The following Python command returns no errors if the module is available to 
Python.
+**Note**: You must have superuser privileges to install Python modules to the 
system Python directories.
 
-```bash
-$ python -c "import setuptools"
-```
+### <a id="simpleinstall"></a>Example: Installing setuptools 
 
-Copy the package to the HAWQ hosts with the `hawq scp` utility. For example, 
this command copies the tar.gz file from the current host to the host systems 
listed in the file `hawq-hosts`.
+In this example, you will manually install the Python `setuptools` module from 
the Python Package Index repository. `setuptools` enables you to easily 
download, build, install, upgrade, and uninstall Python packages.
 
-```bash
-$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz =:/home/gpadmin
-```
+You will first build the module from the downloaded package, installing it on 
a single host. You will then build and install the module on all segment nodes 
in your HAWQ cluster.
 
-Run the commands to build, install, and test the package with `hawq ssh` 
utility on the hosts listed in the file `hawq-hosts`. The file `hawq-hosts` 
lists all the remote HAWQ segment hosts:
+1. Download the `setuptools` module package from the Python Package Index 
site. For example, run this `wget` command on a HAWQ node as the `gpadmin` user:
 
-```bash
-$ hawq ssh -f hawq-hosts
->>> tar -xzvf setuptools-18.4.tar.gz
->>> cd setuptools-18.4
->>> python setup.py build && python setup.py install
->>> python -c "import setuptools"
->>> exit
-```
+    ``` shell
+    $ ssh gpadmin@<hawq-node>
+    gpadmin@hawq-node$ . /usr/local/hawq/greenplum_path.sh
+    gpadmin@hawq-node$ mkdir plpython_pkgs
+    gpadmin@hawq-node$ cd plpython_pkgs
+    gpadmin@hawq-node$ export PLPYPKGDIR=`pwd`
+    gpadmin@hawq-node$ wget --no-check-certificate 
https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
+    ```
 
-The `setuptools` package installs the `easy_install` utility that lets you 
install Python packages from the Python Package Index repository. For example, 
this command installs Python PIP utility from the Python Package Index site.
+2. Extract the files from the `tar.gz` package:
 
-```shell
-$ cd setuptools-18.4
-$ easy_install pip
-```
+    ``` shell
+    gpadmin@hawq-node$ tar -xzvf setuptools-18.4.tar.gz
+    ```
 
-You can use the `hawq ssh` utility to run the `easy_install` command on all 
the HAWQ segment hosts.
+3. Run the Python scripts to build and install the Python package; you must 
have superuser privileges to install Python modules to the system Python 
installation:
 
-### <a id="complexinstall"></a>Complex Python Installation Example (NumPy) 
+    ``` shell
+    gpadmin@hawq-node$ cd setuptools-18.4
+    gpadmin@hawq-node$ python setup.py build 
+    gpadmin@hawq-node$ sudo python setup.py install
+    ```
 
-This example builds and installs the Python module NumPy. NumPy is a module 
for scientific computing with Python. For information about NumPy, see 
[http://www.numpy.org/](http://www.numpy.org/).
+4. Run the following command to verify the module is available to Python:
 
-Building the NumPy package requires this software:
-- OpenBLAS libraries, an open source implementation of BLAS (Basic Linear 
Algebra Subprograms).
-- The gcc compilers: gcc, gcc-gfortran, and gcc-c++. The compilers are 
required to build the OpenBLAS libraries. See [OpenBLAS 
Prerequisites](#openblasprereq).
+    ``` shell
+    gpadmin@hawq-node$ python -c "import setuptools"
+    ```
+    
+    If no error is returned, the `setuptools` module was successfully imported.
 
-This example process assumes `yum` is installed on all HAWQ segment hosts and 
the `gpadmin` user is a member of `sudoers` with `root` privileges on the hosts.
+5. The `setuptools` package installs the `easy_install` utility. This utility 
enables you to install Python packages from the Python Package Index 
repository. For example, this command installs the Python `pip` utility from 
the Python Package Index site:
 
-Download the OpenBLAS and NumPy source files. For example, these `wget` 
commands download tar.gz files into the directory packages:
+    ``` shell
+    gpadmin@hawq-node$ sudo easy_install pip
+    ```
 
-```bash
-$ wget --directory-prefix=packages 
http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
-$ wget --directory-prefix=packages 
http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
-```
+5. Copy the `setuptools` package to all HAWQ nodes in your cluster. For 
example, this command copies the `tar.gz` file from the current host to the 
host systems listed in the file `hawq-hosts`:
 
-Distribute the software to the HAWQ hosts. For example, if you download the 
software to `/home/gpadmin/packages`, these commands create the directory on 
the hosts and copies the software to hosts for the hosts listed in the 
`hawq-hosts` file.
+    ``` shell
+    gpadmin@hawq-node$ cd $PLPYPKGDIR
+    gpadmin@hawq-node$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz 
=:/home/gpadmin
+    ```
 
-```bash
-$ hawq ssh -f hawq-hosts mkdir packages 
-$ hawq scp -f hawq-hosts packages/* =:/home/gpadmin/packages
-```
+6. Run the commands to build, install, and test the `setuptools` package you 
just copied to all hosts in your HAWQ cluster. For example:
 
-#### <a id="openblasprereq"></a>OpenBLAS Prerequisites 
+    ``` shell
+    gpadmin@hawq-node$ hawq ssh -f hawq-hosts
+    >>> mkdir plpython_pkgs
+    >>> cd plpython_pkgs
+    >>> tar -xzvf ../setuptools-18.4.tar.gz
+    >>> cd setuptools-18.4
+    >>> python setup.py build 
+    >>> sudo python setup.py install
+    >>> python -c "import setuptools"
+    >>> exit
+    ```
 
-1. If needed, use `yum` to install gcc compilers from system repositories. The 
compilers are required on all hosts where you compile OpenBLAS:
+### <a id="complexinstall"></a>Example: Installing NumPy 
 
-       ```bash
-       $ sudo yum -y install gcc gcc-gfortran gcc-c++
-       ```
+In this example, you will build and install the Python module NumPy. NumPy is 
a module for scientific computing with Python. For additional information about 
NumPy, refer to [http://www.numpy.org/](http://www.numpy.org/).
 
-       **Note:** If you cannot install the correct compiler versions with 
`yum`, you can download the gcc compilers, including gfortran, from source and 
install them.
+This example assumes `yum` is installed on all HAWQ segment nodes and that the 
`gpadmin` user is a member of `sudoers` with `root` privileges on the nodes.
 
-       These two commands download and install the compilers:
+#### <a id="complexinstall_prereq"></a>Prerequisites
+Building the NumPy package requires the following software:
 
-       ```bash
-       $ wget http://gfortran.com/download/x86_64/snapshots/gcc-4.4.tar.xz
-       $ tar xf gcc-4.4.tar.xz -C /usr/local/
-       ```
+- OpenBLAS libraries - an open source implementation of BLAS (Basic Linear 
Algebra Subprograms)
+- Python development packages - python-devel
+- gcc compilers - gcc, gcc-gfortran, and gcc-c++
+
+Perform the following steps to set up the OpenBLAS compilation environment on 
each HAWQ node:
 
-       If you installed `gcc` manually from a tar file, add the new `gcc` 
binaries to `PATH` and `LD_LIBRARY_PATH`:
+1. Use `yum` to install gcc compilers from system repositories. The compilers 
are required on all hosts where you compile OpenBLAS.  For example:
 
-       ```bash
-       $ export PATH=$PATH:/usr/local/gcc-4.4/bin
-       $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/gcc-4.4/lib
+       ``` shell
+       root@hawq-node$ yum -y install gcc gcc-gfortran gcc-c++ python-devel
        ```
 
-2. Create a symbolic link to `g++` and call it `gxx`:
+2. (Optionally required) If you cannot install the correct compiler versions 
with `yum`, you have the option to download the gcc compilers, including 
`gfortran`, from source and build and install them manually. Refer to [Building 
gfortran from Source](https://gcc.gnu.org/wiki/GFortranBinaries#FromSource) for 
`gfortran` build and install information.
 
-       ```bash
-       $ sudo ln -s /usr/bin/g++ /usr/bin/gxx
+2. Create a symbolic link to `g++`, naming it `gxx`:
+
+       ``` bash
+       root@hawq-node$ ln -s /usr/bin/g++ /usr/bin/gxx
        ```
 
-3. You might also need to create symbolic links to any libraries that have 
different versions available for example `libppl_c.so.4` to `libppl_c.so.2`.
+3. You may also need to create symbolic links to any libraries that have 
different versions available; for example, linking `libppl_c.so.4` to 
`libppl_c.so.2`.
 
-4. If needed, you can use the `hawq scp` utility to copy files to HAWQ hosts 
and the `hawq ssh` utility to run commands on the hosts.
+4. You can use the `hawq scp` utility to copy files to HAWQ hosts and the 
`hawq ssh` utility to run commands on those hosts.
 
-#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
 
-Before build and install the NumPy module, you install the OpenBLAS libraries. 
This section describes how to build and install the libraries on a single host.
+#### <a id="complexinstall_downdist"></a>Obtaining Packages
 
-1. Extract the OpenBLAS files from the file. These commands extract the files 
from the OpenBLAS tar file and simplify the directory name that contains the 
OpenBLAS files.
+Perform the following steps to download and distribute the OpenBLAS and NumPy 
source packages:
 
-       ```bash
-       $ tar -xzf packages/v0.2.8 -C /home/gpadmin/packages
-       $ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf 
/home/gpadmin/packages/     OpenBLAS
-       ```
+1. Download the OpenBLAS and NumPy source files. For example, these `wget` 
commands download `tar.gz` files into a `packages` directory in the current 
working directory:
 
-2. Compile OpenBLAS. These commands set the LIBRARY_PATH environment variable 
and run the make command to build OpenBLAS libraries.
+    ``` shell
+    $ ssh gpadmin@<hawq-node>
+    gpadmin@hawq-node$ wget --directory-prefix=packages 
http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
+    gpadmin@hawq-node$ wget --directory-prefix=packages 
http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
+    ```
 
-       ```bash
-       $ cd /home/gpadmin/packages/OpenBLAS
-       $ export LIBRARY_PATH=$LD_LIBRARY_PATH
-       $ make FC=gfortran USE_THREAD=0
-       ```
+2. Distribute the software to all nodes in your HAWQ cluster. For example, if 
you downloaded the software to `/home/gpadmin/packages`, these commands create 
the `packages` directory on all nodes and copies the software to the nodes 
listed in the `hawq-hosts` file:
 
-3. Use these commands to install the OpenBLAS libraries in `/usr/local` as 
`root`, and then change the owner of the files to `gpadmin`.
+    ``` shell
+    gpadmin@hawq-node$ hawq ssh -f hawq-hosts mkdir packages 
+    gpadmin@hawq-node$ hawq scp -f hawq-hosts packages/* 
=:/home/gpadmin/packages
+    ```
+
+#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
 
-       ```bash
-       $ cd /home/gpadmin/packages/OpenBLAS/
-       $ sudo make PREFIX=/usr/local install
-       $ sudo ldconfig
-       $ sudo chown -R gpadmin /usr/local/lib
+Before building and installing the NumPy module, you must first build and 
install the OpenBLAS libraries. This section describes how to build and install 
the libraries on a single HAWQ node.
+
+1. Extract the OpenBLAS files from the file:
+
+       ``` shell
+       $ ssh gpadmin@<hawq-node>
+       gpadmin@hawq-node$ cd packages
+       gpadmin@hawq-node$ tar xzf v0.2.8 -C /home/gpadmin/packages
+       gpadmin@hawq-node$ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf 
/home/gpadmin/packages/OpenBLAS
        ```
+       
+       These commands extract the OpenBLAS tar file and simplify the unpacked 
directory name.
 
-       The following libraries are installed, along with symbolic links:
+2. Compile OpenBLAS. You must set the `LIBRARY_PATH` environment variable to 
the current `$LD_LIBRARY_PATH`. For example:
 
-       ```bash
-       libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
-       libopenblas_sandybridge-r0.2.8.a
-       libopenblas_sandybridge-r0.2.8.so
-       libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
-       libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+       ``` shell
+       gpadmin@hawq-node$ cd OpenBLAS
+       gpadmin@hawq-node$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+       gpadmin@hawq-node$ make FC=gfortran USE_THREAD=0 TARGET=SANDYBRIDGE
        ```
+       
+       Replace the `TARGET` argument with the target appropriate for your 
hardware. The `TargetList.txt` file identifies the list of supported OpenBLAS 
targets.
+       
+       Compiling OpenBLAS make take some time.
 
-4. You can use the `hawq ssh` utility to build and install the OpenBLAS 
libraries on multiple hosts.
+3. Install the OpenBLAS libraries in `/usr/local` and then change the owner of 
the files to `gpadmin`. You must have `root` privileges. For example:
 
-       All HAWQ hosts (master and segment hosts) have identical 
configurations. You can copy the OpenBLAS libraries from the system where they 
were built instead of building the OpenBlas libraries on all the hosts. For 
example, these `hawq ssh` and `hawq scp` commands copy and install the OpenBLAS 
libraries on the hosts listed in the `hawq-hosts` file.
+       ``` shell
+       gpadmin@hawq-node$ sudo make PREFIX=/usr/local install
+       gpadmin@hawq-node$ sudo ldconfig
+       gpadmin@hawq-node$ sudo chown -R gpadmin /usr/local/lib
+       ```
 
-```bash
-$ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++'
-$ hawq ssh -f hawq-hosts -e 'ln -s /usr/bin/g++ /usr/bin/gxx'
-$ hawq ssh -f hawq-hosts -e sudo chown gpadmin /usr/local/lib
-$ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
-```
-```bash
-$ hawq ssh -f hawq-hosts
->>> cd /usr/local/lib
->>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
->>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
->>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
->>> sudo ldconfig
-```
+       The following libraries are installed to `/usr/local/lib`, along with 
symbolic links:
+
+       ``` shell
+       gpadmin@hawq-node$ ls -l gpadmin@hawq-node$
+           ...
+           libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
+           libopenblas_sandybridge-r0.2.8.a
+           libopenblas_sandybridge-r0.2.8.so
+           libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
+           libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+           ...
+       ```
+
+4. Install the OpenBLAS libraries on all nodes in your HAWQ cluster. You can 
use the `hawq ssh` utility to similarly build and install the OpenBLAS 
libraries on each of the nodes. 
+
+    Or, you may choose to copy the OpenBLAS libraries you just built to all of 
the HAWQ cluster nodes. For example, these `hawq ssh` and `hawq scp` commands 
install prerequisite packages, and copy and install the OpenBLAS libraries on 
the hosts listed in the `hawq-hosts` file.
+
+    ``` shell
+    $ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++ 
python-devel'
+    $ hawq ssh -f hawq-hosts -e 'ln -s /usr/bin/g++ /usr/bin/gxx'
+    $ hawq ssh -f hawq-hosts -e sudo chown gpadmin /usr/local/lib
+    $ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
+    ```
+    ``` shell
+    $ hawq ssh -f hawq-hosts
+    >>> cd /usr/local/lib
+    >>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
+    >>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
+    >>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
+    >>> sudo ldconfig
+   ```
 
 #### Build and Install NumPy <a name="buildinstallnumpy"></a>
 
 After you have installed the OpenBLAS libraries, you can build and install 
NumPy module. These steps install the NumPy module on a single host. You can 
use the `hawq ssh` utility to build and install the NumPy module on multiple 
hosts.
 
-1. Go to the packages subdirectory and get the NumPy module source and extract 
the files.
+1. Extract the NumPy module source files:
 
-       ```bash
-       $ cd /home/gpadmin/packages
-       $ tar -xzf numpy-1.8.0.tar.gz
+       ``` shell
+       gpadmin@hawq-node$ cd /home/gpadmin/packages
+       gpadmin@hawq-node$ tar xzf numpy-1.8.0.tar.gz
        ```
+       
+       Unpacking the `numpy-1.8.0.tar.gz` file creates a directory named 
`numpy-1.8.0` in the current directory.
 
-2. Set up the environment for building and installing NumPy.
+2. Set up the environment for building and installing NumPy:
 
-       ```bash
-       $ export BLAS=/usr/local/lib/libopenblas.a
-       $ export LAPACK=/usr/local/lib/libopenblas.a
-       $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
-       $ export LIBRARY_PATH=$LD_LIBRARY_PATH
+       ``` shell
+       gpadmin@hawq-node$ export BLAS=/usr/local/lib/libopenblas.a
+       gpadmin@hawq-node$ export LAPACK=/usr/local/lib/libopenblas.a
+       gpadmin@hawq-node$ export 
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
+       gpadmin@hawq-node$ export LIBRARY_PATH=$LD_LIBRARY_PATH
        ```
 
-3. Go to the NumPy directory and build and install NumPy. Building the NumPy 
package might take some time.
+3. Build and install NumPy. (Building the NumPy package might take some time.)
 
-       ```bash
-       $ cd numpy-1.8.0
-       $ python setup.py build
-       $ python setup.py install
+       ``` shell
+       gpadmin@hawq-node$ cd numpy-1.8.0
+       gpadmin@hawq-node$ python setup.py build
+       gpadmin@hawq-node$ sudo python setup.py install
        ```
 
-       **Note:** If the NumPy module did not successfully build, the NumPy 
build process might need a site.cfg that specifies the location of the OpenBLAS 
libraries. Create the file `site.cfg` in the NumPy package directory:
+       **Note:** If the NumPy module did not successfully build, the NumPy 
build process might need a `site.cfg` file that specifies the location of the 
OpenBLAS libraries. Create the `site.cfg` file in the NumPy package directory:
 
-       ```bash
-       $ cd ~/packages/numpy-1.8.0
-       $ touch site.cfg
+       ``` shell
+       gpadmin@hawq-node$ touch site.cfg
        ```
 
        Add the following to the `site.cfg` file and run the NumPy build 
command again:
 
-       <pre>
+       ``` pre
        [default]
        library_dirs = /usr/local/lib
 
@@ -386,19 +620,22 @@ After you have installed the OpenBLAS libraries, you can 
build and install NumPy
        libraries = openblas
        library_dirs = /usr/local/lib
        include_dirs = /usr/local/include
-       </pre>
+       ```
 
-4. The following Python command ensures that the module is available for 
import by Python on a host system.
+4. Verify that the NumPy module is available for import by Python:
 
-       ```bash
-       $ python -c "import numpy"
+       ``` shell
+       gpadmin@hawq-node$ cd $HOME
+       gpadmin@hawq-node$ python -c "import numpy"
        ```
+       
+       If no error is returned, the NumPy module was successfully imported.
 
-5. Similar to the simple module installation, use the `hawq ssh` utility to 
build, install, and test the module on HAWQ segment hosts.
+5. As performed in the `setuptools` Python module installation, use the `hawq 
ssh` utility to build, install, and test the NumPy module on all HAWQ nodes.
 
-5. The environment variables that are require to build the NumPy module are 
also required in the gpadmin user environment when running Python NumPy 
functions. You can use the `hawq ssh` utility with the `echo` command to add 
the environment variables to the `.bashrc` file. For example, these echo 
commands add the environment variables to the `.bashrc` file in the user home 
directory.
+5. The environment variables that were required to build the NumPy module are 
also required in the `gpadmin` runtime environment to run Python NumPy 
functions. You can use the `echo` command to add the environment variables to 
`gpadmin`'s `.bashrc` file. For example, the following `echo` commands add the 
environment variables to the `.bashrc` file in `gpadmin`'s home directory:
 
-       ```bash
+       ``` shell
        $ echo -e '\n#Needed for NumPy' >> ~/.bashrc
        $ echo -e 'export BLAS=/usr/local/lib/libopenblas.a' >> ~/.bashrc
        $ echo -e 'export LAPACK=/usr/local/lib/libopenblas.a' >> ~/.bashrc
@@ -406,136 +643,111 @@ After you have installed the OpenBLAS libraries, you 
can build and install NumPy
        $ echo -e 'export LIBRARY_PATH=$LD_LIBRARY_PATH' >> ~/.bashrc
        ```
 
-## <a id="testingpythonmodules"></a>Testing Installed Python Modules 
+    You can use the `hawq ssh` utility with these `echo` commands to add the 
environment variables to the `.bashrc` file on all nodes in your HAWQ cluster.
 
-You can create a simple PL/Python user-defined function (UDF) to validate that 
Python a module is available in HAWQ. This example tests the NumPy module.
+### <a id="testingpythonmodules"></a>Testing Installed Python Modules 
 
-This PL/Python UDF imports the NumPy module. The function returns SUCCESS if 
the module is imported, and FAILURE if an import error occurs.
+You can create a simple PL/Python user-defined function (UDF) to validate that 
a Python module is available in HAWQ. This example tests the NumPy module.
 
-```sql
-CREATE OR REPLACE FUNCTION plpy_test(x int)
-RETURNS text
-AS $$
-  try:
-      from numpy import *
-      return 'SUCCESS'
-  except ImportError, e:
-      return 'FAILURE'
-$$ language plpythonu;
-```
+1. Create a PL/Python UDF that imports the NumPy module:
 
-Create a table that contains data on each HAWQ segment instance. Depending on 
the size of your HAWQ installation, you might need to generate more data to 
ensure data is distributed to all segment instances.
+    ``` shell
+    gpadmin@hawq_node$ psql -d testdb
+    ```
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION test_importnumpy(x int)
+       RETURNS text
+       AS $$
+         try:
+             from numpy import *
+             return 'SUCCESS'
+         except ImportError, e:
+             return 'FAILURE'
+       $$ LANGUAGE plpythonu;
+    ```
 
-```sql
-CREATE TABLE DIST AS (SELECT x FROM generate_series(1,50) x ) DISTRIBUTED 
RANDOMLY ;
-```
+    The function returns SUCCESS if the module is imported, and FAILURE if an 
import error occurs.
 
-This SELECT command runs the UDF on the segment hosts where data is stored in 
the primary segment instances.
+2. Create a table that loads data on each HAWQ segment instance:
 
-```sql
-SELECT gp_segment_id, plpy_test(x) AS status
-  FROM dist
-  GROUP BY gp_segment_id, status
-  ORDER BY gp_segment_id, status;
-```
+    ``` sql
+    => CREATE TABLE disttbl AS (SELECT x FROM generate_series(1,50) x ) 
DISTRIBUTED BY (x);
+    ```
+    
+    Depending upon the size of your HAWQ installation, you may need to 
generate a larger series to ensure data is distributed to all segment instances.
 
-The SELECT command returns SUCCESS if the UDF imported the Python module on 
the HAWQ segment instance. If the SELECT command returns FAILURE, you can find 
the segment host of the segment instance host. The HAWQ system table 
`gp_segment_configuration` contains information about segment configuration. 
This command returns the host name for a segment ID.
+3. Run the UDF on the segment nodes where data is stored in the primary 
segment instances.
 
-```sql
-SELECT hostname, content AS seg_ID FROM gp_segment_configuration
-  WHERE content = seg_id ;
-```
+    ``` sql
+    =# SELECT gp_segment_id, test_importnumpy(1) AS status
+         FROM disttbl
+         GROUP BY gp_segment_id, status
+         ORDER BY gp_segment_id, status;
+    ```
 
-If FAILURE is returned, these are some possible causes:
+    The `SELECT` command returns SUCCESS if the UDF imported the Python module 
on the HAWQ segment instance. FAILURE is returned if the Python module could 
not be imported.
+   
+
+#### <a id="testingpythonmodules"></a>Troubleshooting Python Module Import 
Failures
+
+Possible causes of a Python module import failure include:
 
 - A problem accessing required libraries. For the NumPy example, HAWQ might 
have a problem accessing the OpenBLAS libraries or the Python libraries on a 
segment host.
 
-       Make sure you get no errors when running command on the segment host as 
the gpadmin user. This hawq ssh command tests importing the NumPy module on the 
segment host mdw1.
+       *Try*: Test importing the module on the segment host. This `hawq ssh` 
command tests importing the NumPy module on the segment host named mdw1.
 
-       ```shell
-       $ hawq ssh -h mdw1 python -c "import numpy"
+       ``` shell
+       gpadmin@hawq-node$ hawq ssh -h mdw1 python -c "import numpy"
        ```
 
-- If the Python import command does not return an error, environment variables 
might not be configured in the HAWQ environment. For example, the variables are 
not in the `.bashrc` file, or HAWQ might not have been restarted after adding 
the environment variables to the `.bashrc` file.
+- Environment variables may not be configured in the HAWQ environment. The 
Python import command may not return an error in this case.
 
-       Ensure sure that the environment variables are properly set and then 
restart HAWQ. For the NumPy example, ensure the environment variables listed at 
the end of the section [Build and Install NumPy](#buildinstallnumpy) are 
defined in the `.bashrc` file for the gpadmin user on the master and segment 
hosts.
+       *Try*: Ensure that the environment variables are properly set. For the 
NumPy example, ensure that the environment variables listed at the end of the 
section [Build and Install NumPy](#buildinstallnumpy) are defined in the 
`.bashrc` file for the `gpadmin` user on the master and all segment nodes.
+       
+       **Note:** The `.bashrc` file for the `gpadmin` user on the HAWQ master 
and all segment nodes must source the `greenplum_path.sh` file.
 
-       **Note:** On HAWQ master and segment hosts, the `.bashrc` file for the 
gpadmin user must source the file `$GPHOME/greenplum_path.sh`.
+       
+- HAWQ might not have been restarted after adding environment variable 
settings to the `.bashrc` file. Again, the Python import command may not return 
an error in this case.
 
-## <a id="examples"></a>Examples 
+       *Try*: Ensure that you have restarted HAWQ.
+       
+       ``` shell
+       gpadmin@master$ hawq restart cluster
+       ```
 
-This PL/Python UDF returns the maximum of two integers:
+## <a id="dictionarygd"></a>Using the GD Dictionary to Improve PL/Python 
Performance 
 
-```sql
-CREATE FUNCTION pymax (a integer, b integer)
-  RETURNS integer
-AS $$
-  if (a is None) or (b is None):
-      return None
-  if a > b:
-     return a
-  return b
-$$ LANGUAGE plpythonu;
-```
+Importing a Python module is an expensive operation that can adversely affect 
performance. If you are importing the same module frequently, you can use 
Python global variables to import the module on the first invocation and forego 
loading the module on subsequent imports. 
 
-You can use the STRICT property to perform the null handling instead of using 
the two conditional statements.
+The following PL/Python function uses the GD persistent storage dictionary to 
avoid importing the module NumPy if it has already been imported in the GD. The 
UDF includes a call to `plpy.notice()` to display a message when importing the 
module.
 
-```sql
-CREATE FUNCTION pymax (a integer, b integer) 
-  RETURNS integer AS $$ 
-return max(a,b) 
-$$ LANGUAGE plpythonu STRICT ;
+``` sql
+=# CREATE FUNCTION mypy_import2gd() RETURNS text AS $$ 
+     if 'numpy' not in GD:
+       plpy.notice('mypy_import2gd: importing module numpy')
+       import numpy
+       GD['numpy'] = numpy
+     return 'numpy'
+   $$ LANGUAGE plpythonu;
 ```
-
-You can run the user-defined function pymax with SELECT command. This example 
runs the UDF and shows the output.
-
-```sql
-SELECT ( pymax(123, 43));
-column1
----------
-     123
+``` sql
+=# SELECT mypy_import2gd();
+NOTICE:  mypy_import2gd: importing module numpy
+CONTEXT:  PL/Python function "mypy_import2gd"
+ mypy_import2gd 
+----------------
+ numpy
 (1 row)
 ```
-
-This example that returns data from an SQL query that is run against a table. 
These two commands create a simple table and add data to the table.
-
-```sql
-CREATE TABLE sales (id int, year int, qtr int, day int, region text)
-  DISTRIBUTED BY (id) ;
-
-INSERT INTO sales VALUES
- (1, 2014, 1,1, 'usa'),
- (2, 2002, 2,2, 'europe'),
- (3, 2014, 3,3, 'asia'),
- (4, 2014, 4,4, 'usa'),
- (5, 2014, 1,5, 'europe'),
- (6, 2014, 2,6, 'asia'),
- (7, 2002, 3,7, 'usa') ;
-```
-
-This PL/Python UDF executes a SELECT command that returns 5 rows from the 
table. The Python function returns the REGION value from the row specified by 
the input value. In the Python function, the row numbering starts from 0. Valid 
input for the function is an integer between 0 and 4.
-
-```sql
-CREATE OR REPLACE FUNCTION mypytest(a integer) 
-  RETURNS text 
-AS $$ 
-  rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
-  region = rv[a]["region"]
-  return region
-$$ language plpythonu;
-```
-
-Running this SELECT statement returns the REGION column value from the third 
row of the result set.
-
-```sql
-SELECT mypytest(2) ;
+``` sql
+=# SELECT mypy_import2gd();
+ mypy_import2gd 
+----------------
+ numpy
+(1 row)
 ```
 
-This command deletes the UDF from the database.
-
-```sql
-DROP FUNCTION mypytest(integer) ;
-```
+The second `SELECT` call does not include the `NOTICE` message, indicating 
that the module was obtained from the GD.
 
 ## <a id="references"></a>References 
 
@@ -543,29 +755,31 @@ This section lists references for using PL/Python.
 
 ### <a id="technicalreferences"></a>Technical References 
 
-For information about PL/Python see the PostgreSQL documentation at 
[http://www.postgresql.org/docs/8.2/static/plpython.html](http://www.postgresql.org/docs/8.2/static/plpython.html).
+For information about PL/Python in HAWQ, see the [PL/Python - Python 
Procedural Language](http://www.postgresql.org/docs/8.2/static/plpython.html) 
PostgreSQL documentation.
 
-For information about Python Package Index (PyPI), see 
[https://pypi.python.org/pypi](https://pypi.python.org/pypi).
+For information about Python Package Index (PyPI), refer to [PyPI - the Python 
Package Index](https://pypi.python.org/pypi).
 
-These are some Python modules that can be downloaded:
+The following Python modules may be of interest:
 
-- SciPy library provides user-friendly and efficient numerical routines such 
as routines for numerical integration and optimization 
[http://www.scipy.org/scipylib/index.html](http://www.scipy.org/scipylib/index.html).
 This wget command downloads the SciPy package tar file.
+- [SciPy library](http://www.scipy.org/scipylib/index.html) provides 
user-friendly and efficient numerical routines including those for numerical 
integration and optimization. To download the SciPy package tar file:
 
- ```shell
-$ wget http://sourceforge.net/projects/scipy/files/scipy/0.10.1/ 
scipy-0.10.1.tar.gz/download
-```
+    ``` shell
+    hawq-node$ wget 
http://sourceforge.net/projects/scipy/files/scipy/0.10.1/scipy-0.10.1.tar.gz
+    ```
 
-- Natural Language Toolkit (nltk) is a platform for building Python programs 
to work with human language data http://www.nltk.org/. This wget command 
downloads the nltk package tar file.
+- [Natural Language Toolkit](http://www.nltk.org/) (`nltk`) is a platform for 
building Python programs to work with human language data. 
 
- ```shell
-$ wget 
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
- ```
+    The Python [`distribute`](https://pypi.python.org/pypi/distribute/0.6.21) 
package is required for `nltk`. The `distribute` package should be installed 
before installing `ntlk`. To download the `distribute` package tar file:
 
- **Note:** The Python package Distribute 
[https://pypi.python.org/pypi/](https://pypi.python.org/pypi/) distribute is 
required for `nltk`. The Distribute module should be installed before the 
`ntlk` package. This wget command downloads the Distribute package tar file.
+    ``` shell
+    hawq-node$ wget 
http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
+    ```
 
-```shell
-$ wget 
http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
-```
+    To download the `nltk` package tar file:
+
+    ``` shell
+    hawq-node$ wget 
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
+    ```
 
 ### <a id="usefulreading"></a>Useful Reading 
 

Reply via email to