Github user dyozie commented on a diff in the pull request:
https://github.com/apache/incubator-hawq-docs/pull/77#discussion_r94512620
--- Diff: plext/using_plpython.html.md.erb ---
@@ -2,374 +2,608 @@
title: Using PL/Python in HAWQ
---
-This section contains an overview of the HAWQ PL/Python language extension.
+This section provides an overview of the HAWQ PL/Python procedural
language extension.
## <a id="abouthawqplpython"></a>About HAWQ PL/Python
-PL/Python is a loadable procedural language. With the HAWQ PL/Python
extension, you can write HAWQ user-defined functions in Python that take
advantage of Python features and modules to quickly build robust database
applications.
+PL/Python is embedded in your HAWQ product distribution or within your
HAWQ build if you chose to enable it as a build option.
+
+With the HAWQ PL/Python extension, you can write user-defined functions in
Python that take advantage of Python features and modules, enabling you to
quickly build robust HAWQ database applications.
HAWQ uses the system Python installation.
### <a id="hawqlimitations"></a>HAWQ PL/Python Limitations
-- HAWQ does not support PL/Python triggers.
+- HAWQ does not support PL/Python trigger functions.
- PL/Python is available only as a HAWQ untrusted language.
## <a id="enableplpython"></a>Enabling and Removing PL/Python Support
-To use PL/Python in HAWQ, you must either use a pre-compiled version of
HAWQ that includes PL/Python or specify PL/Python as a build option when
compiling HAWQ.
+To use PL/Python in HAWQ, you must either install a binary version of HAWQ
that includes PL/Python or specify PL/Python as a build option when compiling
HAWQ from source.
+
+PL/Python user-defined functions (UDFs) are registered at the database
level. To create and run a PL/Python UDF on a database, you must register the
PL/Python language with the database.
+
+On every database to which you want to install and enable PL/Python:
+
+1. Connect to the database using the `psql` client:
+
+ ``` shell
+ $ psql -d <dbname>
+ ```
+
+ Replace \<dbname\> with the name of the target database.
+
+2. Run the following SQL command to register the PL/Python procedural
language; you must be a database superuser to register new languages:
+
+ ``` sql
+ dbname=# CREATE LANGUAGE plpythonu;
+ ```
-To create and run a PL/Python user-defined function (UDF) in a database,
you must register the PL/Python language with the database. On every database
where you want to install and enable PL/Python, connect to the database using
the `psql` client.
+ **Note**: `plpythonu` is installed as an *untrusted* language; it
offers no way of restricting what you can program in UDFs created with the
language.
-```shell
-$ psql -d <dbname>
+To remove support for `plpythonu` from a database, run the following SQL
command; you must be a database superuser to remove a registered procedural
language:
+
+``` sql
+dbname=# DROP LANGUAGE plpythonu;
```
-Replace \<dbname\> with the name of the target database.
+## <a id="developfunctions"></a>Developing Functions with PL/Python
+
+PL/Python functions are defined using the standard SQL [CREATE
FUNCTION](../reference/sql/CREATE-FUNCTION.html) syntax.
+
+The body of a PL/Python user-defined function is a Python script. When the
function is called, its arguments are passed as elements of the array `args[]`.
You can also pass named arguments as ordinary variables to the Python script.
-Then, run the following SQL command:
+PL/Python function results are returned with a `return` statement, or a
`yield` statement in the case of a result-set statement.
-```shell
-psql# CREATE LANGUAGE plpythonu;
+The following PL/Python function computes and returns the maximum of two
integers:
+
+``` sql
+=> CREATE FUNCTION mypymax (a integer, b integer)
+ RETURNS integer
+ AS $$
+ if (a is None) or (b is None):
+ return None
+ if a > b:
+ return a
+ return b
+ $$ LANGUAGE plpythonu;
```
-Note that `plpythonu` is installed as an "untrusted" language, meaning
it does not offer any way of restricting what users can do in it.
+To execute the `mypymax` function:
-To remove support for `plpythonu` from a database, run the following SQL
command:
+``` sql
+=> SELECT mypymax(5, 7);
+ mypymax
+---------
+ 7
+(1 row)
+```
+
+Adding the `STRICT` keyword to the `LANGUAGE` subclause instructs HAWQ to
return null when any of the input arguments are null. When created as `STRICT`,
the function itself need not perform null checks.
-```shell
-psql# DROP LANGUAGE plpythonu;
+The following example uses an unnamed argument, the built-in Python
`max()` function, and the `STRICT` keyword to create a UDF named `mypymax2`:
+
+``` sql
+=> CREATE FUNCTION mypymax2 (a integer, integer)
+ RETURNS integer AS $$
+ return max(a, args[1])
+ $$ LANGUAGE plpythonu STRICT;
+=> SELECT mypymax2(5, 3);
+ mypymax2
+----------
+ 5
+(1 row)
+=> SELECT mypymax2(5, null);
+ mypymax2
+----------
+
+(1 row)
```
-## <a id="developfunctions"></a>Developing Functions with PL/Python
+## <a id="example_createtbl"></a>Preparing For Exercises
-The body of a PL/Python user-defined function is a Python script. When the
function is called, its arguments are passed as elements of the array `args[]`.
Named arguments are also passed as ordinary variables to the Python script. The
result is returned from the PL/Python function with return statement, or yield
statement in case of a result-set statement.
+Perform the following steps to create, and insert data into, a simple
table. This table will be used in later exercises.
-The HAWQ PL/Python language module imports the Python module `plpy`. The
module `plpy` implements these functions:
+1. Create a database named `testdb`:
-- Functions to execute SQL queries and prepare execution plans for queries.
- - `plpy.execute`
- - `plpy.prepare`
-
-- Functions to manage errors and messages.
- - `plpy.debug`
- - `plpy.log`
- - `plpy.info`
- - `plpy.notice`
- - `plpy.warning`
- - `plpy.error`
- - `plpy.fatal`
- - `plpy.debug`
+ ``` shell
+ gpadmin@hawq-node$ createdb testdb
+ ```
+
+1. Create a table named `sales`:
+
+ ``` shell
+ gpadmin@hawq-node$ psql -d testdb
+ ```
+ ``` sql
+ testdb=> CREATE TABLE sales (id int, year int, qtr int, day int,
region text)
+ DISTRIBUTED BY (id);
+ ```
+
+2. Insert data into the table:
+
+ ``` sql
+ testdb=> INSERT INTO sales VALUES
+ (1, 2014, 1,1, 'usa'),
+ (2, 2002, 2,2, 'europe'),
+ (3, 2014, 3,3, 'asia'),
+ (4, 2014, 4,4, 'usa'),
+ (5, 2014, 1,5, 'europe'),
+ (6, 2014, 2,6, 'asia'),
+ (7, 2002, 3,7, 'usa') ;
+ ```
+
+## <a id="pymod_intro"></a>Python Modules
+A Python module is a text file containing Python statements and
definitions. Python modules are named, with the file name for a module
following the `<python-module-name>.py` naming convention.
+
+Should you need to build a Python module, ensure that the appropriate
software is installed on the build system. Also be sure that you are building
for the correct deployment architecture, i.e. 64-bit.
+
+### <a id="pymod_intro_hawq"></a>HAWQ Considerations
+
+When installing a Python module in HAWQ, you must add the module to all
segment nodes in the cluster. You must also add all Python modules to any new
segment hosts when you expand your HAWQ cluster.
+
+PL/Python supports the built-in HAWQ Python module named `plpy`. You can
also install 3rd party Python modules.
+
+
+## <a id="modules_plpy"></a>plpy Module
+
+The HAWQ PL/Python procedural language extension automatically imports the
Python module `plpy`. `plpy` implements functions to execute SQL queries and
prepare execution plans for queries. The `plpy` module also includes functions
to manage errors and messages.
-## <a id="executepreparesql"></a>Executing and Preparing SQL Queries
+### <a id="executepreparesql"></a>Executing and Preparing SQL Queries
-The PL/Python `plpy` module provides two Python functions to execute an
SQL query and prepare an execution plan for a query, `plpy.execute` and
`plpy.prepare`. Preparing the execution plan for a query is useful if you run
the query from multiple Python functions.
+Use the PL/Python `plpy` module `plpy.execute()` function to execute an
SQL query. Use the `plpy.prepare()` function to prepare an execution plan for a
query. Preparing the execution plan for a query is useful if you plan to run
the query from multiple Python functions.
--- End diff --
Change "an SQL" -> "a SQL". I'm pretty sure the HAWQ docs are otherwise
consistent in this usage.
Also, change "if you plan to" to "if you want to".
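
For reference when reading the `mypymax2` example in this diff: below is a plain-Python sketch (not PL/Python itself) of how arguments reach a function body and how `STRICT` short-circuits on null input. It assumes the PostgreSQL PL/Python convention that `args` holds every argument in positional order, named or not. The helpers `make_scope` and `call_strict` are hypothetical and exist only for illustration.

```python
def make_scope(values, names):
    """Build the variables a function body would see (illustrative only)."""
    # Every argument, named or unnamed, is placed in args in positional order.
    scope = {"args": list(values)}
    # Named arguments are additionally exposed as ordinary variables.
    for name, value in zip(names, values):
        if name is not None:
            scope[name] = value
    return scope


def call_strict(body, values, names):
    """Emulate STRICT: return None (SQL null) if any input is None."""
    if any(value is None for value in values):
        return None
    return body(**make_scope(values, names))


def mypymax2_body(a, args):
    # The unnamed second argument is only reachable through args[1];
    # args[0] is the same value as the named argument a.
    return max(a, args[1])


print(call_strict(mypymax2_body, (5, 3), ("a", None)))
print(call_strict(mypymax2_body, (3, 5), ("a", None)))
print(call_strict(mypymax2_body, (5, None), ("a", None)))
```

Under that assumption, an unnamed second argument has to be read from index 1 of `args`, and the null case never reaches the body.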