Repository: incubator-hawq-docs Updated Branches: refs/heads/develop dc9869600 -> 740b6ee69
add ambari procedures for all "hawq config" references Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/740b6ee6 Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/740b6ee6 Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/740b6ee6 Branch: refs/heads/develop Commit: 740b6ee698461c73ddfa2fd2cfdce889b1c13f42 Parents: dc98696 Author: Lisa Owen <[email protected]> Authored: Thu Nov 10 09:48:19 2016 -0800 Committer: David Yozie <[email protected]> Committed: Tue Nov 15 11:56:15 2016 -0800 ---------------------------------------------------------------------- clientaccess/client_auth.html.md.erb | 41 ++-- clientaccess/roles_privs.html.md.erb | 61 +++--- datamgmt/about_statistics.html.md.erb | 38 +++- ddl/ddl-table.html.md.erb | 16 +- plext/using_pljava.html.md.erb | 225 +++++++++++++--------- query/gporca/query-gporca-enable.html.md.erb | 44 ++++- reference/sql/VACUUM.html.md.erb | 2 +- 7 files changed, 260 insertions(+), 167 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/clientaccess/client_auth.html.md.erb ---------------------------------------------------------------------- diff --git a/clientaccess/client_auth.html.md.erb b/clientaccess/client_auth.html.md.erb index d162e71..a13f4e1 100644 --- a/clientaccess/client_auth.html.md.erb +++ b/clientaccess/client_auth.html.md.erb @@ -80,11 +80,11 @@ This example shows how to edit the `pg_hba.conf` file of the master to allow rem HAWQ allocates some resources on a per-connection basis, so setting the maximum number of connections allowed is recommended. -To limit the number of active concurrent sessions to your HAWQ system, you can configure the `max_connections` server configuration parameter on master or the `seg_max_connections` server configuration parameter on segments. These parameters are *local* parameters, meaning that you must set them in the `hawq-site.xml` file of all HAWQ instances. The current recommendation is to set the value of `seg_max_connections` to 5-10 the value of `max_connections`. +To limit the number of active concurrent sessions to your HAWQ system, you can configure the `max_connections` server configuration parameter on master or the `seg_max_connections` server configuration parameter on segments. These parameters are *local* parameters, meaning that you must set them in the `hawq-site.xml` file of all HAWQ instances. When you set `max_connections`, you must also set the dependent parameter `max_prepared_transactions`. This value must be at least as large as the value of `max_connections`, and all HAWQ instances should be set to the same value. -For example, in `$GPHOME/etc/hawq-site.xml`: +Example `$GPHOME/etc/hawq-site.xml` configuration: ``` xml <property> @@ -101,42 +101,43 @@ For example, in `$GPHOME/etc/hawq-site.xml`: </property> ``` -The following steps set the parameter values with the HAWQ utility `hawq config`. +**Note:** Raising the values of these parameters may cause HAWQ to request more shared memory. To mitigate this effect, consider decreasing other memory-related server configuration parameters such as [gp\_cached\_segworkers\_threshold](../reference/guc/parameter_definitions.html#gp_cached_segworkers_threshold). -### <a id="ip142411"></a>To change the number of allowed connections -1. 
Log into the HAWQ master host as the HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`. -2. Set the value of the `max_connections` parameter. This `hawq config` command sets the value to 100 on all HAWQ instances. +### <a id="ip142411"></a>Setting the number of allowed connections - ``` bash - $ hawq config -c max_connections -v 100 - ``` +You will perform different procedures to set connection-related server configuration parameters for your HAWQ cluster depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update server configuration parameters only via the Ambari Web UI. If you manage your HAWQ cluster from the command line, you will use the `hawq config` command line utility to set server configuration parameters. - This configuration command sets the value for the master. +If you use Ambari to manage your cluster: -3. Set the value of the `seg_max_connections` parameter. This `hawq config` command sets the value to 6400 on all HAWQ instances. +1. Set the `max_connections`, `seg_max_connections`, and `max_prepared_transactions` configuration properties via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. +2. Select **Service Actions > Restart All** to load the updated configuration. - ``` bash - $ hawq config -c seg_max_connections -v 6400 - ``` +If you manage your cluster from the command line: - This configuration command sets the value for all segment instances. +1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`. -4. Set the value of the `max_prepared_transactions` parameter. This `hawq config` command sets the value to 200 for the master and all segments. + ``` shell + $ source /usr/local/hawq/greenplum_path.sh + ``` + +2. Use the `hawq config` utility to set the values of the `max_connections`, `seg_max_connections`, and `max_prepared_transactions` parameters to values appropriate for your deployment. For example: ``` bash + $ hawq config -c max_connections -v 100 + $ hawq config -c seg_max_connections -v 6400 $ hawq config -c max_prepared_transactions -v 200 ``` The value of `max_prepared_transactions` must be greater than or equal to `max_connections`. -5. Stop and restart your HAWQ system. +5. Load the new configuration values by restarting your HAWQ cluster: ``` bash - $ hawq restart + $ hawq restart cluster ``` -6. You can check the value of parameters on the master and segments with the `hawq config``-s` option. These `hawq config` commands display the values of the `max_connections` and `seg_max_connections` parameter. +6. Use the `-s` option to `hawq config` to display server configuration parameter values: ``` bash $ hawq config -s max_connections @@ -144,8 +145,6 @@ The following steps set the parameter values with the HAWQ utility `hawq config` ``` -**Note:** Raising the values of these parameters may cause HAWQ to request more shared memory. To mitigate this effect, consider decreasing other memory-related server configuration parameters such as [gp\_cached\_segworkers\_threshold](../reference/guc/parameter_definitions.html#gp_cached_segworkers_threshold). - ## <a id="topic5"></a>Encrypting Client/Server Connections Enable SSL for client connections to HAWQ to encrypt the data passed over the network between the client and the database. 
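Once SSL is configured, remote clients can additionally be required to connect over encrypted channels through a `pg_hba.conf` entry. A hypothetical example (the address range and the `md5` authentication method are placeholders; adjust them to your environment):

```
hostssl    all    all    0.0.0.0/0    md5
```

As with other `pg_hba.conf` edits, reload the configuration after changing this file.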
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/clientaccess/roles_privs.html.md.erb ---------------------------------------------------------------------- diff --git a/clientaccess/roles_privs.html.md.erb b/clientaccess/roles_privs.html.md.erb index 2738dd3..66f56d7 100644 --- a/clientaccess/roles_privs.html.md.erb +++ b/clientaccess/roles_privs.html.md.erb @@ -159,65 +159,68 @@ You can set your chosen encryption method system-wide or on a per-session basis. #### <a id="topic11"></a>System-wide -To set the `password_hash_algorithm` server parameter on a complete HAWQ system \(master and its segments\): +You will perform different procedures to set the encryption method (`password_hash_algorithm` server parameter) system-wide depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update encryption method configuration parameters only via the Ambari Web UI. If you manage your HAWQ cluster from the command line, you will use the `hawq config` command line utility to set encryption method configuration parameters. -1. Log into your HAWQ instance as a superuser. -2. Execute `hawq config` with the `password_hash_algorithm` set to `SHA-256` \(or `SHA-256-FIPS` to use the FIPS-compliant libraries for SHA-256\) +If you use Ambari to manage your HAWQ cluster: - ``` bash - $ hawq config -c password_hash_algorithm -v 'SHA-256' - ``` +1. Set the `password_hash_algorithm` configuration property via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. Valid values include `SHA-256` \(or `SHA-256-FIPS` to use the FIPS-compliant libraries for SHA-256\). +2. Select **Service Actions > Restart All** to load the updated configuration. - or: +If you manage your HAWQ cluster from the command line: - ``` bash - $ hawq config -c password_hash_algorithm -v 'SHA-256-FIPS' +1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`. + + ``` shell + $ source /usr/local/hawq/greenplum_path.sh ``` -3. Verify the setting: +1. Use the `hawq config` utility to set `password_hash_algorithm` to `SHA-256` \(or `SHA-256-FIPS` to use the FIPS-compliant libraries for SHA-256\): - ``` bash - $ hawq config -s + ``` shell + $ hawq config -c password_hash_algorithm -v 'SHA-256' + ``` + + Or: + + ``` shell + $ hawq config -c password_hash_algorithm -v 'SHA-256-FIPS' ``` - You will see: +2. Reload the HAWQ configuration: - ``` - Master value: SHA-256 - Segment value: SHA-256 + ``` shell + $ hawq stop cluster -u ``` - or: +3. Verify the setting: + ``` bash + $ hawq config -s password_hash_algorithm ``` - Master value: SHA-256-FIPS - Segment value: SHA-256-FIPS - ``` - #### <a id="topic12"></a>Individual Session -To set the `password_hash_algorithm` server parameter for an individual session: +To set the `password_hash_algorithm` server parameter for an individual database session: -1. Log into your HAWQ instance as a superuser. +1. Log in to your HAWQ instance as a superuser. 2. Set the `password_hash_algorithm` to `SHA-256` \(or `SHA-256-FIPS` to use the FIPS-compliant libraries for SHA-256\): ``` sql - # SET password_hash_algorithm = 'SHA-256' + =# SET password_hash_algorithm = 'SHA-256' SET ``` or: ``` sql - # SET password_hash_algorithm = 'SHA-256-FIPS' + =# SET password_hash_algorithm = 'SHA-256-FIPS' SET ``` 3. 
Verify the setting:

    ``` sql
-    # SHOW password_hash_algorithm;
+    =# SHOW password_hash_algorithm;
     password_hash_algorithm
    ```

@@ -240,7 +243,7 @@ To set the `password_hash_algorithm` server parameter for an individual session:
 4. Log in as a superuser and verify the password hash algorithm setting:

    ``` sql
-    # SHOW password_hash_algorithm
+    =# SHOW password_hash_algorithm
     password_hash_algorithm
    -------------------------------
    SHA-256-FIPS
    ```

 5. Create a new role with a password that has login privileges.

    ``` sql
-    CREATE ROLE testdb WITH PASSWORD 'testdb12345#' LOGIN;
+    =# CREATE ROLE testdb WITH PASSWORD 'testdb12345#' LOGIN;
    ```

 6. Change the client authentication method to allow for storage of SHA-256 encrypted passwords:

@@ -276,7 +279,7 @@ To set the `password_hash_algorithm` server parameter for an individual session:
 2. Execute the following:

    ``` sql
-    # SELECT rolpassword FROM pg_authid WHERE rolname = 'testdb';
+    =# SELECT rolpassword FROM pg_authid WHERE rolname = 'testdb';
     Rolpassword
    -----------
    sha256<64 hexadecimal characters>

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/datamgmt/about_statistics.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/about_statistics.html.md.erb b/datamgmt/about_statistics.html.md.erb
index 25b3b24..5e2184a 100644
--- a/datamgmt/about_statistics.html.md.erb
+++ b/datamgmt/about_statistics.html.md.erb
@@ -142,11 +142,35 @@ There are several options for configuring HAWQ statistics collection.
 The statistics target is the size of the `most_common_vals`, `most_common_freqs`, and `histogram_bounds` arrays for an individual column. By default, the target is 25. The default target can be changed by setting a server configuration parameter and the target can be set for any column using the `ALTER TABLE` command. Larger values increase the time needed to do `ANALYZE`, but may improve the quality of the legacy query optimizer (planner) estimates.

-Set the system default statistics target to a different value by setting the `default_statistics_target` server configuration parameter. The default value is usually sufficient, and you should only raise or lower it if your tests demonstrate that query plans improve with the new target. For example, to raise the default statistics target from 25 to 50 you can use the `hawq config` utility:
+Set the system default statistics target to a different value by setting the `default_statistics_target` server configuration parameter. The default value is usually sufficient, and you should only raise or lower it if your tests demonstrate that query plans improve with the new target.

-``` shell
-$ hawq config -c default_statistics_target -v 50
-```
+You will perform different procedures to set server configuration parameters for your whole HAWQ cluster depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update server configuration parameters via the Ambari Web UI only. If you manage your HAWQ cluster from the command line, you will use the `hawq config` command line utility to set server configuration parameters.
+
+The following examples show how to raise the default statistics target from 25 to 50.
+
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `default_statistics_target` configuration property to `50` via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down.
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+1. Use the `hawq config` utility to set `default_statistics_target`:
+
+    ``` shell
+    $ hawq config -c default_statistics_target -v 50
+    ```
+2. Reload the HAWQ configuration:
+
+    ``` shell
+    $ hawq stop cluster -u
+    ```

 The statistics target for individual columns can be set with the `ALTER TABLE` command. For example, some queries can be improved by increasing the target for certain columns, especially columns that have irregular distributions. You can set the target to zero for columns that never contribute to query optimization. When the target is 0, `ANALYZE` ignores the column. For example, the following `ALTER TABLE` command sets the statistics target for the `notes` column in the `emp` table to zero:

@@ -174,12 +198,12 @@ The automatic statistics collection mode is set separately for commands that occ
 With the `on_change` mode, `ANALYZE` is triggered only if the number of rows affected exceeds the threshold defined by the `gp_autostats_on_change_threshold` configuration parameter. The default value for this parameter is a very high value, 2147483647, which effectively disables automatic statistics collection; you must set the threshold to a lower number to enable it. The `on_change` mode could trigger large, unexpected analyze operations that could disrupt the system, so it is not recommended to set it globally. It could be useful in a session, for example to automatically analyze a table following a load.

-To disable automatic statistics collection outside of functions, set the `gp_autostats_mode` parameter to `none`:
+To disable automatic statistics collection outside of functions, set the `gp_autostats_mode` parameter to `none`. For a command-line-managed HAWQ cluster:

 ``` shell
 $ hawq config -c gp_autostats_mode -v none
 ```

-Set the `log_autostats` system configuration parameter to on if you want to log automatic statistics collection operations.
-
+For an Ambari-managed cluster, set `gp_autostats_mode` via the Ambari Web UI.
+Set the `log_autostats` system configuration parameter to `on` if you want to log automatic statistics collection operations.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/ddl/ddl-table.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-table.html.md.erb b/ddl/ddl-table.html.md.erb
index d0220d7..bc4f0c4 100644
--- a/ddl/ddl-table.html.md.erb
+++ b/ddl/ddl-table.html.md.erb
@@ -93,14 +93,14 @@ For any specific query, the first four factors are fixed values, while the confi
 The `bucketnum` for a hash table specifies the number of hash buckets to be used in creating virtual segments. A HASH distributed table is created with `default_hash_table_bucket_number` buckets. The default bucket value can be changed at the session level or in the `CREATE TABLE` DDL by using the `bucketnum` storage parameter.

-When initializing a cluster, you can use the `hawq init --bucket_number` parameter to explcitly set the default bucket number \(`default_hash_table_bucket_number`\).
+In an Ambari-managed HAWQ cluster, the default bucket number \(`default_hash_table_bucket_number`\) is derived from the number of segment nodes. In command-line-managed HAWQ environments, you can use the `--bucket_number` option of `hawq init` to explicitly set `default_hash_table_bucket_number` during cluster initialization.

-**Note:** For best performance with large tables, the number of buckets should not exceed the value of the `default_hash_table_bucket_number` parameter. Small tables can use one segment node, `with bucketnum=1`. For larger tables, the bucketnum is set to a multiple of the number of segment nodes, for the best load balancing on different segment nodes. The elastic runtime will attempt to find the optimal number of buckets for the number of nodes being processed. Larger tables need more virtual segments , and hence use larger numbers of buckets.
+**Note:** For best performance with large tables, the number of buckets should not exceed the value of the `default_hash_table_bucket_number` parameter. Small tables can use one segment node, `WITH bucketnum=1`. For larger tables, the `bucketnum` is set to a multiple of the number of segment nodes, for the best load balancing on different segment nodes. The elastic runtime will attempt to find the optimal number of buckets for the number of nodes being processed. Larger tables need more virtual segments, and hence use larger numbers of buckets.

 The following statement creates a table “sales” with 8 buckets, which would be similar to a hash-distributed table on 8 segments.

 ``` sql
-create table sales(id int, profit float) with (bucketnum=8) distributed by (id);
+=> CREATE TABLE sales(id int, profit float) WITH (bucketnum=8) DISTRIBUTED BY (id);
 ```

 There are four ways of creating a table from an origin table. The ways in which the new table is generated from the original table are listed below.

 <table>
   <tr>
     <th></th>
     <th>Syntax</th>
   </tr>
-  <tr><td>INHERITS</td><td><pre><code>create table new_table inherit (origintable) [with(bucketnum=x)] <br/>[distributed by col]</code></pre></td></tr>
-  <tr><td>LIKE</td><td><pre><code>create table new_table (like origintable) [with(bucketnum=x)] <br/>[distributed by col]</code></pre></td></tr>
-  <tr><td>AS</td><td><pre><code>create table new_table [with(bucketnum=x)] as subquery [distributed by col]</code></pre></td></tr>
-  <tr><td>SELECT INTO</td><td><pre><code>create table origintable [with(bucketnum=x)] [distributed by col]; select * <br/>into new_table from origintable;</code></pre></td></tr>
+  <tr><td>INHERITS</td><td><pre><code>CREATE TABLE new_table INHERITS (origintable) [WITH(bucketnum=x)] <br/>[DISTRIBUTED BY col]</code></pre></td></tr>
+  <tr><td>LIKE</td><td><pre><code>CREATE TABLE new_table (LIKE origintable) [WITH(bucketnum=x)] <br/>[DISTRIBUTED BY col]</code></pre></td></tr>
+  <tr><td>AS</td><td><pre><code>CREATE TABLE new_table [WITH(bucketnum=x)] AS subquery [DISTRIBUTED BY col]</code></pre></td></tr>
+  <tr><td>SELECT INTO</td><td><pre><code>CREATE TABLE origintable [WITH(bucketnum=x)] [DISTRIBUTED BY col]; SELECT * <br/>INTO new_table FROM origintable;</code></pre></td></tr>
 </table>

 The optional `INHERITS` clause specifies a list of tables from which the new table automatically inherits all columns. Hash tables inherit bucket numbers from their origin table if not otherwise specified. If `WITH` specifies `bucketnum` in creating a hash-distributed table, it will be copied. If distribution is specified by column, the table will inherit it. Otherwise, the table will use the default distribution from `default_hash_table_bucket_number`.
@@ -132,7 +132,7 @@ Policies for different application scenarios can be specified to optimize perfor
 Adjusting the values of the configuration parameters `default_hash_table_bucket_number` and `hawq_rm_nvseg_perquery_limit` can tune performance by controlling the number of virtual segments being used. In most circumstances, HAWQ's elastic runtime will dynamically allocate virtual segments to optimize performance, so further tuning should not be needed.

-Hash tables are created using the value specified in `default_hash_table_bucket_number`. Explicitly setting this value can be useful in managing resources, as queries for hash tables use a fixed number of buckets, regardless of the amount of data present. If a larger or smaller number of hash buckets are desired, set this value before you CREATE TABLE. Resources are dynamically allocated to a multiple of the number of nodes. If setting the value of `default_hash_table_bucket_number` with hawq init --bucket\_number, the value should not exceed the value of `hawq_rm_nvseg_perquery_limit`, which defines the maximum number of virtual segments that can be used for a query \(default = 512, with a maximum of 65535\). Modifying the value to greater than 1000 segments is not recommended.
+Hash tables are created using the value specified in `default_hash_table_bucket_number`. Queries for hash tables use a fixed number of buckets, regardless of the amount of data present. Explicitly setting `default_hash_table_bucket_number` can be useful in managing resources. If you desire a larger or smaller number of hash buckets, set this value before you create tables. Resources are dynamically allocated to a multiple of the number of nodes. If you use `hawq init --bucket_number` to set the value of `default_hash_table_bucket_number` during cluster initialization or expansion, the value should not exceed the value of `hawq_rm_nvseg_perquery_limit`. This server parameter defines the maximum number of virtual segments that can be used for a query \(default = 512, with a maximum of 65535\). Modifying the value to greater than 1000 segments is not recommended.

 The following per-node guidelines apply to values for `default_hash_table_bucket_number`.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/plext/using_pljava.html.md.erb
----------------------------------------------------------------------
diff --git a/plext/using_pljava.html.md.erb b/plext/using_pljava.html.md.erb
index bab31dc..99b5767 100644
--- a/plext/using_pljava.html.md.erb
+++ b/plext/using_pljava.html.md.erb
@@ -57,9 +57,9 @@ HAWQ uses the `pljava_classpath` server configuration parameter in place of the
 ### <a id="serverconfigparams"></a>Server Configuration Parameters

-The following server configuration parameters are used by PL/Java in HAWQ. These parameters replace the `pljava.*` parameters that are used in the standard PostgreSQL PL/Java implementation.
+PL/Java uses server configuration parameters to configure classpath, Java VM, and other options. Refer to the [Server Configuration Parameter Reference](../reference/HAWQSiteConfig.html) for general information about HAWQ server configuration parameters.
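For orientation, a PL/Java parameter that is set globally is recorded in `hawq-site.xml` like any other HAWQ server configuration parameter. A hypothetical `hawq-site.xml` entry for `pljava_classpath` naming two JAR files (the file names are illustrative only) might look like this:

``` xml
<property>
    <name>pljava_classpath</name>
    <value>examples.jar:myclasses.jar</value>
</property>
```

As elsewhere in these examples, multiple JAR files in the value are separated with colons.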
-<p class="note"><b>Note:</b> See the <a href="../reference/hawq-reference.html">HAWQ Reference</a> for information about HAWQ server configuration parameters.</p> +The following server configuration parameters are used by PL/Java in HAWQ. These parameters replace the `pljava.*` parameters that are used in the standard PostgreSQL PL/Java implementation. #### pljava\_classpath @@ -77,6 +77,48 @@ If TRUE, lingering savepoints will be released on function exit. If FALSE, they Defines the start up options for the Java VM. +### <a id="setting_serverconfigparams"></a>Setting PL/Java Configuration Parameters + +You can set PL/Java server configuration parameters at the session level, or globally across your whole cluster. Your HAWQ cluster configuration must be reloaded after setting a server configuration value globally. + +#### <a id="setsrvrcfg_global"></a>Cluster Level + +You will perform different procedures to set a PL/Java server configuration parameter for your whole HAWQ cluster depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update server configuration parameters only via the Ambari Web UI. If you manage your HAWQ cluster from the command line, you will use the `hawq config` command line utility to set PL/Java server configuration parameters. + +The following examples add a JAR file named `myclasses.jar` to the `pljava_classpath` server configuration parameter for the entire HAWQ cluster. + +If you use Ambari to manage your HAWQ cluster: + +1. Set the `pljava_classpath` configuration property to include `myclasses.jar` via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. +2. Select **Service Actions > Restart All** to load the updated configuration. + +If you manage your HAWQ cluster from the command line: + +1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`. + + ``` shell + $ source /usr/local/hawq/greenplum_path.sh + ``` + +1. Use the `hawq config` utility to set `pljava_classpath`: + + ``` shell + $ hawq config -c pljava_classpath -v \'myclasses.jar\' + ``` +2. Reload the HAWQ configuration: + + ``` shell + $ hawq stop cluster -u + ``` + +#### <a id="setsrvrcfg_session"></a>Session Level + +To set a PL/Java server configuration parameter for only the *current* database session, set the parameter within the `psql` subsystem. For example, to set `pljava_classpath`: + +``` sql +=> SET pljava_classpath='myclasses.jar'; +``` + ## <a id="enablepljava"></a>Enabling and Removing PL/Java Support @@ -89,7 +131,7 @@ Before you enable PL/Java: 1. Ensure that you have installed a supported Java runtime environment and that the `$JAVA_HOME` variable is set to the same path on the master and all segment nodes. -2. Perform the following step on all machines to set up `ldconfig` for JDK: +2. Perform the following step on all machines to set up `ldconfig` for the installed JDK: ``` shell $ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf @@ -103,14 +145,14 @@ Before you enable PL/Java: To use PL/Java: 1. Enable the language for each database. -1. Install user-created JAR files containing Java methods on all HAWQ hosts. -1. Add the name of the JAR file to the HAWQ `pljava_classpath` server configuration parameter in `hawq-site.xml`. This parameter value should contain a list of the installed JAR files. +1. Install user-created JAR files on all HAWQ hosts. +1. 
Add the names of the JAR files to the HAWQ `pljava_classpath` server configuration parameter. This parameter value should identify a list of the installed JAR files. #### <a id="enablepljava"></a>Enable PL/Java and Install JAR Files Perform the following steps as the `gpadmin` user: -1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` SQL script in the databases that use PL/Java. The `install.sql` script registers both the trusted and untrusted PL/Java. For example, the following command enables PL/Java on a database named `testdb`: +1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` SQL script in the databases that will use PL/Java. The `install.sql` script registers both the trusted and untrusted PL/Java languages. For example, the following command enables PL/Java on a database named `testdb`: ``` shell $ psql -d testdb -f $GPHOME/share/postgresql/pljava/install.sql @@ -124,27 +166,14 @@ Perform the following steps as the `gpadmin` user: Use this option *only* if you are certain you want to enable PL/Java in all new databases. -2. Copy your Java archives (JAR files) to `$GPHOME/lib/postgresql/java/` on all the HAWQ hosts. This example uses the `hawq scp` utility to copy the `myclasses.jar` file: +2. Copy your Java archives (JAR files) to `$GPHOME/lib/postgresql/java/` on all HAWQ hosts. This example uses the `hawq scp` utility to copy the `myclasses.jar` file located in the current directory: ``` shell $ hawq scp -f hawq_hosts myclasses.jar =:$GPHOME/lib/postgresql/java/ ``` The `hawq_hosts` file contains a list of the HAWQ hosts. -3. The JAR files must be added to the `pljava_classpath` configuration parameter. This parameter can be set at either the database session or global levels. - - To affect only the *current* database session, set the `pljava_classpath` configuration parameter at the `psql` prompt: - - ``` sql - psql> SET pljava_classpath='myclasses.jar'; - ``` - - To affect *all* sessions, set the `pljava_classpath` server configuration parameter and restart the HAWQ cluster: - - ``` shell - $ hawq config -c pljava_classpath -v \'examples.jar:myclasses.jar\' - $ hawq restart cluster - ``` +3. Add the JAR files to the `pljava_classpath` configuration parameter. Refer to [Setting PL/Java Configuration Parameters](#setting_serverconfigparams) for the specific procedure. 5. (Optional) Your HAWQ installation includes an `examples.sql` file. This script contains sample PL/Java functions that you can use for testing. Run the commands in this file to create and run test functions that use the Java classes in `examples.jar`: @@ -154,7 +183,9 @@ Perform the following steps as the `gpadmin` user: #### Configuring PL/Java VM Options -PL/Java JVM options can be configured via the `pljava_vmoptions` parameter in `hawq-site.xml`. For example, `pljava_vmoptions=-Xmx512M` sets the maximum heap size of the JVM. The default Xmx value is set to `-Xmx64M`. +PL/Java JVM options can be configured via the `pljava_vmoptions` server configuration parameter. For example, `pljava_vmoptions=-Xmx512M` sets the maximum heap size of the JVM. The default `-Xmx` value is `64M`. + +Refer to [Setting PL/Java Configuration Parameters](#setting_serverconfigparams) for the specific procedure to set PL/Java server configuration parameters. 
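For example, to raise the JVM maximum heap size for the current database session only, a minimal sketch (the `-Xmx512M` value is the example from above; tune it to your workload):

``` sql
=> SET pljava_vmoptions='-Xmx512M';
```

A cluster-wide setting follows the same Ambari or `hawq config` procedure described in that section.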
### <a id="uninstallpljava"></a>Disable PL/Java

@@ -166,7 +197,7 @@ To disable PL/Java, you should:
 #### <a id="uninstallpljavasupport"></a>Remove PL/Java Support from Databases

-For a database that no long requires the PL/Java language, remove support for PL/Java by running the `uninstall.sql` file as the `gpadmin` user. For example, the following command disables the PL/Java language in the specified database:
+For a database that no longer requires the PL/Java language, remove support for PL/Java by running the `uninstall.sql` script as the `gpadmin` user. For example, the following command disables the PL/Java language in the specified database:

 ``` shell
 $ psql -d <dbname> -f $GPHOME/share/postgresql/pljava/uninstall.sql

@@ -177,17 +208,38 @@ Replace \<dbname\> with the name of the target database.
 #### <a id="uninstallpljavapackage"></a>Uninstall the Java JAR files

-When no databases have PL/Java as a registered language, remove the Java JAR files:
+When no databases have PL/Java as a registered language, remove the Java JAR files.

-1. Remove the `pljava_classpath` server configuration parameter in the `hawq-site.xml` file.
+If you use Ambari to manage your cluster:

-1. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of each HAWQ host.
+1. Remove the `pljava_classpath` configuration property via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down.

-1. Restart the HAWQ cluster:
+2. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of each HAWQ host.

-    ``` shell
-    $ hawq restart cluster
-    ```
+3. Select **Service Actions > Restart All** to restart your HAWQ cluster.
+
+
+If you manage your cluster from the command line:
+
+1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+1. Use the `hawq config` utility to remove `pljava_classpath`:
+
+    ``` shell
+    $ hawq config -r pljava_classpath
+    ```
+
+2. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of each HAWQ host.
+
+3. Restart your HAWQ cluster:
+
+    ``` shell
+    $ hawq restart cluster
+    ```

 ## <a id="writingpljavafunc"></a>Writing PL/Java Functions

@@ -211,16 +263,16 @@ A Java function is declared with the name of a class and a static method on that
 The following function can be declared to access the static method `getProperty` on the `java.lang.System` class:

 ```sql
-CREATE FUNCTION getsysprop(VARCHAR)
-    RETURNS VARCHAR
-    AS 'java.lang.System.getProperty'
-    LANGUAGE java;
+=> CREATE FUNCTION getsysprop(VARCHAR)
+     RETURNS VARCHAR
+     AS 'java.lang.System.getProperty'
+     LANGUAGE java;
 ```

 Run the following command to return the Java `user.home` property:

 ```sql
-SELECT getsysprop('user.home');
+=> SELECT getsysprop('user.home');
 ```

 ### <a id="typemapping"></a>Type Mapping

@@ -254,10 +306,10 @@ All other types are mapped to `java.lang.String` and will utilize the standard t
 The scalar types that map to Java primitives cannot be passed as NULL values. To pass NULL values, those types can have an alternative mapping. You enable this mapping by explicitly denoting it in the method reference.
```sql
-CREATE FUNCTION trueIfEvenOrNull(integer)
-    RETURNS bool
-    AS 'foo.fee.Fum.trueIfEvenOrNull(java.lang.Integer)'
-    LANGUAGE java;
+=> CREATE FUNCTION trueIfEvenOrNull(integer)
+     RETURNS bool
+     AS 'foo.fee.Fum.trueIfEvenOrNull(java.lang.Integer)'
+     LANGUAGE java;
 ```

 The Java code would be similar to this:

@@ -278,8 +330,8 @@ public class Fum
 The following two statements both yield true:

 ```sql
-SELECT trueIfEvenOrNull(NULL);
-SELECT trueIfEvenOrNull(4);
+=> SELECT trueIfEvenOrNull(NULL);
+=> SELECT trueIfEvenOrNull(4);
 ```

 In order to return NULL values from a Java method, you use the object type that corresponds to the primitive (for example, you return `java.lang.Integer` instead of `int`). The PL/Java resolve mechanism finds the method regardless. Since Java cannot have different return types for methods with the same name, this does not introduce any ambiguity.

@@ -291,12 +343,12 @@ A complex type will always be passed as a read-only `java.sql.ResultSet` with ex
 Example:

 ```sql
-CREATE TYPE complexTest
-    AS(base integer, incbase integer, ctime timestamptz);
-CREATE FUNCTION useComplexTest(complexTest)
-    RETURNS VARCHAR
-    AS 'foo.fee.Fum.useComplexTest'
-    IMMUTABLE LANGUAGE java;
+=> CREATE TYPE complexTest
+     AS(base integer, incbase integer, ctime timestamptz);
+=> CREATE FUNCTION useComplexTest(complexTest)
+     RETURNS VARCHAR
+     AS 'foo.fee.Fum.useComplexTest'
+     IMMUTABLE LANGUAGE java;
 ```

 In the Java class `Fum`, we add the following static method:

@@ -321,10 +373,10 @@ Java does not stipulate any way to create a `ResultSet`. Hence, returning a Resu
 Assume that the complexTest type in the previous section has been created.

 ```sql
-CREATE FUNCTION createComplexTest(int, int)
-    RETURNS complexTest
-    AS 'foo.fee.Fum.createComplexTest'
-    IMMUTABLE LANGUAGE java;
+=> CREATE FUNCTION createComplexTest(int, int)
+     RETURNS complexTest
+     AS 'foo.fee.Fum.createComplexTest'
+     IMMUTABLE LANGUAGE java;
 ```

 The PL/Java method resolve will now find the following method in the `Fum` class:

@@ -353,10 +405,10 @@ When returning result set, you should not build a result set before returning it
 In order to return a set of a scalar type, you need to create a Java method that returns something that implements the `java.util.Iterator` interface. Here is an example of a method that returns a SETOF varchar:

 ```sql
-CREATE FUNCTION javatest.getSystemProperties()
-    RETURNS SETOF varchar
-    AS 'foo.fee.Bar.getNames'
-    IMMUTABLE LANGUAGE java;
+=> CREATE FUNCTION javatest.getSystemProperties()
+     RETURNS SETOF varchar
+     AS 'foo.fee.Bar.getNames'
+     IMMUTABLE LANGUAGE java;
 ```

 This simple Java method returns an iterator:

@@ -390,10 +442,10 @@ This interface has two methods. The boolean `assignRowValues(java.sql.ResultSet
 You can use this interface in the following way:

 ```sql
-CREATE FUNCTION javatest.listComplexTests(int, int)
-    RETURNS SETOF complexTest
-    AS 'foo.fee.Fum.listComplexTest'
-    IMMUTABLE LANGUAGE java;
+=> CREATE FUNCTION javatest.listComplexTests(int, int)
+     RETURNS SETOF complexTest
+     AS 'foo.fee.Fum.listComplexTest'
+     IMMUTABLE LANGUAGE java;
 ```

 The function maps to a static Java method that returns an instance that implements the `ResultSetProvider` interface.

@@ -444,14 +496,14 @@ This interface is similar to the `ResultSetProvider` interface in that it has a
 Here is an example that executes a query using a statement that it obtained using the default connection.
The SQL suitable for the deployment descriptor looks like this: ```sql -CREATE FUNCTION javatest.listSupers() - RETURNS SETOF pg_user - AS 'org.postgresql.pljava.example.Users.listSupers' - LANGUAGE java; -CREATE FUNCTION javatest.listNonSupers() - RETURNS SETOF pg_user - AS 'org.postgresql.pljava.example.Users.listNonSupers' - LANGUAGE java; +=> CREATE FUNCTION javatest.listSupers() + RETURNS SETOF pg_user + AS 'org.postgresql.pljava.example.Users.listSupers' + LANGUAGE java; +=> CREATE FUNCTION javatest.listNonSupers() + RETURNS SETOF pg_user + AS 'org.postgresql.pljava.example.Users.listNonSupers' + LANGUAGE java; ``` And in the Java package `org.postgresql.pljava.example` a class `Users` is added: @@ -606,7 +658,7 @@ Compile the Java code: $ javac *.java ``` -Create a JAR archive named `analytics.jar` that contains the class file and the manifest file in the JAR. +Create a JAR archive named `analytics.jar` that contains the class file and the manifest file in the JAR: ```shell $ jar cfm analytics.jar manifest.txt *.class @@ -614,46 +666,37 @@ $ jar cfm analytics.jar manifest.txt *.class Upload the JAR file to the HAWQ master host. -Run the `hawq scp` utility to copy the jar file to the HAWQ Java directory. Use the `-f` option to specify the file that contains a list of the master and segment hosts. +Run the `hawq scp` utility to copy the jar file to the HAWQ Java directory. Use the `-f` option to specify the file that contains a list of the master and segment hosts: ```shell $ hawq scp -f hawq_hosts analytics.jar =:/usr/local/hawq/lib/postgresql/java/ ``` -Use the `hawq config` utility to set the HAWQ `pljava_classpath` server configuration parameter. The parameter lists the installed JAR files. - -```shell -$ hawq config -c pljava_classpath -v \'analytics.jar\' -``` +Add the `analytics.jar` JAR file to the `pljava_classpath` configuration parameter. Refer to [Setting PL/Java Configuration Parameters](#setting_serverconfigparams) for the specific procedure. -Run the `hawq restart` utility to reload the configuration files. +From the `psql` subsystem, run the following command to show the installed JAR files: -```shell -$ hawq restart cluster -``` - -From the `psql` command line, run the following command to show the installed JAR files. 
- -```shell -psql# SHOW pljava_classpath +``` sql +=> SHOW pljava_classpath ``` The following SQL commands create a table and define a Java function to test the method in the JAR file: ```sql -CREATE TABLE temp (a varchar) DISTRIBUTED randomly; -INSERT INTO temp values ('my string'); +=> CREATE TABLE temp (a varchar) DISTRIBUTED randomly; +=> INSERT INTO temp values ('my string'); --Example function -CREATE OR REPLACE FUNCTION java_substring(varchar, int, int) -RETURNS varchar AS 'Example.substring' LANGUAGE java; +=> CREATE OR REPLACE FUNCTION java_substring(varchar, int, int) + RETURNS varchar AS 'Example.substring' + LANGUAGE java; --Example execution -SELECT java_substring(a, 1, 5) FROM temp; +=> SELECT java_substring(a, 1, 5) FROM temp; ``` -You can place the contents in a file, `mysample.sql` and run the command from a psql command line: +If you add these SQL commands to a file named `mysample.sql`, you can run the commands from the `psql` subsystem using the `\i` meta-command: -```shell -psql# \i mysample.sql +``` sql +=> \i mysample.sql ``` The output is similar to this: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/query/gporca/query-gporca-enable.html.md.erb ---------------------------------------------------------------------- diff --git a/query/gporca/query-gporca-enable.html.md.erb b/query/gporca/query-gporca-enable.html.md.erb index 5dcaf33..e8cc93f 100644 --- a/query/gporca/query-gporca-enable.html.md.erb +++ b/query/gporca/query-gporca-enable.html.md.erb @@ -20,14 +20,27 @@ As part of routine database maintenance, you should refresh statistics on the ro When the configuration parameter `optimizer_analyze_root_partition` is set to `on`, root partition statistics will be collected when `ANALYZE` is run on a partitioned table. Root partition statistics are required by GPORCA. -1. Log into the HAWQ master host as `gpadmin`, the HAWQ administrator. -2. Set the values of the server configuration parameters. These HAWQ `hawq config` utility commands sets the value of the parameters to `on`: +You will perform different procedures to set optimizer configuration parameters for your whole HAWQ cluster depending upon whether you manage your cluster from the command line or use Ambari. If you use Ambari to manage your HAWQ cluster, you must ensure that you update server configuration parameters only via the Ambari Web UI. If you manage your HAWQ cluster from the command line, you will use the `hawq config` command line utility to set optimizer server configuration parameters. + +If you use Ambari to manage your HAWQ cluster: + +1. Set the `optimizer_analyze_root_partition` configuration property to `on` via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down. +2. Select **Service Actions > Restart All** to load the updated configuration. + +If you manage your HAWQ cluster from the command line: + +1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`. ``` shell - $ hawq config -c optimizer_analyze_root_partition -v on + $ source /usr/local/hawq/greenplum_path.sh ``` -3. Reload the modified configuration in HAWQ. This HAWQ `hawq stop` utility command reloads the `hawq-site.xml` file without shutting down HAWQ: +1. Use the `hawq config` utility to set `optimizer_analyze_root_partition`: + + ``` shell + $ hawq config -c optimizer_analyze_root_partition -v on + ``` +2. 
Reload the HAWQ configuration:

    ``` shell
    $ hawq stop cluster -u
    ```

@@ -37,14 +50,25 @@ When the configuration parameter `optimizer_analyze_root_partition` is set to `o
 Set the server configuration parameter `optimizer` for the HAWQ system.

-1. Log into the HAWQ master host as `gpadmin`, the HAWQ administrator.
-2. Set the value of the server configuration `optimizer` parameter. This HAWQ `hawq config` utility command sets the value to `on`:
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `optimizer` configuration property to `on` via the HAWQ service **Configs > Advanced > Custom hawq-site** drop down.
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1. Log in to the HAWQ master host as a HAWQ administrator and source the file `/usr/local/hawq/greenplum_path.sh`.

     ``` shell
-    $ hawq config -c optimizer -v on
+    $ source /usr/local/hawq/greenplum_path.sh
     ```

-3. Reload the modified configuration in HAWQ:
+1. Use the `hawq config` utility to set `optimizer`:
+
+    ``` shell
+    $ hawq config -c optimizer -v on
+    ```
+2. Reload the HAWQ configuration:

     ``` shell
     $ hawq stop cluster -u
     ```

@@ -55,7 +79,7 @@ Set the server configuration parameter `optimizer` for individual HAWQ databases
 Set the server configuration parameter `optimizer` for individual HAWQ databases with the `ALTER DATABASE` command. For example, this command enables GPORCA for the database *test\_db*.

 ``` sql
-> ALTER DATABASE test_db SET optimizer = ON ;
+=> ALTER DATABASE test_db SET optimizer = ON;
 ```

 ## <a id="topic_lx4_vqk_br"></a>Enabling GPORCA for a Session or a Query

 You can use the `SET` command to set the `optimizer` server configuration parameter for a session. For example, after you use the `psql` utility to connect to HAWQ, this `SET` command enables GPORCA:

 ``` sql
-> SET optimizer = on ;
+=> SET optimizer = on;
 ```

 To set the parameter for a specific query, include the `SET` command prior to running the query.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/740b6ee6/reference/sql/VACUUM.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/VACUUM.html.md.erb b/reference/sql/VACUUM.html.md.erb
index 9b39a32..57ccc0e 100644
--- a/reference/sql/VACUUM.html.md.erb
+++ b/reference/sql/VACUUM.html.md.erb
@@ -36,7 +36,7 @@ When `VERBOSE` is specified, `VACUUM` emits progress messages to indicate which
 **Note:** A VACUUM FULL is not recommended in HAWQ. See [Notes](#topic1__section6).</dd>

 <dt>FREEZE  </dt>
-<dd>Specifying `FREEZE` is equivalent to performing `VACUUM` with the `vacuum_freeze_min_age` server configuration parameter set to zero. The `FREEZE` option is deprecated and will be removed in a future release. Set the parameter by using `hawq config` or by modifying your `hawq-site.xml` configuration file instead.</dd>
+<dd>Specifying `FREEZE` is equivalent to performing `VACUUM` with the `vacuum_freeze_min_age` server configuration parameter set to zero. The `FREEZE` option is deprecated and will be removed in a future release.</dd>

 <dt>VERBOSE  </dt>
 <dd>Prints a detailed vacuum activity report for each table.</dd>
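As a follow-up to the deprecated `FREEZE` keyword above, the equivalent behavior can be obtained by setting the parameter for the session before vacuuming. A minimal sketch, assuming a hypothetical table named `mytable`:

``` sql
=> SET vacuum_freeze_min_age = 0;
=> VACUUM mytable;
```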
