moving book configuration to new 'book' branch, for HAWQ-1027
Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/7514e193
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/7514e193
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/7514e193

Branch: refs/heads/book
Commit: 7514e19340ada920cc746f93ae13d79cddf4d934
Parents:
Author: David Yozie <[email protected]>
Authored: Mon Aug 29 09:41:19 2016 -0700
Committer: David Yozie <[email protected]>
Committed: Mon Aug 29 09:41:19 2016 -0700

----------------------------------------------------------------------
 Gemfile | 5 +
 Gemfile.lock | 203 ++
 LICENSE | 176 +
 NOTICE | 15 +
 README.md | 52 +
 ...ckingUpandRestoringHAWQDatabases.html.md.erb | 373 +++
 admin/ClusterExpansion.html.md.erb | 224 ++
 admin/ClusterShrink.html.md.erb | 55 +
 admin/FaultTolerance.html.md.erb | 52 +
 ...esandHighAvailabilityEnabledHDFS.html.md.erb | 225 ++
 admin/HighAvailability.html.md.erb | 37 +
 admin/MasterMirroring.html.md.erb | 135 +
 admin/RecommendedMonitoringTasks.html.md.erb | 261 ++
 admin/RunningHAWQ.html.md.erb | 22 +
 admin/ambari-admin.html.md.erb | 435 +++
 admin/maintain.html.md.erb | 31 +
 admin/monitor.html.md.erb | 424 +++
 admin/startstop.html.md.erb | 157 +
 .../HAWQBestPracticesOverview.html.md.erb | 31 +
 bestpractices/general_bestpractices.html.md.erb | 24 +
 .../managing_data_bestpractices.html.md.erb | 47 +
 ...managing_resources_bestpractices.html.md.erb | 144 +
 .../operating_hawq_bestpractices.html.md.erb | 289 ++
 .../querying_data_bestpractices.html.md.erb | 25 +
 bestpractices/secure_bestpractices.html.md.erb | 11 +
 clientaccess/client_auth.html.md.erb | 193 ++
 clientaccess/disable-kerberos.html.md.erb | 90 +
 clientaccess/g-connecting-with-psql.html.md.erb | 35 +
 ...-database-application-interfaces.html.md.erb | 17 +
 ...-establishing-a-database-session.html.md.erb | 17 +
 ...lum-database-client-applications.html.md.erb | 23 +
 .../g-supported-client-applications.html.md.erb | 8 +
 ...ubleshooting-connection-problems.html.md.erb | 12 +
 clientaccess/index.md.erb | 17 +
 clientaccess/kerberos.html.md.erb | 306 ++
 clientaccess/ldap.html.md.erb | 116 +
 clientaccess/roles_privs.html.md.erb | 288 ++
 config.yml | 22 +
 datamgmt/BasicDataOperations.html.md.erb | 62 +
 datamgmt/ConcurrencyControl.html.md.erb | 31 +
 .../HAWQInputFormatforMapReduce.html.md.erb | 304 ++
 datamgmt/Transactions.html.md.erb | 58 +
 datamgmt/about_statistics.html.md.erb | 187 ++
 datamgmt/dml.html.md.erb | 35 +
 datamgmt/load/client-loadtools.html.md.erb | 88 +
 ...reating-external-tables-examples.html.md.erb | 117 +
 ...ut-gpfdist-setup-and-performance.html.md.erb | 22 +
 datamgmt/load/g-character-encoding.html.md.erb | 11 +
 ...ommand-based-web-external-tables.html.md.erb | 26 +
 .../g-configuration-file-format.html.md.erb | 66 +
 ...-controlling-segment-parallelism.html.md.erb | 11 +
 ...table-and-declare-a-reject-limit.html.md.erb | 11 +
 ...ng-and-using-web-external-tables.html.md.erb | 13 +
 ...-with-single-row-error-isolation.html.md.erb | 24 +
 ...ased-writable-external-web-table.html.md.erb | 43 +
 ...le-based-writable-external-table.html.md.erb | 16 +
 ...ermine-the-transformation-schema.html.md.erb | 33 +
 ...-web-or-writable-external-tables.html.md.erb | 11 +
 ...-escaping-in-csv-formatted-files.html.md.erb | 29 +
 ...escaping-in-text-formatted-files.html.md.erb | 31 +
 datamgmt/load/g-escaping.html.md.erb | 16 +
 ...e-publications-in-demo-directory.html.md.erb | 29 +
 ...le-greenplum-file-server-gpfdist.html.md.erb | 13 +
 ...-mef-xml-files-in-demo-directory.html.md.erb | 54 +
 ...e-witsml-files-in-demo-directory.html.md.erb | 54 +
 ...g-examples-read-fixed-width-data.html.md.erb | 37 +
 datamgmt/load/g-external-tables.html.md.erb | 44 +
 datamgmt/load/g-formatting-columns.html.md.erb | 19 +
 .../load/g-formatting-data-files.html.md.erb | 17 +
 datamgmt/load/g-formatting-rows.html.md.erb | 7 +
 datamgmt/load/g-gpfdist-protocol.html.md.erb | 15 +
 datamgmt/load/g-gpfdists-protocol.html.md.erb | 37 +
 ...g-handling-errors-ext-table-data.html.md.erb | 9 +
 .../load/g-handling-load-errors.html.md.erb | 28 +
 ...id-csv-files-in-error-table-data.html.md.erb | 7 +
 ...g-and-exporting-fixed-width-data.html.md.erb | 38 +
 datamgmt/load/g-installing-gpfdist.html.md.erb | 7 +
 datamgmt/load/g-load-the-data.html.md.erb | 17 +
 .../g-loading-and-unloading-data.html.md.erb | 53 +
 ...and-writing-non-hdfs-custom-data.html.md.erb | 9 +
 ...ing-data-using-an-external-table.html.md.erb | 18 +
 .../load/g-loading-data-with-copy.html.md.erb | 11 +
 .../g-loading-data-with-hawqload.html.md.erb | 56 +
 .../g-moving-data-between-tables.html.md.erb | 12 +
 ...-data-load-and-query-performance.html.md.erb | 10 +
 .../load/g-representing-null-values.html.md.erb | 7 +
 ...-single-row-error-isolation-mode.html.md.erb | 17 +
 .../g-starting-and-stopping-gpfdist.html.md.erb | 42 +
 .../g-transfer-and-store-the-data.html.md.erb | 16 +
 .../load/g-transforming-with-gpload.html.md.erb | 30 +
 ...ing-with-insert-into-select-from.html.md.erb | 22 +
 .../load/g-transforming-xml-data.html.md.erb | 34 +
 .../load/g-troubleshooting-gpfdist.html.md.erb | 23 +
 ...ing-data-from-greenplum-database.html.md.erb | 17 +
 ...-using-a-writable-external-table.html.md.erb | 17 +
 .../g-unloading-data-using-copy.html.md.erb | 12 +
 .../g-url-based-web-external-tables.html.md.erb | 24 +
 .../load/g-using-a-custom-format.html.md.erb | 23 +
 ...m-parallel-file-server--gpfdist-.html.md.erb | 19 +
 ...rking-with-file-based-ext-tables.html.md.erb | 21 +
 datamgmt/load/g-write-a-transform.html.md.erb | 48 +
 ...-write-the-gpfdist-configuration.html.md.erb | 61 +
 .../g-xml-transformation-examples.html.md.erb | 13 +
 ddl/ddl-database.html.md.erb | 78 +
 ddl/ddl-partition.html.md.erb | 483 +++
 ddl/ddl-schema.html.md.erb | 95 +
 ddl/ddl-storage.html.md.erb | 74 +
 ddl/ddl-table.html.md.erb | 149 +
 ddl/ddl-tablespace.html.md.erb | 154 +
 ddl/ddl-view.html.md.erb | 25 +
 ddl/ddl.html.md.erb | 19 +
 hawq-book/Gemfile | 5 +
 hawq-book/Gemfile.lock | 203 ++
 hawq-book/config.yml | 22 +
 .../master_middleman/source/images/favicon.ico | Bin 0 -> 1150 bytes
 .../master_middleman/source/javascripts/book.js | 16 +
 .../source/javascripts/waypoints/context.js | 300 ++
 .../source/javascripts/waypoints/group.js | 105 +
 .../javascripts/waypoints/noframeworkAdapter.js | 213 ++
 .../source/javascripts/waypoints/sticky.js | 63 +
 .../source/javascripts/waypoints/waypoint.js | 160 +
 .../master_middleman/source/layouts/_title.erb | 6 +
 .../patch/dynamic_variable_interpretation.py | 192 ++
 .../source/stylesheets/book-styles.css.scss | 3 +
 .../stylesheets/partials/_book-base-values.scss | 0
 .../source/stylesheets/partials/_book-vars.scss | 19 +
 .../source/subnavs/apache-hawq-nav.erb | 955 ++++++
 hawq-book/redirects.rb | 6 +
 images/02-pipeline.png | Bin 0 -> 40864 bytes
 images/03-gpload-files.jpg | Bin 0 -> 38954 bytes
 images/basic_query_flow.png | Bin 0 -> 74709 bytes
 images/ext-tables-xml.png | Bin 0 -> 92048 bytes
 images/ext_tables.jpg | Bin 0 -> 65371 bytes
 images/ext_tables_multinic.jpg | Bin 0 -> 24394 bytes
 images/gangs.jpg | Bin 0 -> 30405 bytes
 images/gporca.png | Bin 0 -> 53323 bytes
 images/hawq_hcatalog.png | Bin 0 -> 120047 bytes
 images/slice_plan.jpg | Bin 0 -> 53086 bytes
 install/aws-config.html.md.erb | 135 +
 install/select-hosts.html.md.erb | 19 +
 master_middleman/source/images/favicon.ico | Bin 0 -> 1150 bytes
 master_middleman/source/javascripts/book.js | 16 +
 .../source/javascripts/waypoints/context.js | 300 ++
 .../source/javascripts/waypoints/group.js | 105 +
 .../javascripts/waypoints/noframeworkAdapter.js | 213 ++
 .../source/javascripts/waypoints/sticky.js | 63 +
 .../source/javascripts/waypoints/waypoint.js | 160 +
 master_middleman/source/layouts/_title.erb | 6 +
 .../patch/dynamic_variable_interpretation.py | 192 ++
 .../source/stylesheets/book-styles.css.scss | 3 +
 .../stylesheets/partials/_book-base-values.scss | 0
 .../source/stylesheets/partials/_book-vars.scss | 19 +
 .../source/subnavs/apache-hawq-nav.erb | 955 ++++++
 mdimages/02-pipeline.png | Bin 0 -> 40864 bytes
 mdimages/03-gpload-files.jpg | Bin 0 -> 38954 bytes
 mdimages/1-assign-masters.tiff | Bin 0 -> 248134 bytes
 mdimages/1-choose-services.tiff | Bin 0 -> 258298 bytes
 mdimages/3-assign-slaves-and-clients.tiff | Bin 0 -> 199176 bytes
 mdimages/4-customize-services-hawq.tiff | Bin 0 -> 241800 bytes
 mdimages/5-customize-services-pxf.tiff | Bin 0 -> 192364 bytes
 mdimages/6-review.tiff | Bin 0 -> 230890 bytes
 mdimages/7-install-start-test.tiff | Bin 0 -> 204112 bytes
 mdimages/ext-tables-xml.png | Bin 0 -> 92048 bytes
 mdimages/ext_tables.jpg | Bin 0 -> 65371 bytes
 mdimages/ext_tables_multinic.jpg | Bin 0 -> 24394 bytes
 mdimages/gangs.jpg | Bin 0 -> 30405 bytes
 mdimages/gp_orca_fallback.png | Bin 0 -> 14683 bytes
 mdimages/gpfdist_instances.png | Bin 0 -> 26236 bytes
 mdimages/gpfdist_instances_backup.png | Bin 0 -> 48414 bytes
 mdimages/gporca.png | Bin 0 -> 53323 bytes
 mdimages/hawq_architecture_components.png | Bin 0 -> 99650 bytes
 mdimages/hawq_hcatalog.png | Bin 0 -> 120047 bytes
 mdimages/hawq_high_level_architecture.png | Bin 0 -> 491840 bytes
 mdimages/partitions.jpg | Bin 0 -> 43514 bytes
 mdimages/piv-opt.png | Bin 0 -> 4823 bytes
 mdimages/resource_queues.jpg | Bin 0 -> 18793 bytes
 mdimages/slice_plan.jpg | Bin 0 -> 53086 bytes
 mdimages/source/gporca.graffle | Bin 0 -> 2814 bytes
 mdimages/source/hawq_hcatalog.graffle | Bin 0 -> 2967 bytes
 mdimages/standby_master.jpg | Bin 0 -> 18180 bytes
 mdimages/svg/hawq_architecture_components.svg | 1083 ++++++
 mdimages/svg/hawq_hcatalog.svg | 3 +
 mdimages/svg/hawq_resource_management.svg | 621 ++++
 mdimages/svg/hawq_resource_queues.svg | 340 ++
 overview/ElasticSegments.html.md.erb | 31 +
 overview/HAWQArchitecture.html.md.erb | 69 +
 overview/HAWQOverview.html.md.erb | 43 +
 overview/HDFSCatalogCache.html.md.erb | 7 +
 overview/ManagementTools.html.md.erb | 9 +
 overview/RedundancyFailover.html.md.erb | 29 +
 overview/ResourceManagement.html.md.erb | 14 +
 overview/TableDistributionStorage.html.md.erb | 41 +
 overview/system-overview.html.md.erb | 11 +
 plext/UsingProceduralLanguages.html.md.erb | 20 +
 plext/using_pgcrypto.html.md.erb | 32 +
 plext/using_pljava.html.md.erb | 666 ++++
 plext/using_plperl.html.md.erb | 27 +
 plext/using_plpgsql.html.md.erb | 142 +
 plext/using_plpython.html.md.erb | 595 ++++
 plext/using_plr.html.md.erb | 229 ++
 pxf/ConfigurePXF.html.md.erb | 67 +
 pxf/HBasePXF.html.md.erb | 105 +
 pxf/HDFSFileDataPXF.html.md.erb | 507 +++
 pxf/HawqExtensionFrameworkPXF.html.md.erb | 41 +
 pxf/HivePXF.html.md.erb | 417 +++
 pxf/InstallPXFPlugins.html.md.erb | 141 +
 pxf/JsonPXF.html.md.erb | 197 ++
 pxf/PXFExternalTableandAPIReference.html.md.erb | 1311 ++++++++
 pxf/ReadWritePXF.html.md.erb | 123 +
 pxf/TroubleshootingPXF.html.md.erb | 134 +
 query/HAWQQueryProcessing.html.md.erb | 60 +
 query/defining-queries.html.md.erb | 528 +++
 query/functions-operators.html.md.erb | 437 +++
 query/gporca/query-gporca-changed.html.md.erb | 17 +
 query/gporca/query-gporca-enable.html.md.erb | 71 +
 query/gporca/query-gporca-fallback.html.md.erb | 142 +
 query/gporca/query-gporca-features.html.md.erb | 215 ++
 .../gporca/query-gporca-limitations.html.md.erb | 37 +
 query/gporca/query-gporca-notes.html.md.erb | 28 +
 query/gporca/query-gporca-optimizer.html.md.erb | 39 +
 query/gporca/query-gporca-overview.html.md.erb | 23 +
 query/query-performance.html.md.erb | 143 +
 query/query-profiling.html.md.erb | 238 ++
 query/query.html.md.erb | 37 +
 redirects.rb | 6 +
 .../CharacterSetSupportReference.html.md.erb | 439 +++
 reference/HAWQDataTypes.html.md.erb | 137 +
 reference/HAWQEnvironmentVariables.html.md.erb | 105 +
 reference/HAWQSampleSiteConfig.html.md.erb | 120 +
 reference/HAWQSiteConfig.html.md.erb | 19 +
 ...SConfigurationParameterReference.html.md.erb | 257 ++
 reference/SQLCommandReference.html.md.erb | 163 +
 reference/catalog/catalog_ref-html.html.md.erb | 143 +
 .../catalog/catalog_ref-tables.html.md.erb | 68 +
 reference/catalog/catalog_ref-views.html.md.erb | 21 +
 reference/catalog/catalog_ref.html.md.erb | 21 +
 .../gp_configuration_history.html.md.erb | 23 +
 .../catalog/gp_distribution_policy.html.md.erb | 18 +
 .../catalog/gp_global_sequence.html.md.erb | 16 +
 .../catalog/gp_master_mirroring.html.md.erb | 19 +
 .../gp_persistent_database_node.html.md.erb | 71 +
 .../gp_persistent_filespace_node.html.md.erb | 83 +
 .../gp_persistent_relation_node.html.md.erb | 85 +
 .../gp_persistent_relfile_node.html.md.erb | 96 +
 .../gp_persistent_tablespace_node.html.md.erb | 72 +
 reference/catalog/gp_relfile_node.html.md.erb | 19 +
 .../gp_segment_configuration.html.md.erb | 25 +
 .../catalog/gp_version_at_initdb.html.md.erb | 17 +
 reference/catalog/pg_aggregate.html.md.erb | 25 +
 reference/catalog/pg_am.html.md.erb | 38 +
 reference/catalog/pg_amop.html.md.erb | 20 +
 reference/catalog/pg_amproc.html.md.erb | 19 +
 reference/catalog/pg_appendonly.html.md.erb | 29 +
 reference/catalog/pg_attrdef.html.md.erb | 19 +
 reference/catalog/pg_attribute.html.md.erb | 32 +
 .../catalog/pg_attribute_encoding.html.md.erb | 18 +
 reference/catalog/pg_auth_members.html.md.erb | 19 +
 reference/catalog/pg_authid.html.md.erb | 36 +
 reference/catalog/pg_cast.html.md.erb | 23 +
 reference/catalog/pg_class.html.md.erb | 213 ++
 reference/catalog/pg_compression.html.md.erb | 22 +
 reference/catalog/pg_constraint.html.md.erb | 30 +
 reference/catalog/pg_conversion.html.md.erb | 22 +
 reference/catalog/pg_database.html.md.erb | 26 +
 reference/catalog/pg_depend.html.md.erb | 26 +
 reference/catalog/pg_description.html.md.erb | 17 +
 reference/catalog/pg_exttable.html.md.erb | 23 +
 reference/catalog/pg_filespace.html.md.erb | 19 +
 .../catalog/pg_filespace_entry.html.md.erb | 18 +
 reference/catalog/pg_index.html.md.erb | 23 +
 reference/catalog/pg_inherits.html.md.erb | 16 +
 reference/catalog/pg_language.html.md.erb | 21 +
 reference/catalog/pg_largeobject.html.md.erb | 19 +
 reference/catalog/pg_listener.html.md.erb | 20 +
 reference/catalog/pg_locks.html.md.erb | 35 +
 reference/catalog/pg_namespace.html.md.erb | 18 +
 reference/catalog/pg_opclass.html.md.erb | 22 +
 reference/catalog/pg_operator.html.md.erb | 32 +
 reference/catalog/pg_partition.html.md.erb | 20 +
 .../catalog/pg_partition_columns.html.md.erb | 20 +
 .../catalog/pg_partition_encoding.html.md.erb | 18 +
 reference/catalog/pg_partition_rule.html.md.erb | 28 +
 .../catalog/pg_partition_templates.html.md.erb | 30 +
 reference/catalog/pg_partitions.html.md.erb | 30 +
 reference/catalog/pg_pltemplate.html.md.erb | 22 +
 reference/catalog/pg_proc.html.md.erb | 36 +
 reference/catalog/pg_resqueue.html.md.erb | 30 +
 .../catalog/pg_resqueue_status.html.md.erb | 94 +
 reference/catalog/pg_rewrite.html.md.erb | 20 +
 reference/catalog/pg_roles.html.md.erb | 31 +
 reference/catalog/pg_shdepend.html.md.erb | 28 +
 reference/catalog/pg_shdescription.html.md.erb | 18 +
 reference/catalog/pg_stat_activity.html.md.erb | 30 +
 .../catalog/pg_stat_last_operation.html.md.erb | 21 +
 .../pg_stat_last_shoperation.html.md.erb | 23 +
 .../catalog/pg_stat_operations.html.md.erb | 87 +
 .../pg_stat_partition_operations.html.md.erb | 28 +
 reference/catalog/pg_statistic.html.md.erb | 30 +
 reference/catalog/pg_stats.html.md.erb | 27 +
 reference/catalog/pg_tablespace.html.md.erb | 22 +
 reference/catalog/pg_trigger.html.md.erb | 114 +
 reference/catalog/pg_type.html.md.erb | 176 +
 reference/catalog/pg_type_encoding.html.md.erb | 15 +
 reference/catalog/pg_window.html.md.erb | 97 +
 .../cli/admin_utilities/analyzedb.html.md.erb | 160 +
 .../cli/admin_utilities/gpfdist.html.md.erb | 157 +
 .../cli/admin_utilities/gplogfilter.html.md.erb | 180 +
 .../admin_utilities/hawqactivate.html.md.erb | 85 +
 .../cli/admin_utilities/hawqcheck.html.md.erb | 123 +
 .../admin_utilities/hawqcheckperf.html.md.erb | 137 +
 .../cli/admin_utilities/hawqconfig.html.md.erb | 132 +
 .../cli/admin_utilities/hawqextract.html.md.erb | 293 ++
 .../admin_utilities/hawqfilespace.html.md.erb | 147 +
 .../cli/admin_utilities/hawqinit.html.md.erb | 150 +
 .../cli/admin_utilities/hawqload.html.md.erb | 420 +++
 .../admin_utilities/hawqregister.html.md.erb | 191 ++
 .../cli/admin_utilities/hawqrestart.html.md.erb | 112 +
 .../cli/admin_utilities/hawqscp.html.md.erb | 95 +
 .../admin_utilities/hawqssh-exkeys.html.md.erb | 102 +
 .../cli/admin_utilities/hawqssh.html.md.erb | 105 +
 .../cli/admin_utilities/hawqstart.html.md.erb | 119 +
 .../cli/admin_utilities/hawqstate.html.md.erb | 65 +
 .../cli/admin_utilities/hawqstop.html.md.erb | 103 +
 .../cli/client_utilities/createdb.html.md.erb | 105 +
 .../cli/client_utilities/createuser.html.md.erb | 158 +
 .../cli/client_utilities/dropdb.html.md.erb | 86 +
 .../cli/client_utilities/dropuser.html.md.erb | 78 +
 .../cli/client_utilities/pg_dump.html.md.erb | 252 ++
 .../cli/client_utilities/pg_dumpall.html.md.erb | 180 +
 .../cli/client_utilities/pg_restore.html.md.erb | 256 ++
 reference/cli/client_utilities/psql.html.md.erb | 760 +++++
 .../cli/client_utilities/vacuumdb.html.md.erb | 120 +
 reference/cli/management_tools.html.md.erb | 63 +
 reference/guc/guc_category-list.html.md.erb | 404 +++
 reference/guc/guc_config.html.md.erb | 77 +
 reference/guc/parameter_definitions.html.md.erb | 3158 ++++++++++++++++++
 reference/hawq-reference.html.md.erb | 43 +
 reference/sql/ABORT.html.md.erb | 37 +
 reference/sql/ALTER-AGGREGATE.html.md.erb | 68 +
 reference/sql/ALTER-DATABASE.html.md.erb | 52 +
 reference/sql/ALTER-FUNCTION.html.md.erb | 108 +
 reference/sql/ALTER-OPERATOR-CLASS.html.md.erb | 43 +
 reference/sql/ALTER-OPERATOR.html.md.erb | 50 +
 reference/sql/ALTER-RESOURCE-QUEUE.html.md.erb | 132 +
 reference/sql/ALTER-ROLE.html.md.erb | 182 +
 reference/sql/ALTER-TABLE.html.md.erb | 422 +++
 reference/sql/ALTER-TABLESPACE.html.md.erb | 55 +
 reference/sql/ALTER-TYPE.html.md.erb | 54 +
 reference/sql/ALTER-USER.html.md.erb | 44 +
 reference/sql/ANALYZE.html.md.erb | 75 +
 reference/sql/BEGIN.html.md.erb | 58 +
 reference/sql/CHECKPOINT.html.md.erb | 23 +
 reference/sql/CLOSE.html.md.erb | 45 +
 reference/sql/COMMIT.html.md.erb | 43 +
 reference/sql/COPY.html.md.erb | 256 ++
 reference/sql/CREATE-AGGREGATE.html.md.erb | 162 +
 reference/sql/CREATE-DATABASE.html.md.erb | 86 +
 reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb | 333 ++
 reference/sql/CREATE-FUNCTION.html.md.erb | 190 ++
 reference/sql/CREATE-GROUP.html.md.erb | 43 +
 reference/sql/CREATE-LANGUAGE.html.md.erb | 93 +
 reference/sql/CREATE-OPERATOR-CLASS.html.md.erb | 153 +
 reference/sql/CREATE-OPERATOR.html.md.erb | 171 +
 reference/sql/CREATE-RESOURCE-QUEUE.html.md.erb | 139 +
 reference/sql/CREATE-ROLE.html.md.erb | 196 ++
 reference/sql/CREATE-SCHEMA.html.md.erb | 63 +
 reference/sql/CREATE-SEQUENCE.html.md.erb | 135 +
 reference/sql/CREATE-TABLE-AS.html.md.erb | 126 +
 reference/sql/CREATE-TABLE.html.md.erb | 455 +++
 reference/sql/CREATE-TABLESPACE.html.md.erb | 58 +
 reference/sql/CREATE-TYPE.html.md.erb | 185 +
 reference/sql/CREATE-USER.html.md.erb | 46 +
 reference/sql/CREATE-VIEW.html.md.erb | 88 +
 reference/sql/DEALLOCATE.html.md.erb | 42 +
 reference/sql/DECLARE.html.md.erb | 84 +
 reference/sql/DROP-AGGREGATE.html.md.erb | 48 +
 reference/sql/DROP-DATABASE.html.md.erb | 48 +
 reference/sql/DROP-EXTERNAL-TABLE.html.md.erb | 48 +
 reference/sql/DROP-FILESPACE.html.md.erb | 42 +
 reference/sql/DROP-FUNCTION.html.md.erb | 55 +
 reference/sql/DROP-GROUP.html.md.erb | 31 +
 reference/sql/DROP-OPERATOR-CLASS.html.md.erb | 54 +
 reference/sql/DROP-OPERATOR.html.md.erb | 64 +
 reference/sql/DROP-OWNED.html.md.erb | 50 +
 reference/sql/DROP-RESOURCE-QUEUE.html.md.erb | 65 +
 reference/sql/DROP-ROLE.html.md.erb | 43 +
 reference/sql/DROP-SCHEMA.html.md.erb | 45 +
 reference/sql/DROP-SEQUENCE.html.md.erb | 45 +
 reference/sql/DROP-TABLE.html.md.erb | 47 +
 reference/sql/DROP-TABLESPACE.html.md.erb | 42 +
 reference/sql/DROP-TYPE.html.md.erb | 45 +
 reference/sql/DROP-USER.html.md.erb | 31 +
 reference/sql/DROP-VIEW.html.md.erb | 45 +
 reference/sql/END.html.md.erb | 37 +
 reference/sql/EXECUTE.html.md.erb | 45 +
 reference/sql/EXPLAIN.html.md.erb | 94 +
 reference/sql/FETCH.html.md.erb | 146 +
 reference/sql/GRANT.html.md.erb | 180 +
 reference/sql/INSERT.html.md.erb | 111 +
 reference/sql/PREPARE.html.md.erb | 67 +
 reference/sql/REASSIGN-OWNED.html.md.erb | 48 +
 reference/sql/RELEASE-SAVEPOINT.html.md.erb | 48 +
 reference/sql/RESET.html.md.erb | 45 +
 reference/sql/REVOKE.html.md.erb | 101 +
 reference/sql/ROLLBACK-TO-SAVEPOINT.html.md.erb | 77 +
 reference/sql/ROLLBACK.html.md.erb | 43 +
 reference/sql/SAVEPOINT.html.md.erb | 66 +
 reference/sql/SELECT-INTO.html.md.erb | 55 +
 reference/sql/SELECT.html.md.erb | 507 +++
 reference/sql/SET-ROLE.html.md.erb | 72 +
 .../sql/SET-SESSION-AUTHORIZATION.html.md.erb | 66 +
 reference/sql/SET.html.md.erb | 87 +
 reference/sql/SHOW.html.md.erb | 47 +
 reference/sql/TRUNCATE.html.md.erb | 52 +
 reference/sql/VACUUM.html.md.erb | 92 +
 reference/toolkit/hawq_toolkit.html.md.erb | 263 ++
 requirements/system-requirements.html.md.erb | 197 ++
 .../ConfigureResourceManagement.html.md.erb | 120 +
 resourcemgmt/HAWQResourceManagement.html.md.erb | 69 +
 resourcemgmt/ResourceManagerStatus.html.md.erb | 152 +
 resourcemgmt/ResourceQueues.html.md.erb | 204 ++
 resourcemgmt/YARNIntegration.html.md.erb | 252 ++
 resourcemgmt/best-practices.html.md.erb | 15 +
 resourcemgmt/index.md.erb | 12 +
 troubleshooting/Troubleshooting.html.md.erb | 101 +
 425 files changed, 43058 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/Gemfile
----------------------------------------------------------------------
diff --git a/Gemfile b/Gemfile
new file mode 100644
index 0000000..f66d333
--- /dev/null
+++ b/Gemfile
@@ -0,0 +1,5 @@
+source "https://rubygems.org"
+
+gem 'bookbindery'
+
+gem 'libv8', '3.16.14.7'

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/Gemfile.lock
----------------------------------------------------------------------
diff --git a/Gemfile.lock b/Gemfile.lock
new file mode 100644
index 0000000..3c483c0
--- /dev/null
+++ b/Gemfile.lock
@@ -0,0 +1,203 @@
+GEM
+  remote: https://rubygems.org/
+  specs:
+    activesupport (4.2.7.1)
+      i18n (~> 0.7)
+      json (~> 1.7, >= 1.7.7)
+      minitest (~> 5.1)
+      thread_safe (~> 0.3, >= 0.3.4)
+      tzinfo (~> 1.1)
+    addressable (2.4.0)
+    ansi (1.5.0)
+    bookbindery (9.12.0)
+      ansi (~> 1.4)
+      css_parser
+      elasticsearch
+      fog-aws (~> 0.7.1)
+      font-awesome-sass
+      git (~> 1.2.8)
+      middleman (~> 3.4.0)
+      middleman-livereload (~> 3.4.3)
+      middleman-syntax (~> 2.0)
+      nokogiri (= 1.6.7.2)
+      puma
+      rack-rewrite
+      redcarpet (~> 3.2.3)
+      rouge (!= 1.9.1)
+      therubyracer
+      thor
+    builder (3.2.2)
+    capybara (2.4.4)
+      mime-types (>= 1.16)
+      nokogiri (>= 1.3.3)
+      rack (>= 1.0.0)
+      rack-test (>= 0.5.4)
+      xpath (~> 2.0)
+    chunky_png (1.3.6)
+    coffee-script (2.4.1)
+      coffee-script-source
+      execjs
+    coffee-script-source (1.10.0)
+    compass (1.0.3)
+      chunky_png (~> 1.2)
+      compass-core (~> 1.0.2)
+      compass-import-once (~> 1.0.5)
+      rb-fsevent (>= 0.9.3)
+      rb-inotify (>= 0.9)
+      sass (>= 3.3.13, < 3.5)
+    compass-core (1.0.3)
+      multi_json (~> 1.0)
+      sass (>= 3.3.0, < 3.5)
+    compass-import-once (1.0.5)
+      sass (>= 3.2, < 3.5)
+    css_parser (1.4.5)
+      addressable
+    elasticsearch (2.0.0)
+      elasticsearch-api (= 2.0.0)
+      elasticsearch-transport (= 2.0.0)
+    elasticsearch-api (2.0.0)
+      multi_json
+    elasticsearch-transport (2.0.0)
+      faraday
+      multi_json
+    em-websocket (0.5.1)
+      eventmachine (>= 0.12.9)
+      http_parser.rb (~> 0.6.0)
+    erubis (2.7.0)
+    eventmachine (1.2.0.1)
+    excon (0.51.0)
+    execjs (2.7.0)
+    faraday (0.9.2)
+      multipart-post (>= 1.2, < 3)
+    ffi (1.9.14)
+    fog-aws (0.7.6)
+      fog-core (~> 1.27)
+      fog-json (~> 1.0)
+      fog-xml (~> 0.1)
+      ipaddress (~> 0.8)
+    fog-core (1.42.0)
+      builder
+      excon (~> 0.49)
+      formatador (~> 0.2)
+    fog-json (1.0.2)
+      fog-core (~> 1.0)
+      multi_json (~> 1.10)
+    fog-xml (0.1.2)
+      fog-core
+      nokogiri (~> 1.5, >= 1.5.11)
+    font-awesome-sass (4.6.2)
+      sass (>= 3.2)
+    formatador (0.2.5)
+    git (1.2.9.1)
+    haml (4.0.7)
+      tilt
+    hike (1.2.3)
+    hooks (0.4.1)
+      uber (~> 0.0.14)
+    http_parser.rb (0.6.0)
+    i18n (0.7.0)
+    ipaddress (0.8.3)
+    json (1.8.3)
+    kramdown (1.12.0)
+    libv8 (3.16.14.7)
+    listen (3.0.8)
+      rb-fsevent (~> 0.9, >= 0.9.4)
+      rb-inotify (~> 0.9, >= 0.9.7)
+    middleman (3.4.1)
+      coffee-script (~> 2.2)
+      compass (>= 1.0.0, < 2.0.0)
+      compass-import-once (= 1.0.5)
+      execjs (~> 2.0)
+      haml (>= 4.0.5)
+      kramdown (~> 1.2)
+      middleman-core (= 3.4.1)
+      middleman-sprockets (>= 3.1.2)
+      sass (>= 3.4.0, < 4.0)
+      uglifier (~> 2.5)
+    middleman-core (3.4.1)
+      activesupport (~> 4.1)
+      bundler (~> 1.1)
+      capybara (~> 2.4.4)
+      erubis
+      hooks (~> 0.3)
+      i18n (~> 0.7.0)
+      listen (~> 3.0.3)
+      padrino-helpers (~> 0.12.3)
+      rack (>= 1.4.5, < 2.0)
+      thor (>= 0.15.2, < 2.0)
+      tilt (~> 1.4.1, < 2.0)
+    middleman-livereload (3.4.6)
+      em-websocket (~> 0.5.1)
+      middleman-core (>= 3.3)
+      rack-livereload (~> 0.3.15)
+    middleman-sprockets (3.4.2)
+      middleman-core (>= 3.3)
+      sprockets (~> 2.12.1)
+      sprockets-helpers (~> 1.1.0)
+      sprockets-sass (~> 1.3.0)
+    middleman-syntax (2.1.0)
+      middleman-core (>= 3.2)
+      rouge (~> 1.0)
+    mime-types (3.1)
+      mime-types-data (~> 3.2015)
+    mime-types-data (3.2016.0521)
+    mini_portile2 (2.0.0)
+    minitest (5.9.0)
+    multi_json (1.12.1)
+    multipart-post (2.0.0)
+    nokogiri (1.6.7.2)
+      mini_portile2 (~> 2.0.0.rc2)
+    padrino-helpers (0.12.8)
+      i18n (~> 0.6, >= 0.6.7)
+      padrino-support (= 0.12.8)
+      tilt (~> 1.4.1)
+    padrino-support (0.12.8)
+      activesupport (>= 3.1)
+    puma (3.6.0)
+    rack (1.6.4)
+    rack-livereload (0.3.16)
+      rack
+    rack-rewrite (1.5.1)
+    rack-test (0.6.3)
+      rack (>= 1.0)
+    rb-fsevent (0.9.7)
+    rb-inotify (0.9.7)
+      ffi (>= 0.5.0)
+    redcarpet (3.2.3)
+    ref (2.0.0)
+    rouge (1.11.1)
+    sass (3.4.22)
+    sprockets (2.12.4)
+      hike (~> 1.2)
+      multi_json (~> 1.0)
+      rack (~> 1.0)
+      tilt (~> 1.1, != 1.3.0)
+    sprockets-helpers (1.1.0)
+      sprockets (~> 2.0)
+    sprockets-sass (1.3.1)
+      sprockets (~> 2.0)
+      tilt (~> 1.1)
+    therubyracer (0.12.2)
+      libv8 (~> 3.16.14.0)
+      ref
+    thor (0.19.1)
+    thread_safe (0.3.5)
+    tilt (1.4.1)
+    tzinfo (1.2.2)
+      thread_safe (~> 0.1)
+    uber (0.0.15)
+    uglifier (2.7.2)
+      execjs (>= 0.3.0)
+      json (>= 1.8.0)
+    xpath (2.0.0)
+      nokogiri (~> 1.3)
+
+PLATFORMS
+  ruby
+
+DEPENDENCIES
+  bookbindery
+  libv8 (= 3.16.14.7)
+
+BUNDLED WITH
+   1.11.2

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/LICENSE
----------------------------------------------------------------------
diff --git a/LICENSE b/LICENSE
new file mode 100755
index 0000000..e434046
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,176 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/NOTICE ---------------------------------------------------------------------- diff --git a/NOTICE b/NOTICE new file mode 100755 index 0000000..f2821a9 --- /dev/null +++ b/NOTICE @@ -0,0 +1,15 @@ +Apache HAWQ (incubating) + +Copyright (c) 2016 Pivotal Software, Inc. All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/README.md ---------------------------------------------------------------------- diff --git a/README.md b/README.md new file mode 100644 index 0000000..3ba3492 --- /dev/null +++ b/README.md @@ -0,0 +1,52 @@ +# Apache HAWQ (incubating) End-User Documentation + +This repository provides the full source for Apache HAWQ (incubating) end-user documentation in MarkDown format. The source files can be built into HTML output using [Bookbinder](https://github.com/cloudfoundry-incubator/bookbinder) or other MarkDown tools. + +Bookbinder is a gem that binds together a unified documentation web application from markdown, html, and/or DITA source material. The source material for bookbinder must be stored either in local directories or in GitHub repositories. Bookbinder runs [middleman](http://middlemanapp.com/) to produce a Rackup app that can be deployed locally or as a Web application. + +This document contains instructions for building the local Apache HAWQ (incubating) documentation. It contains the sections: + +* [Bookbinder Usage](#usage) +* [Prerequisites](#prereq) +* [Building the Documentation](#building) +* [Publishing the Documentation](#publishing) +* [Getting More Information](#moreinfo) + +<a name="usage"></a> +## Bookbinder Usage + +Bookbinder is meant to be used from within a project called a **book**. The book includes a configuration file that describes which documentation repositories to use as source materials. 
Bookbinder provides a set of scripts to aggregate those repositories and publish them to various locations. + +For Apache HAWQ (incubating), a preconfigured **book** is provided in the directory `/hawq-book`. You can use this configuration to build HTML for Apache HAWQ (incubating) on your local system. + +<a name="prereq"></a> +## Prerequisites + +* Bookbinder requires Ruby version 2.0.0-p195 or higher. + +<a name="building"></a> +## Building the Documentation + +1. Begin by moving or copying the `/hawq-book` directory to a directory that is parallel to `incubator-hawq/docs-apache-hawq-md`. For example: + + $ cd /repos/incubator-hawq/docs-apache-hawq-md + $ cp -r hawq-book .. + $ cd ../hawq-book + +2. The `Gemfile` in the book directory already defines the `gem "bookbindery"` dependency. Make sure you are in the relocated book directory and enter: + + $ bundle install + +3. The installed `config.yml` file configures the Apache HAWQ (incubating) book for building locally. Build the files with the command: + + $ bundle exec bookbinder bind local + + Bookbinder converts the markdown source into HTML, putting the final output in the `final_app` directory. + +4. Because the `final_app` directory contains the full output of the HTML conversion process, you can easily publish this directory as a hosted Web application.
`final_app` contains a default configuration to serve the local files using the Rack web server: + + $ cd final_app + $ bundle install + $ rackup + + You can now view the local documentation at [http://localhost:9292](http://localhost:9292) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/admin/BackingUpandRestoringHAWQDatabases.html.md.erb ---------------------------------------------------------------------- diff --git a/admin/BackingUpandRestoringHAWQDatabases.html.md.erb b/admin/BackingUpandRestoringHAWQDatabases.html.md.erb new file mode 100644 index 0000000..f7031ed --- /dev/null +++ b/admin/BackingUpandRestoringHAWQDatabases.html.md.erb @@ -0,0 +1,373 @@ +--- +title: Backing Up and Restoring HAWQ +--- + +This chapter provides information on backing up and restoring databases in a HAWQ system. + +As an administrator, you will need to back up and restore your database. HAWQ provides three utilities to help you back up your data: + +- `gpfdist` +- PXF +- `pg_dump` + +`gpfdist` and PXF are parallel loading and unloading tools that provide the best performance. You can also use `pg_dump`, a non-parallel utility inherited from PostgreSQL. + +In addition, in some situations you should back up your raw data from ETL processes. + +This section describes these three utilities, as well as raw data backup, to help you decide what fits your needs. + +## <a id="usinggpfdistorpxf"></a>About gpfdist and PXF + +You can perform a parallel backup in HAWQ using `gpfdist` or PXF to unload all data to external tables. Backup files can reside on a local file system or HDFS. To recover tables, you can load data back from external tables to the database. + +### <a id="performingaparallelbackup"></a>Performing a Parallel Backup + +1. Check the database size to ensure that the file system has enough space to save the backed-up files. +2. Use the `pg_dump` utility to dump the schema of the target database. +3.
Create a writable external table for each table to back up to that database. +4. Load table data into the newly created external tables. + +> **Note:** Put the insert statements in a single transaction to prevent problems if you perform any update operations during the backup. + + +### <a id="restoringfromabackup"></a>Restoring from a Backup + +1. Create a database to recover to. +2. Recreate the schema from the schema file \(created during the `pg_dump` process\). +3. Create a readable external table for each table in the database. +4. Load data from the external table to the actual table. +5. Run the `ANALYZE` command once loading is complete. This ensures that the query planner generates optimal plan based on up-to-date table statistics. + +### <a id="differencesbetweengpfdistandpxf"></a>Differences between gpfdist and PXF + +`gpfdist` and PXF differ in the following ways: + +- `gpfdist` stores backup files on local file system, while PXF stores files on HDFS. +- `gpfdist` only supports plain text format, while PXF also supports binary format like AVRO and customized format. +- `gpfdist` doesnât support generating compressed files, while PXF supports compression \(you can specify a compression codec used in Hadoop such as `org.apache.hadoop.io.compress.GzipCodec`\). +- Both `gpfdist` and PXF have fast loading performance, but `gpfdist` is much faster than PXF. + +## <a id="usingpg_dumpandpg_restore"></a>About pg\_dump and pg\_restore + +HAWQ supports the PostgreSQL backup and restore utilities, `pg_dump` and `pg_restore`. The `pg_dump` utility creates a single, large dump file in the master host containing the data from all active segments. The `pg_restore` utility restores a HAWQ database from the archive created by `pg_dump`. In most cases, this is probably not practical, as there is most likely not enough disk space in the master host for creating a single backup file of an entire distributed database. 
HAWQ supports these utilities in case you are migrating data from PostgreSQL to HAWQ. + +To create a backup archive for database `mydb`: + +```shell +$ pg_dump -Ft -f mydb.tar mydb +``` + +To create a compressed backup using custom format and compression level 3: + +```shell +$ pg_dump -Fc -Z3 -f mydb.dump mydb +``` + +To restore from an archive using `pg_restore`: + +```shell +$ pg_restore -d new_db mydb.dump +``` + +## <a id="aboutbackinguprawdata"></a>About Backing Up Raw Data + +Parallel backup using `gpfdist` or PXF works fine in most cases. There are a couple of situations where you cannot perform parallel backup and restore operations: + +- Performing periodic incremental backups. +- Dumping a large data volume to external tables; this process takes a long time. + +In such situations, you can back up raw data generated during ETL processes and reload it into HAWQ. This provides the flexibility to choose where you store backup files. + +## <a id="estimatingthebestpractice"></a>Selecting a Backup Strategy/Utility + +The table below summarizes the differences between the four approaches discussed above.
+ +<table> + <tr> + <th></th> + <th><code>gpfdist</code></th> + <th>PXF</th> + <th><code>pg_dump</code></th> + <th>Raw Data Backup</th> + </tr> + <tr> + <td><b>Parallel</b></td> + <td>Yes</td> + <td>Yes</td> + <td>No</td> + <td>No</td> + </tr> + <tr> + <td><b>Incremental Backup</b></td> + <td>No</td> + <td>No</td> + <td>No</td> + <td>Yes</td> + </tr> + <tr> + <td><b>Backup Location</b></td> + <td>Local FS</td> + <td>HDFS</td> + <td>Local FS</td> + <td>Local FS, HDFS</td> + </tr> + <tr> + <td><b>Format</b></td> + <td>Text, CSV</td> + <td>Text, CSV, Custom</td> + <td>Text, Tar, Custom</td> + <td>Depends on format of raw data</td> + </tr> + <tr> +<td><b>Compression</b></td><td>No</td><td>Yes</td><td>Custom format only</td><td>Optional</td></tr> +<tr><td><b>Scalability</b></td><td>Good</td><td>Good</td><td>---</td><td>Good</td></tr> +<tr><td><b>Performance</b></td><td>Fast loading, fast unloading</td><td>Fast loading, normal unloading</td><td>---</td><td>Fast (just file copy)</td></tr> +</table> + +## <a id="estimatingspacerequirements"></a>Estimating Space Requirements + +Before you back up your database, ensure that you have enough space to store backup files. This section describes how to get the database size and estimate space requirements. + +- Use `hawq_toolkit` to query the size of the database you want to back up. + + ``` + mydb=# SELECT sodddatsize FROM hawq_toolkit.hawq_size_of_database WHERE sodddatname='mydb'; + ``` + + If tables in your database are compressed, this query shows the compressed size of the database. + +- Estimate the total size of the backup files. + - If your database tables and backup files are both compressed, you can use the value `sodddatsize` as an estimate. + - If your database tables are compressed and backup files are not, you need to multiply `sodddatsize` by the compression ratio. Although this depends on the compression algorithms, you can use an empirical value such as 300%.
+ - If your backup files are compressed and database tables are not, you need to divide `sodddatsize` by the compression ratio. +- Determine the space requirement. + - If you use HDFS with PXF, the space requirement is `size_of_backup_files * replication_factor`. + + - If you use gpfdist, the space requirement for each gpfdist instance is `size_of_backup_files / num_gpfdist_instances` since table data will be evenly distributed to all `gpfdist` instances. + + +## <a id="usinggpfdist"></a>Using gpfdist + +This section discusses `gpfdist` and shows an example of how to back up and restore a HAWQ database. + +`gpfdist` is HAWQ's parallel file distribution program. It is used by readable external tables and `hawq load` to serve external table files to all HAWQ segments in parallel. It is used by writable external tables to accept output streams from HAWQ segments in parallel and write them out to a file. + +To use `gpfdist`, start the `gpfdist` server program on the host where you want to store backup files. You can start multiple `gpfdist` instances on the same host or on different hosts. For each `gpfdist` instance, you specify a directory from which `gpfdist` will serve files for readable external tables or create output files for writable external tables. For example, if you have a dedicated machine for backup with two disks, you can start two `gpfdist` instances, each using one disk. + +You can also run `gpfdist` instances on each segment host. During backup, table data will be evenly distributed to all `gpfdist` instances specified in the `LOCATION` clause in the `CREATE EXTERNAL TABLE` definition. + +### <a id="example"></a>Example + +This example of using `gpfdist` backs up and restores a 1TB `tpch` database. To do so, start two `gpfdist` instances on the backup host `sdw1` with two 1TB disks \(one disk mounted at `/data1`, the other at `/data2`\). + +#### <a id="usinggpfdisttobackupthetpchdatabase"></a>Using gpfdist to Back Up the tpch Database + +1.
Create backup locations and start the `gpfdist` instances. + + In this example, issuing the first command creates two folders on two different disks with the same suffix `backup/tpch_20140627`. These folders are labeled as backups of the `tpch` database on 2014-06-27. In the next two commands, the example shows two `gpfdist` instances, one using port 8080, and another using port 8081: + + ```shell + sdw1$ mkdir -p /data1/gpadmin/backup/tpch_20140627 /data2/gpadmin/backup/tpch_20140627 + sdw1$ gpfdist -d /data1/gpadmin/backup/tpch_20140627 -p 8080 & + sdw1$ gpfdist -d /data2/gpadmin/backup/tpch_20140627 -p 8081 & + ``` + +2. Save the schema for the database: + + ```shell + master_host$ pg_dump --schema-only -f tpch.schema tpch + master_host$ scp tpch.schema sdw1:/data1/gpadmin/backup/tpch_20140627 + ``` + + On the HAWQ master host, use the `pg_dump` utility to save the schema of the `tpch` database to the file `tpch.schema`. Copy the schema file to the backup location so that you can later restore the database schema. + +3. Create a writable external table for each table in the database: + + ```shell + master_host$ psql tpch + ``` + ```sql + tpch=# create writable external table wext_orders (like orders) + tpch-# location('gpfdist://sdw1:8080/orders1.csv', 'gpfdist://sdw1:8081/orders2.csv') format 'CSV'; + tpch=# create writable external table wext_lineitem (like lineitem) + tpch-# location('gpfdist://sdw1:8080/lineitem1.csv', 'gpfdist://sdw1:8081/lineitem2.csv') format 'CSV'; + ``` + + The sample shows two tables in the `tpch` database: `orders` and `lineitem`. Two corresponding writable external tables are created. Specify a location for each `gpfdist` instance in the `LOCATION` clause. This sample uses the CSV text format, but you can also choose other delimited text formats. For more information, see the `CREATE EXTERNAL TABLE` SQL command. + +4.
Unload data to the external tables: + + ```sql + tpch=# begin; + tpch=# insert into wext_orders select * from orders; + tpch=# insert into wext_lineitem select * from lineitem; + tpch=# commit; + ``` + +5. **\(Optional\)** Stop `gpfdist` servers to free ports for other processes: + + Find the process IDs and kill the processes: + + ```shell + sdw1$ ps -ef | grep gpfdist + sdw1$ kill 612368; kill 612369 + ``` + + +#### <a id="torecoverusinggpfdist"></a>Recovering Using gpfdist + +1. Restart `gpfdist` instances if they aren't running: + + ```shell + sdw1$ gpfdist -d /data1/gpadmin/backup/tpch_20140627 -p 8080 & + sdw1$ gpfdist -d /data2/gpadmin/backup/tpch_20140627 -p 8081 & + ``` + +2. Create a new database and restore the schema: + + ```shell + master_host$ createdb tpch2 + master_host$ scp sdw1:/data1/gpadmin/backup/tpch_20140627/tpch.schema . + master_host$ psql -f tpch.schema -d tpch2 + ``` + +3. Create a readable external table for each table: + + ```shell + master_host$ psql tpch2 + ``` + + ```sql + tpch2=# create external table rext_orders (like orders) location('gpfdist://sdw1:8080/orders1.csv', 'gpfdist://sdw1:8081/orders2.csv') format 'CSV'; + tpch2=# create external table rext_lineitem (like lineitem) location('gpfdist://sdw1:8080/lineitem1.csv', 'gpfdist://sdw1:8081/lineitem2.csv') format 'CSV'; + ``` + + **Note:** The `LOCATION` clause is the same as for the writable external tables above. + +4. Load data back from external tables: + + ```sql + tpch2=# insert into orders select * from rext_orders; + tpch2=# insert into lineitem select * from rext_lineitem; + ``` + +5. Run the `ANALYZE` command after data loading: + + ```sql + tpch2=# analyze; + ``` + + +### <a id="troubleshootinggpfdist"></a>Troubleshooting gpfdist + +Keep in mind that `gpfdist` is accessed at runtime by the segment instances. Therefore, you must ensure that the HAWQ segment hosts have network access to gpfdist.
Since the `gpfdist` program is a web server, to test connectivity you can run the following command from each host in your HAWQ array \(segments and master\): + +```shell +$ wget http://gpfdist_hostname:port/filename +``` + +Also, make sure that your `CREATE EXTERNAL TABLE` definition has the correct host name, port, and file names for `gpfdist`. The file names and paths specified should be relative to the directory where gpfdist is serving files \(the directory path used when you started the `gpfdist` program\). See "Defining External Tables - Examples". + +## <a id="usingpxf"></a>Using PXF + +HAWQ Extension Framework \(PXF\) is an extensible framework that allows HAWQ to query external system data. The details of how to install and use PXF can be found in [Working with PXF and External Data](../pxf/HawqExtensionFrameworkPXF.html). + +### <a id="usingpxftobackupthetpchdatabase"></a>Using PXF to Back Up the tpch Database + +1. Create a folder on HDFS for this backup: + + ```shell + master_host$ hdfs dfs -mkdir -p /backup/tpch-2014-06-27 + ``` + +2. Dump the database schema using `pg_dump` and store the schema file in the backup folder: + + ```shell + master_host$ pg_dump --schema-only -f tpch.schema tpch + master_host$ hdfs dfs -copyFromLocal tpch.schema /backup/tpch-2014-06-27 + ``` + +3.
Create a writable external table for each table in the database: + + ```shell + master_host$ psql tpch + ``` + + ```sql + tpch=# CREATE WRITABLE EXTERNAL TABLE wext_orders (LIKE orders) + tpch-# LOCATION('pxf://namenode_host:51200/backup/tpch-2014-06-27/orders' + tpch-# '?Profile=HdfsTextSimple' + tpch-# '&COMPRESSION_CODEC=org.apache.hadoop.io.compress.SnappyCodec' + tpch-# ) + tpch-# FORMAT 'TEXT'; + + tpch=# CREATE WRITABLE EXTERNAL TABLE wext_lineitem (LIKE lineitem) + tpch-# LOCATION('pxf://namenode_host:51200/backup/tpch-2014-06-27/lineitem' + tpch-# '?Profile=HdfsTextSimple' + tpch-# '&COMPRESSION_CODEC=org.apache.hadoop.io.compress.SnappyCodec') + tpch-# FORMAT 'TEXT'; + ``` + + Here, all backup files for the `orders` table go in the `/backup/tpch-2014-06-27/orders` folder, and all backup files for the `lineitem` table go in the `/backup/tpch-2014-06-27/lineitem` folder. The example uses Snappy compression to save disk space. + +4. Unload the data to external tables: + + ```sql + tpch=# BEGIN; + tpch=# INSERT INTO wext_orders SELECT * FROM orders; + tpch=# INSERT INTO wext_lineitem SELECT * FROM lineitem; + tpch=# COMMIT; + ``` + +5. **\(Optional\)** Change the HDFS file replication factor for the backup folder. HDFS maintains three replicas of each block by default for reliability. You can decrease this number for your backup files if you need to: + + ```shell + master_host$ hdfs dfs -setrep 2 /backup/tpch-2014-06-27 + ``` + + **Note:** This only changes the replication factor for existing files; new files will still use the default replication factor. + + +### <a id="torecoverfromapxfbackup"></a>Recovering a PXF Backup + +1. Create a new database and restore the schema: + + ```shell + master_host$ createdb tpch2 + master_host$ hdfs dfs -copyToLocal /backup/tpch-2014-06-27/tpch.schema . + master_host$ psql -f tpch.schema -d tpch2 + ``` + +2.
Create a readable external table for each table to restore: + + ```shell + master_host$ psql tpch2 + ``` + + ```sql + tpch2=# CREATE EXTERNAL TABLE rext_orders (LIKE orders) + tpch2-# LOCATION('pxf://namenode_host:51200/backup/tpch-2014-06-27/orders?Profile=HdfsTextSimple') + tpch2-# FORMAT 'TEXT'; + tpch2=# CREATE EXTERNAL TABLE rext_lineitem (LIKE lineitem) + tpch2-# LOCATION('pxf://namenode_host:51200/backup/tpch-2014-06-27/lineitem?Profile=HdfsTextSimple') + tpch2-# FORMAT 'TEXT'; + ``` + + The `LOCATION` clause is almost the same as above, except you don't have to specify the `COMPRESSION_CODEC` because PXF will automatically detect it. + +3. Load data back from external tables: + + ```sql + tpch2=# INSERT INTO ORDERS SELECT * FROM rext_orders; + tpch2=# INSERT INTO LINEITEM SELECT * FROM rext_lineitem; + ``` + +4. Run `ANALYZE` after data loading: + + ```sql + tpch2=# ANALYZE; + ``` http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/admin/ClusterExpansion.html.md.erb ---------------------------------------------------------------------- diff --git a/admin/ClusterExpansion.html.md.erb b/admin/ClusterExpansion.html.md.erb new file mode 100644 index 0000000..dcd96df --- /dev/null +++ b/admin/ClusterExpansion.html.md.erb @@ -0,0 +1,224 @@ +--- +title: Expanding a Cluster +--- + +Apache HAWQ supports dynamic node expansion. You can add segment nodes while HAWQ is running without having to suspend or terminate cluster operations. + +**Note:** This topic describes how to expand a cluster using the command-line interface. If you are using Ambari to manage your HAWQ cluster, see [Expanding the HAWQ Cluster](/20/admin/ambari-admin.html#amb-expand) in [Managing HAWQ Using Ambari](/20/admin/ambari-admin.html). + +## <a id="topic_kkc_tgb_h5"></a>Guidelines for Cluster Expansion + +This topic provides some guidelines around expanding your HAWQ cluster.
There are several recommendations to keep in mind when modifying the size of your running HAWQ cluster: + +- When you add a new node, install both a DataNode and a physical segment on the new node. +- After adding a new node, you should always rebalance HDFS data to maintain cluster performance. +- Adding or removing a node also necessitates an update to the HDFS metadata cache. This update will happen eventually, but can take some time. To speed the update of the metadata cache, execute **`select gp_metadata_cache_clear();`**. +- Note that for hash distributed tables, expanding the cluster will not immediately improve performance since hash distributed tables use a fixed number of virtual segments. In order to obtain better performance with hash distributed tables, you must redistribute the table to the updated cluster by using either the [ALTER TABLE](/20/reference/sql/ALTER-TABLE.html) or [CREATE TABLE AS](/20/reference/sql/CREATE-TABLE-AS.html) command. +- If you are using hash tables, consider updating the `default_hash_table_bucket_number` server configuration parameter to a larger value after expanding the cluster but before redistributing the hash tables. + +## <a id="task_hawq_expand"></a>Adding a New Node to an Existing HAWQ Cluster + +The following procedure describes the steps required to add a node to an existing HAWQ cluster. + +For example purposes in this procedure, we are adding a new node named `sdw4`. + +1. Prepare the target machine by checking operating system configurations and passwordless ssh. HAWQ requires passwordless ssh access to all cluster nodes. To set up passwordless ssh on the new node, perform the following steps: + 1. Log in to the master HAWQ node as gpadmin. If you are logged in as a different user, switch to the gpadmin user and source the `greenplum_path.sh` file. + + ```shell + $ su - gpadmin + $ source /usr/local/hawq/greenplum_path.sh + ``` + + 2. On the HAWQ master node, change directories to `/usr/local/hawq/etc`.
In this location, create a file called `new_hosts` and add the hostname\(s\) of the node\(s\) you wish to add to the existing HAWQ cluster, one per line. For example: + + ``` + sdw4 + ``` + + 3. Log in to the master HAWQ node as root and source the `greenplum_path.sh` file. + + ```shell + $ su - root + $ source /usr/local/hawq/greenplum_path.sh + ``` + + 4. Execute the following hawq command to set up passwordless ssh for root on the new host machine: + + ```shell + $ hawq ssh-exkeys -e hawq_hosts -x new_hosts + ``` + + 5. Create the gpadmin user on the new host\(s\). + + ```shell + $ hawq ssh -f new_hosts -e '/usr/sbin/useradd gpadmin' + $ hawq ssh -f new_hosts -e 'echo -e "changeme\nchangeme" | passwd gpadmin' + ``` + + 6. Switch to the gpadmin user and source the `greenplum_path.sh` file again. + + ```shell + $ su - gpadmin + $ source /usr/local/hawq/greenplum_path.sh + ``` + + 7. Execute the following hawq command a second time to set up passwordless ssh for the gpadmin user: + + ```shell + $ hawq ssh-exkeys -e hawq_hosts -x new_hosts + ``` + + 8. After setting up passwordless ssh, you can execute the following hawq command to check the target machine's configuration. + + ```shell + $ hawq check -f new_hosts + ``` + + Configure operating system parameters as needed on the host machine. See the HAWQ installation documentation for a list of specific operating system parameters to configure. + +2. Log in to the target host machine `sdw4` as the root user. If you are logged in as a different user, switch to the root account: + + ```shell + $ su - root + ``` + +3. If not already installed, install the target machine \(`sdw4`\) as an HDFS DataNode. +4. If you have any user-defined function (UDF) libraries installed in your existing HAWQ cluster, install them on the new node. +4.
Download and install HAWQ on the target machine \(`sdw4`\) as described in the [software build instructions](https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install) or in the distribution installation documentation. +5. On the HAWQ master node, check current cluster and host information using `psql`. + + ```shell + $ psql -d postgres + ``` + + ```sql + postgres=# select * from gp_segment_configuration; + ``` + + ``` + registration_order | role | status | port | hostname | address + --------------------+------+--------+-------+----------+--------------- + -1 | s | u | 5432 | sdw1 | 192.0.2.0 + 0 | m | u | 5432 | mdw | rhel64-1 + 1 | p | u | 40000 | sdw3 | 192.0.2.2 + 2 | p | u | 40000 | sdw2 | 192.0.2.1 + (4 rows) + ``` + + At this point the new node does not appear in the cluster. + +6. Execute the following command to confirm that HAWQ was installed on the new host: + + ```shell + $ hawq ssh -f new_hosts -e "ls -l $GPHOME" + ``` + +7. On the master node, use a text editor to add hostname `sdw4` into the `hawq_hosts` file you created during HAWQ installation. \(If you do not already have this file, create it first, listing all the nodes in your cluster.\) + + ``` + mdw + smdw + sdw1 + sdw2 + sdw3 + sdw4 + ``` + +8. On the master node, use a text editor to add hostname `sdw4` to the `$GPHOME/etc/slaves` file. This file lists all the segment host names for your cluster. For example: + + ``` + sdw1 + sdw2 + sdw3 + sdw4 + ``` + +9. Sync the `hawq-site.xml` and `slaves` configuration files to all nodes in the cluster \(as listed in hawq\_hosts\). + + ```shell + $ hawq scp -f hawq_hosts hawq-site.xml slaves =:$GPHOME/etc/ + ``` + +10. Make sure that the HDFS DataNode service has started on the new node. +11. On `sdw4`, create directories based on the values assigned to the following properties in `hawq-site.xml`.
These new directories must be owned by the same database user \(for example, `gpadmin`\) who will execute the `hawq init segment` command in the next step. + - `hawq_segment_directory` + - `hawq_segment_temp_directory` + **Note:** The `hawq_segment_directory` must be empty. + +12. On `sdw4`, switch to the database user \(for example, `gpadmin`\), and initialize the segment. + + ```shell + $ su - gpadmin + $ hawq init segment + ``` + +13. On the master node, check current cluster and host information using `psql` to verify that the new `sdw4` node has initialized successfully. + + ```shell + $ psql -d postgres + ``` + + ```sql + postgres=# select * from gp_segment_configuration ; + ``` + + ``` + registration_order | role | status | port | hostname | address + --------------------+------+--------+-------+----------+--------------- + -1 | s | u | 5432 | sdw1 | 192.0.2.0 + 0 | m | u | 5432 | mdw | rhel64-1 + 1 | p | u | 40000 | sdw3 | 192.0.2.2 + 2 | p | u | 40000 | sdw2 | 192.0.2.1 + 3 | p | u | 40000 | sdw4 | 192.0.2.3 + (5 rows) + ``` + +14. To maintain optimal cluster performance, rebalance HDFS data by running the following command: + + ```shell + $ sudo -u hdfs hdfs balancer -threshold threshold_value + ``` + + where *threshold\_value* represents how much a DataNode's disk usage, in percentage, can differ from overall disk usage in the cluster. Adjust the threshold value according to the needs of your production data and disk. The smaller the value, the longer the rebalance time. + + **Note:** If you do not specify a threshold, then a default value of 20 is used. If the balancer detects that a DataNode's disk usage differs by less than 20% from the cluster's overall disk usage, data on that node will not be rebalanced. For example, if disk usage across all DataNodes in the cluster is 40% of the cluster's total disk-storage capacity, then the balancer script ensures that a DataNode's disk usage is between 20% and 60% of that DataNode's disk-storage capacity.
DataNodes whose disk usage falls within that percentage range will not be rebalanced. + + Rebalance time is also affected by network bandwidth. You can adjust network bandwidth used by the balancer by using the following command: + + ```shell + $ sudo -u hdfs hdfs dfsadmin -setBalancerBandwidth network_bandwith + ``` + + The default value is 1MB/s. Adjust the value according to your network. + +15. Speed up the clearing of the metadata cache by using the following command: + + ```shell + $ psql -d postgres + ``` + + ```sql + postgres=# select gp_metadata_cache_clear(); + ``` + +16. After expansion, if the new size of your cluster is greater than or equal \(#nodes >=4\) to 4, change the value of the `output.replace-datanode-on-failure` HDFS parameter in `hdfs-client.xml` to `false`. + +17. (Optional) If you are using hash tables, adjust the `default_hash_table_bucket_number` server configuration property to reflect the cluster's new size. Update this configuration's value by multiplying the new number of nodes in the cluster by the appropriate amount indicated below. + + |Number of Nodes After Expansion|Suggested default\_hash\_table\_bucket\_number value| + |---------------|------------------------------------------| + |<= 85|6 \* \#nodes| + |\> 85 and <= 102|5 \* \#nodes| + |\> 102 and <= 128|4 \* \#nodes| + |\> 128 and <= 170|3 \* \#nodes| + |\> 170 and <= 256|2 \* \#nodes| + |\> 256 and <= 512|1 \* \#nodes| + |\> 512|512| + +18. If you are using hash distributed tables and wish to take advantage of the performance benefits of using a larger cluster, redistribute the data in all hash-distributed tables by using either the [ALTER TABLE](/20/reference/sql/ALTER-TABLE.html) or [CREATE TABLE AS](/20/reference/sql/CREATE-TABLE-AS.html) command. You should redistribute the table data if you modified the `default_hash_table_bucket_number` configuration parameter. + + + **Note:** The redistribution of table data can take a significant amount of time. 
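The sizing table for `default_hash_table_bucket_number` can be expressed as a small helper. The following is an illustrative sketch only (the `suggest_buckets` function is hypothetical and not part of HAWQ); it prints the suggested value for a given number of nodes after expansion:

```shell
# Sketch of the sizing table as a shell function (hypothetical helper,
# not shipped with HAWQ). Prints the suggested
# default_hash_table_bucket_number for a given node count.
suggest_buckets() {
    nodes=$1
    if   [ "$nodes" -le 85 ];  then echo $((nodes * 6))
    elif [ "$nodes" -le 102 ]; then echo $((nodes * 5))
    elif [ "$nodes" -le 128 ]; then echo $((nodes * 4))
    elif [ "$nodes" -le 170 ]; then echo $((nodes * 3))
    elif [ "$nodes" -le 256 ]; then echo $((nodes * 2))
    elif [ "$nodes" -le 512 ]; then echo "$nodes"
    else                            echo 512
    fi
}

suggest_buckets 16   # 16 nodes fall in the "<= 85" row: prints 96
```

A cluster expanded beyond 512 nodes caps out at 512, matching the last row of the table.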
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/admin/ClusterShrink.html.md.erb
----------------------------------------------------------------------
diff --git a/admin/ClusterShrink.html.md.erb b/admin/ClusterShrink.html.md.erb
new file mode 100644
index 0000000..33c5cc2
--- /dev/null
+++ b/admin/ClusterShrink.html.md.erb
@@ -0,0 +1,55 @@
+---
+title: Removing a Node
+---
+
+This topic outlines the proper procedure for removing a node from a HAWQ cluster.
+
+In general, you should not need to remove nodes manually from running HAWQ clusters. HAWQ isolates any nodes that it detects as failing due to hardware or other types of errors.
+
+## <a id="topic_p53_ct3_kv"></a>Guidelines for Removing a Node
+
+If you do need to remove a node from a HAWQ cluster, keep in mind the following guidelines:
+
+- Never remove more than two nodes at a time, since the risk of data loss is high.
+- Only remove nodes during system maintenance windows, when the cluster is not busy or running queries.
+
+## <a id="task_oy5_ct3_kv"></a>Removing a Node from a Running HAWQ Cluster
+
+The following is a typical procedure to remove a node from a running HAWQ cluster:
+
+1. Log in as `gpadmin` to the node that you wish to remove and source `greenplum_path.sh`.
+
+    ```shell
+    $ su - gpadmin
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+2. Make sure that there are no query executors \(QEs\) running on the segment. Execute the following command to check for running QE processes:
+
+    ```shell
+    $ ps -ef | grep postgres
+    ```
+
+    In the output, look for processes that contain SQL commands such as INSERT or SELECT. For example:
+
+    ```shell
+    [gpadmin@rhel64-3 ~]$ ps -ef | grep postgres
+    gpadmin   3000  2999  0 Mar21 ?  00:00:08 postgres: port 40000, logger process
+    gpadmin   3003  2999  0 Mar21 ?  00:00:03 postgres: port 40000, stats collector process
+    gpadmin   3004  2999  0 Mar21 ?  00:00:50 postgres: port 40000, writer process
+    gpadmin   3005  2999  0 Mar21 ?  00:00:06 postgres: port 40000, checkpoint process
+    gpadmin   3006  2999  0 Mar21 ?  00:01:25 postgres: port 40000, segment resource manager
+    gpadmin   7880  2999  0 02:08  ?  00:00:00 postgres: port 40000, gpadmin postgres 192.0.2.0(33874) con11 seg0 cmd18 MPPEXEC INSERT
+    ```
+
+3. Stop HAWQ on this segment by executing the following command:
+
+    ```shell
+    $ hawq stop segment
+    ```
+
+4. On the HAWQ master, remove the hostname of the segment from the `slaves` file. Then sync the `slaves` file to all nodes in the cluster by executing the following command:
+
+    ```shell
+    $ hawq scp -f hostfile slaves =:$GPHOME/etc/slaves
+    ```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/admin/FaultTolerance.html.md.erb
----------------------------------------------------------------------
diff --git a/admin/FaultTolerance.html.md.erb b/admin/FaultTolerance.html.md.erb
new file mode 100644
index 0000000..fc9de93
--- /dev/null
+++ b/admin/FaultTolerance.html.md.erb
@@ -0,0 +1,52 @@
+---
+title: Understanding the Fault Tolerance Service
+---
+
+The fault tolerance service (FTS) enables HAWQ to continue operating in the event that a segment node fails. The fault tolerance service runs automatically and requires no additional configuration.
+
+Each segment runs a resource manager process that periodically \(by default, every 30 seconds\) sends the segment's status to the master's resource manager process. This interval is controlled by the `hawq_rm_segment_heartbeat_interval` server configuration parameter.
+
+When a segment encounters a critical error \(for example, a temporary directory on the segment fails due to a hardware error\), the segment reports the temporary directory failure to the HAWQ master through a heartbeat report. When the master receives the report, it marks the segment as DOWN in the `gp_segment_configuration` table.
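As a sketch \(assuming the `hawq` utility is available on the master's path; the value shown is simply the default\), the heartbeat interval named above can be inspected or changed with `hawq config`:

```shell
# Sketch: view and tune the segment heartbeat interval (in seconds).
# Assumes the hawq utility is installed; 30 is the default value.
$ hawq config -s hawq_rm_segment_heartbeat_interval
$ hawq config -c hawq_rm_segment_heartbeat_interval -v 30
$ hawq restart cluster    # a restart is typically required for the change to take effect
```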
All changes to a segment's status are recorded in the `gp_configuration_history` catalog table, including the reason why the segment is marked as DOWN. When a segment is marked DOWN, the master will not run query executors on it. The failed segment is fault-isolated from the rest of the cluster.
+
+Besides disk failure, there are other reasons why a segment can be marked as DOWN. For example, if HAWQ is running in YARN mode, every segment should have a NodeManager \(Hadoop's YARN service\) running on it, so that the segment can be considered a resource to HAWQ. However, if the NodeManager on a segment is not operating properly, this segment will also be marked as DOWN in `gp_segment_configuration`. The corresponding reason for the failure is recorded in `gp_configuration_history`.
+
+**Note:** If a disk fails in a particular segment, the failure may cause either an HDFS error or a temporary directory error in HAWQ. HDFS errors are handled by the Hadoop HDFS service.
+
+## Viewing the Current Status of a Segment <a id="view_segment_status"></a>
+
+To view the current status of a segment, query the `gp_segment_configuration` table.
+
+If the status of a segment is DOWN, the "description" column displays the reason. The description can include any of the following reasons, either singly or in combination, separated by semicolons \(";"\).
+
+**Reason: heartbeat timeout**
+
+The master has not received a heartbeat from the segment. If you see this reason, make sure that HAWQ is running on the segment.
+
+If the segment reports a heartbeat at a later time, the segment is marked as UP.
+
+**Reason: failed probing segment**
+
+The master has probed the segment to verify that it is operating normally, and the segment's response is NO.
+
+While a HAWQ instance is running, the query dispatcher may find that some query executors on the segment are not working normally. The resource manager process on the master then sends a probe message to this segment. When the segment's resource manager receives the message, it checks whether its PostgreSQL postmaster process is working normally and sends a reply message to the master. When the master receives a reply indicating that the segment's postmaster process is not working normally, it marks the segment as DOWN with the reason "failed probing segment."
+
+Check the logs of the failed segment and try to restart the HAWQ instance.
+
+**Reason: communication error**
+
+The master cannot connect to the segment.
+
+Check the network connection between the master and the segment.
+
+**Reason: resource manager process was reset**
+
+If the timestamp of the segment's resource manager process does not match the previously recorded timestamp, the resource manager process on the segment has been restarted. In this case, the HAWQ master returns the resources on this segment and marks the segment as DOWN. If the master later receives a new heartbeat from this segment, it marks the segment as UP again.
+
+**Reason: no global node report**
+
+HAWQ is using YARN for resource management, but no cluster report has been received for this segment.
+
+Check that the NodeManager is operating normally on this segment. If it is not, try to start the NodeManager on the segment. After the NodeManager is started, run `yarn node -list` to see whether the node appears in the list. If so, the segment is set to UP.
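As a sketch of the check described in this topic, the following query lists failed segments together with the recorded reason. It assumes that DOWN segments appear with status `d` \(UP segments appear as `u` in the sample `gp_segment_configuration` output shown for cluster expansion\):

```sql
-- Sketch: list segments currently marked DOWN, with the recorded reason.
-- Assumes DOWN is stored as 'd' in the status column.
SELECT registration_order, hostname, status, description
FROM gp_segment_configuration
WHERE status = 'd';
```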
