This is an automated email from the ASF dual-hosted git repository. tomshawn pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/cloudberry-site.git
The following commit(s) were added to refs/heads/main by this push: new 19af3a49b 2.0 docs: add guc doc and a system view doc (#274) 19af3a49b is described below commit 19af3a49b7cbdf3f78c3437d73f2e7c2a2bc0e8f Author: TomShawn <41534398+tomsh...@users.noreply.github.com> AuthorDate: Mon May 26 17:10:27 2025 +0800 2.0 docs: add guc doc and a system view doc (#274) * 2.0 docs: add guc doc and a system view doc * add sidebar * fix * fix * fix * refine format --- docs/config-params-guc-list.md | 470 +++++++++++++++++++++ .../sys-views/pg-stat-progress-create-index.md | 85 +++- .../zh/docusaurus-plugin-content-docs/current.json | 4 + .../current/config-params-guc-list.md | 469 ++++++++++++++++++++ .../sys-views/pg-stat-progress-create-index.md | 81 +++- .../version-2.x.json | 4 + .../version-2.x/config-params-guc-list.md | 469 ++++++++++++++++++++ .../sys-views/pg-stat-progress-create-index.md | 81 +++- sidebars.ts | 2 +- .../version-2.x/config-params-guc-list.md | 470 +++++++++++++++++++++ .../sys-views/pg-stat-progress-create-index.md | 85 +++- versioned_sidebars/version-2.x-sidebars.json | 2 +- 12 files changed, 2132 insertions(+), 90 deletions(-) diff --git a/docs/config-params-guc-list.md b/docs/config-params-guc-list.md new file mode 100644 index 000000000..f4a00acde --- /dev/null +++ b/docs/config-params-guc-list.md @@ -0,0 +1,470 @@ +--- +title: Configuration Parameters +--- + +# Configuration Parameters + +This document lists the configuration parameters (GUC) of Apache Cloudberry database in alphabetical order. + +## autovacuum_freeze_max_age + +- Variable Type: Integer +- Default Value: 200000000 +- Value Range: [100000,2000000000] +- Setting Category: postmaster +- Description: Sets the "maximum age" of transaction IDs in a table. When the number of transactions accumulated since the transaction ID was allocated reaches this value, the system automatically performs autovacuum on the table to prevent transaction ID wraparound. This operation is enforced even if autovacuum is disabled to ensure data safety. + +## autovacuum_vacuum_cost_delay + +- Variable Type: Real +- Default Value: 2 +- Unit: ms +- Value Range: [-1,100] +- Setting Category: sighup +- Description: Sets the vacuum cost delay time (in milliseconds) for autovacuum operations. + +## autovacuum_vacuum_scale_factor + +- Variable Type: Real +- Default Value: 0.2 +- Value Range: [0,100] +- Setting Category: sighup +- Description: Controls the threshold ratio of updated or deleted tuples to total tuples before autovacuum is performed. + +## autovacuum_vacuum_threshold + +- Variable Type: Integer +- Default Value: 50 +- Value Range: [0,2147483647] +- Setting Category: sighup +- Description: Controls the minimum number of updated or deleted tuples required to trigger autovacuum. + +## checkpoint_timeout + +- Value Range: 30 - 86400 (integer, in seconds) +- Default Value: 300 (5 minutes) +- Setting Category: local, system, reload +- Description: Specifies the maximum time interval between automatic WAL checkpoints. + + If no unit is specified when setting this parameter, the system defaults to seconds. The allowed range is 30 seconds to 1 day. The default value is 5 minutes (300 seconds or 5min). Increasing this parameter's value will increase the time required for crash recovery. + +## gp_appendonly_compaction_segfile_limit + +- Variable Type: Integer +- Default Value: 10 +- Value Range: [0,127] +- Setting Category: user +- Description: Sets the minimum number of append-only segfiles that must be reserved for insert operations. + +## gp_autostats_lock_wait + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Controls whether autostats automatically generated `ANALYZE` waits for lock acquisition. + +## gp_command_count + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: internal +- Description: Displays the number of commands sent by the client in the current session. + +## gp_dynamic_partition_pruning + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: Enables execution plans that can dynamically eliminate partition scans. + +## gp_enable_runtime_filter_pushdown + +- Value Range: Boolean +- Default Value: off +- Setting Category: user +- Description: Attempts to push down the hash table of hash join as a bloom filter to sequential scan or access methods (AM). + +## gp_enable_statement_trigger + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Allows creation of statement-level triggers. + +## gp_max_partition_level + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: superuser +- Description: Sets the maximum allowed partition level when creating partitioned tables using classic syntax. + +## gp_resource_manager + +- Value Range: none, group, group-v2, queue +- Default Value: none +- Setting Category: local, system, restart +- Description: Specifies the resource management scheme currently enabled in the Apache Cloudberry database cluster. + - `none`: No resource manager is used (default). + - `group`: Uses resource groups and enables resource group behavior based on Linux cgroup v1 functionality. + - `group-v2`: Uses resource groups and enables resource group behavior based on Linux cgroup v2 functionality. + - `queue`: Uses resource queues for resource management. + +## gp_role + +- Value Range: dispatch, execute, utility +- Default Value: Undefined (depends on process type) +- Setting Category: read only (automatically set in background) +- Description: This parameter is used to identify the role of the current server process. + + The role of the Coordinator process is `dispatch`, indicating it is responsible for query dispatch. The role of the Segment process is `execute`, indicating it is responsible for executing query plans. `utility` is used for special maintenance or management sessions. This parameter is automatically set by the system in the background and is mainly used to identify different types of internal worker processes. + +## gp_session_id + +- Variable Type: Integer +- Default Value: -1 +- Value Range: [-2147483648,2147483647] +- Setting Category: backend +- Description: Used to uniquely identify a session in the Apache Cloudberry cluster. + +## krb_server_keyfile + +- Variable Type: string +- Default Value: `<FILE:/workspace/dist/database/etc/postgresql/krb5.keytab>` +- Setting Category: sighup +- Description: Sets the location of the Kerberos server key file. + +## log_checkpoints + +- Value Range: Boolean +- Default Value: on +- Setting Category: local, system, reload +- Description: Writes checkpoint and restartpoint information to the server log. The log messages include statistics such as the number of buffers written and the time taken to write them. + +## max_connections + +- Value Range: 10 - 262143 +- Default Value: 250 on Coordinator, 750 on Segment +- Setting Category: local, system, restart +- Description: The maximum number of concurrent connections allowed by the database server. + + In the Apache Cloudberry system, client connections enter only through the Coordinator instance. Segment instances should allow 3 to 10 times the number of connections as the Coordinator. When increasing this parameter's value, the value of `max_prepared_transactions` must be increased accordingly. + + The larger this parameter value, the more shared memory Apache Cloudberry requires. + +## max_replication_slots + +- Variable Type: Integer +- Default Value: 10 +- Value Range: [0,262143] +- Setting Category: postmaster +- Description: Sets the maximum number of replication slots that can be defined simultaneously. + +## optimizer_array_constraints + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Allows the optimizer's constraint derivation framework to recognize array-type constraints. + +## optimizer_array_expansion_threshold + +- Value Range: Integer greater than `0` +- Default Value: 20 +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default) and executing queries containing constant array predicates, the `optimizer_array_expansion_threshold` parameter limits the optimization process based on the number of constants in the array. + + If the number of array elements in the query predicate exceeds the value specified by this parameter, GPORCA will not convert the predicate to disjunctive normal form during query optimization, thereby reducing optimization time and memory consumption. For example, when GPORCA processes a query with an `IN` clause containing more than 20 elements, it will not convert the clause to disjunctive normal form for optimization performance. This behavioral difference can be observed in the [...] + + Modifying this parameter's value affects the trade-off between optimization time and memory usage, as well as optimization benefits from constraint derivation, such as conflict detection and partition pruning. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_cost_model + +- Variable Type: Enum +- Default Value: calibrated +- Setting Category: user +- Description: Sets the cost model used by the optimizer. + +## optimizer_cost_threshold + +- Variable Type: Real +- Default Value: 0 +- Value Range: [0,2.15E+09] +- Setting Category: user +- Description: Sets the sampling threshold related to the optimal execution plan cost, where 0 means no upper limit. + +## optimizer_cte_inlining_bound + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: user +- Description: Sets the size boundary for the optimizer to decide whether to inline CTEs (Common Table Expressions). + +## optimizer_damping_factor_filter + +- Variable Type: Real +- Default Value: 0.75 +- Value Range: [0,1] +- Setting Category: user +- Description: Sets the damping factor used for selection predicates in the optimizer, where `1.0` means no damping. + +## optimizer_damping_factor_groupby + +- Variable Type: Real +- Default Value: 0.75 +- Value Range: [0,1] +- Setting Category: user +- Description: Sets the damping factor for `group by` operations in the optimizer, where `1.0` means no damping. + +## optimizer_damping_factor_join + +- Variable Type: Real +- Default Value: 0 +- Value Range: [0,1] +- Setting Category: user +- Description: Sets the damping factor for join predicates in the optimizer, where `1.0` means no damping and `0.0` means using square root damping. + +## optimizer_discard_redistribute_hashjoin + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Controls whether the optimizer discards hash join plans containing redistribute operations. + +## optimizer_dpe_stats + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Enables statistics derivation for partitioned tables in dynamic partition elimination scenarios. + +## optimizer_enable_derive_stats_all_groups + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Enables statistics derivation for all groups after completing search space exploration. + +## optimizer_enable_dynamicbitmapscan + +- Value Range: Boolean +- Default Value: on +- Setting Category: user +- Description: When enabled, the optimizer uses dynamic bitmap scan plans. + + If this parameter is set to `off`, GPORCA will not generate dynamic bitmap scan plans and will fall back to using dynamic sequential scan as an alternative. + +## optimizer_enable_dynamicindexonlyscan + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), the `optimizer_enable_dynamicindexonlyscan` parameter controls whether to generate dynamic index-only scan plans. + + The default value is `on`. When planning queries on partitioned tables, if the query does not contain single-row volatile (SIRV) functions, GPORCA may generate dynamic index-only scans as an alternative. If set to `off`, GPORCA will not generate dynamic index-only scan plans. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_dynamicindexscan + +- Value Range: Boolean +- Default Value: on +- Setting Category: user +- Description: This parameter controls whether to enable dynamic index scan in query plans. + + When enabled, the optimizer uses dynamic index scan plans. If this parameter is set to `off`, GPORCA will not generate dynamic index scan plans and will fall back to using dynamic sequential scan as an alternative. + +## optimizer_enable_foreign_table + +- Parameter Type: Boolean +- Default Value: true +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default) and this parameter is set to `true` (default), GPORCA generates execution plans for queries involving foreign tables. + + If set to `false`, queries containing foreign tables will fall back to being planned by the PostgreSQL-based optimizer. + +## optimizer_enable_indexonlyscan + +- Parameter Type: Boolean +- Default Value: true +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default) and this parameter is set to `true` (default), GPORCA can generate index-only scan plans for B-tree indexes and any type of index that contains all columns in the query. (GiST indexes only support index-only scans for certain operator classes.) + + GPORCA only accesses values in the index and not the actual data blocks of the table. This can improve query execution performance, especially when the table has been vacuumed, contains wide columns, and all visible columns are already in the index, eliminating the need to read any data blocks. If this parameter is set to `false`, GPORCA will not generate index-only scan plans. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_orderedagg + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), this parameter controls whether to generate query plans for ordered aggregates. + + When set to `on` (default), GPORCA generates execution plans for queries containing ordered aggregates. When set to `off`, such queries will fall back to being planned by the PostgreSQL-based optimizer. + + This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_push_join_below_union_all + +- Parameter Type: Boolean +- Default Value: off +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), the `optimizer_enable_push_join_below_union_all` parameter controls GPORCA's behavior when encountering queries containing `JOIN UNION ALL`. + + The default value is `off`. GPORCA will not perform any transformation when the query contains `JOIN UNION ALL`. + + If set to `on` and the plan cost meets requirements, GPORCA will transform `JOIN UNION ALL` into multiple subqueries each performing `JOIN` followed by `UNION ALL`. This transformation may improve join performance when the subqueries of `UNION ALL` can benefit from join operations (but don't qualify in the original plan). + + For example, in scenarios where indexed nested loop joins are highly efficient, such as when the inner table is large with an index and the outer table is small, or when multiple large tables with indexes are `UNION ALL`'ed with a small table, this transformation can push join conditions down as index conditions, potentially performing better than using hash joins. + + Enabling this transformation may increase query planning time, so it's recommended to use `EXPLAIN` to analyze query execution plans with this parameter both enabled and disabled. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_query_parameter + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Allows the GPORCA optimizer to handle query parameters. + +## optimizer_enable_right_outer_join + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), this parameter controls whether GPORCA generates right outer joins. + + If set to the default value `on`, GPORCA can either directly generate right outer joins or convert left outer joins to right outer joins (when the optimizer deems it appropriate). If set to `off`, GPORCA will convert incoming right outer joins to equivalent left outer joins and completely avoid generating any right outer joins. + + If you encounter performance issues with queries using right outer joins, you can disable right outer joins by setting this parameter to `off`. + + This parameter can be set at the database system level, individual database level, or session and query level. However, it's more recommended to control it at the query level, as right outer joins may be more appropriate query plan choices in certain scenarios. + +## optimizer_force_three_stage_scalar_dqa + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Forces the optimizer to always choose three-stage aggregation plans for scalar distinct qualified aggregates. + +## optimizer_nestloop_factor + +- Variable Type: Real +- Default Value: 1024 +- Value Range: [1,1.79769e+308] +- Setting Category: user +- Description: Sets the cost factor for nested loop joins in the optimizer. + +## optimizer_penalize_broadcast_threshold + +- Parameter Type: Integer +- Value Range: `[0,2147483647]` +- Default Value: 100000 +- Setting Category: user +- Description: Specifies the maximum number of relation rows that can be broadcast without penalty. + + If the number of broadcast rows exceeds this threshold, the optimizer will increase its execution cost to reduce the likelihood of selecting that plan. + + Setting this parameter to `0` disables the penalty mechanism, meaning no penalty is applied to any broadcast. + +## optimizer_push_group_by_below_setop_threshold + +- Variable Type: Integer +- Default Value: 10 +- Value Range: [0,2147483647] +- Setting Category: user +- Description: Sets the maximum number of child nodes to attempt pushing `GROUP BY` operations below SetOp nodes. + +## optimizer_replicated_table_insert + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Omits broadcast operations when inserting data into replicated tables. + +## optimizer_skew_factor + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,100] +- Setting Category: user +- Description: Sets the source and weight of the skew factor. `0` means disabling skew derivation based on sample statistics, `1`–`100` means enabling and calculating skew ratio based on samples; the actual skew used for cost estimation is the product of this parameter and the skew ratio. + +## optimizer_sort_factor + +- Variable Type: Real +- Default Value: 1 +- Value Range: [0,1.79769e+308] +- Setting Category: user +- Description: Sets the cost factor for sort operations in the optimizer; `1.0` means the same as default cost, greater than 1 means higher sort cost, less than 1 means lower cost. + +## optimizer_trace_fallback + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Prints information at the `INFO` log level when GPORCA falls back to using the traditional optimizer. + +## optimizer_use_gpdb_allocators + +- Variable Type: Bool +- Default Value: on +- Setting Category: postmaster +- Description: Allows the GPORCA optimizer to use the database's memory context management mechanism (Memory Contexts). + +## optimizer_xform_bind_threshold + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: user +- Description: Limits the maximum number of bindings that can be generated for each transformation rule (xform) on each group expression. Setting to `0` means this limit is not enabled. + +## superuser_reserved_connections + +- Value Range: Integer less than `max_connections` +- Default Value: 10 +- Setting Category: local, system, restart +- Description: Specifies the number of connection slots reserved for Apache Cloudberry database superusers. + +## track_io_timing + +- Parameter Type: Boolean +- Default Value: off +- Setting Category: superuser +- Description: Used to collect timing statistics for database I/O activities. When enabled, the system records I/O operation durations during statement execution, which is useful for performance analysis and bottleneck identification. This parameter is off by default and can only be set by superusers. + +## wal_compression + +- Variable Type: Bool +- Default Value: on +- Setting Category: superuser +- Description: Enables compression of full page writes in WAL files. + +## wal_keep_size + +- Parameter Type: integer +- Value Range: 0 - 2147483647 (in MB) +- Default Value: 320 +- Setting Category: sighup +- Description: Specifies the maximum size of WAL files to retain for standby servers. + +## work_mem + +- Value Range: Integer in kilobytes +- Default Value: 32MB +- Setting Category: coordinator, session, reload +- Description: Specifies the maximum amount of memory that can be used by each query operation (such as sort or hash table) before writing to temporary disk files. If no unit is specified for this parameter, it defaults to kilobytes. The default value is 32MB. + + In complex queries, multiple sort or hash operations may be executed in parallel, each of which can use the amount of memory specified by `work_mem` before writing to temporary files. Additionally, multiple sessions may perform these operations simultaneously, so the total memory consumption may be much higher than the value of `work_mem` itself. This should be taken into special consideration when choosing the value for this parameter. + + `work_mem` affects these operations: sort operations for `ORDER BY`, `DISTINCT`, and merge joins; hash tables for hash joins, hash aggregates, and processing `IN` subqueries; bitmap index scans; and tuple store-based operations such as function scans, CTEs, PL/pgSQL, and management UDFs. + + In addition to allocating memory for specific execution operators, `work_mem` also affects the PostgreSQL optimizer's preference for certain query plans. Note that `work_mem` is independent of the resource queue and resource group memory management mechanisms. It takes effect at the query level and is not affected by resource queue or resource group memory limits. + +## writable_external_table_bufsize + +- Variable Type: Integer +- Default Value: 1024 +- Unit: kB +- Value Range: [32,131072] +- Setting Category: user +- Description: Sets the buffer size (in kB) for writing to gpfdist before writing to writable external tables. diff --git a/docs/sys-catalogs/sys-views/pg-stat-progress-create-index.md b/docs/sys-catalogs/sys-views/pg-stat-progress-create-index.md index dc8b3e40a..89064f873 100644 --- a/docs/sys-catalogs/sys-views/pg-stat-progress-create-index.md +++ b/docs/sys-catalogs/sys-views/pg-stat-progress-create-index.md @@ -2,26 +2,65 @@ title: pg_stat_progress_create_index --- -# pg_stat_progress_create_index - -The `pg_stat_progress_create_index` view provides real-time information about the progress of `CREATE INDEX` and `REINDEX` operations. Each row represents a backend process currently building an index and includes details such as the command type, current phase, number of blocks and tuples processed, and partitioning information. This view is useful for monitoring and managing long-running index creation tasks. - -|Column|Type|Description| -|------|----|-----------| -|`gp_segment_id`|integer| Unique identifier of a segment (or coordinator) instance. (This column is not present in the `gp_stat_progress_create_index_summary` view.)| -| `pid` | integer | Process identifier of the backend, or the coordinator process identifier if the `gp_stat_progress_create_index_summary` view. | -| `datid` | oid | The object identifer of the database to which this backend is connected. | -| `datname` | name | Name of the database to which this backend is connected. | -| `relid` | oid | The object identifer of the table on which the index is being created. | -| `index_relid` | oid | The object identifer of the index being created or reindexed. Because Apache Cloudberry does not support concurrent (re)indexing, this value is always `0`. | -| `command` | text | The name of the command that is running: `CREATE INDEX` or `REINDEX`. | -| `phase` | text | Current processing phase of index creation. | -| `lockers_total` | bigint | Total number of lockers to wait for, when applicable. | -| `lockers_done` | bigint | Number of lockers already waited for. | -| `current_locker_pid` | bigint | The process identifier of the locker currently being waited for. | -| `blocks_total` | bigint | Total number of blocks to be processed in the current phase. | -| `blocks_done` | bigint | Number of blocks already processed in the current phase. | -| `tuples_total` | bigint | Total number of tuples to be processed in the current phase. | -| `tuples_done` | bigint | Number of tuples already processed in the current phase. | -| `partitions_total` | bigint | When creating an index on a partitioned table, this column is set to the total number of partitions on which the index is to be created. This field is 0 during a `REINDEX`. | -| `partitions_done` | bigint | When creating an index on a partitioned table, this column is set to the number of partitions on which the index has been completed. This field is 0 during a `REINDEX`. | +## pg_stat_progress_create_index + +`pg_stat_progress_create_index` is a system view that shows real-time progress for ongoing `CREATE INDEX` or `REINDEX` operations in the current database. + +This view displays detailed status information for each backend process building an index, including the current execution phase, the number of tuples processed, blocks scanned, and more. Once the operation completes, the corresponding entries are automatically removed from the view. + +In Apache Cloudberry, this view also supports AO (Append-Optimized) tables and can be used to observe the phase-wise progress of index creation on such tables. + +Typical use cases include: + +- Monitoring index creation or rebuild operations in real time. +- Analyzing performance bottlenecks of long-running `CREATE INDEX` or `REINDEX` commands. +- Checking if any index operations are currently consuming system resources. +- Correlating with `pg_stat_activity` to trace backend process details. + +Example queries: + +```sql +-- Views all ongoing index creation tasks +SELECT * FROM pg_stat_progress_create_index; + +-- Views index progress for a specific table +SELECT * FROM pg_stat_progress_create_index +WHERE relid = 'ao_test'::regclass; +``` + +### Field descriptions + +| Field | Description | +|-----------------------|-------------| +| `gp_segment_id` | ID of the segment where this entry resides. Only applicable in a distributed environment. | +| `pid` | Process ID of the backend. Can be joined with `pg_stat_activity` for session details. | +| `datid` | OID of the database, corresponding to `pg_database.oid`. | +| `datname` | Name of the database. | +| `relid` | OID of the table being indexed, corresponding to `pg_class.oid`. | +| `index_relid` | OID of the index being built. | +| `command` | Command being executed: `CREATE INDEX` or `REINDEX`. | +| `phase` | Current phase of execution, for example: <br /> `initializing`: Initialization phase <br /> `scanning heap`: Scanning table data <br /> `sorting`: Sorting phase <br /> `building index: loading tuples in tree`: Building index structure <br /> `waiting for locks`: Waiting for table or metadata locks | +| `lockers_total` | Total number of sessions holding conflicting locks (if any). | +| `lockers_done` | Number of sessions that have released their locks. | +| `current_locker_pid` | Process ID of the session currently holding the lock (if waiting). | +| `blocks_total` | Total number of data blocks to scan (may be 0 if unknown or not started). | +| `blocks_done` | Number of data blocks already scanned. | +| `tuples_total` | Estimated total number of tuples to process (if available). | +| `tuples_done` | Number of tuples already processed. | +| `partitions_total` | Total number of partitions (if applicable). | +| `partitions_done` | Number of partitions already processed (if applicable). | + +:::note +- This view only shows currently running index operations. Entries disappear once the operation completes. +- For small tables, index creation may complete instantly, and the view might return no rows. +- To better observe progress, try creating an index on a large table or use complex columns (for example, large text) to slow down execution. +- Progress is also reported when building indexes on AO (Append-Optimized) tables. +- You can join this view with ``pg_stat_activity`` using the ``pid`` field. + + ```sql + SELECT a.usename, a.query, p.phase, p.blocks_done, p.blocks_total + FROM pg_stat_activity a + JOIN pg_stat_progress_create_index p ON a.pid = p.pid; + ``` + +::: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current.json b/i18n/zh/docusaurus-plugin-content-docs/current.json index e9fdbfbcf..4b51df14b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current.json +++ b/i18n/zh/docusaurus-plugin-content-docs/current.json @@ -94,5 +94,9 @@ "sidebar.docsbars.category.GPORCA Query Optimizer": { "message": "GPORCA 查询优化器", "description": "The label for category GPORCA query optimizer in sidebar docs" + }, + "sidebar.docsbars.category.System Views": { + "message": "系统视图", + "description": "The label for category system views in sidebar docs" } } diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/config-params-guc-list.md b/i18n/zh/docusaurus-plugin-content-docs/current/config-params-guc-list.md new file mode 100644 index 000000000..31f4f73e0 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/config-params-guc-list.md @@ -0,0 +1,469 @@ +--- +title: 配置参数 +--- + +# 配置参数 + +本文档按照字母顺序列出 Apache Cloudberry 数据库的配置参数 (GUC)。 + +## autovacuum_freeze_max_age + +- 变量类型:Integer +- 默认值:200000000 +- 值范围:[100000,2000000000] +- 设置分类:postmaster +- 描述:设置表中事务 ID 的“最大使用年龄”。当事务 ID 自分配以来累计经过的事务数达到该值时,系统会自动对表执行 autovacuum,以防止事务 ID 回卷。即使关闭了 autovacuum,系统也会强制执行此操作以保障数据安全。 + +## autovacuum_vacuum_cost_delay + +- 变量类型:Real +- 默认值:2 +- 单位:ms +- 值范围:[-1,100] +- 设置分类:sighup +- 描述:设置 autovacuum 操作中 vacuum 成本延迟的时间(单位:毫秒)。 + +## autovacuum_vacuum_scale_factor + +- 变量类型:Real +- 默认值:0.2 +- 值范围:[0,100] +- 设置分类:sighup +- 描述:控制在执行 autovacuum 之前,表中已更新或删除元组数量相对于总元组数的比例阈值。 + +## autovacuum_vacuum_threshold + +- 变量类型:Integer +- 默认值:50 +- 值范围:[0,2147483647] +- 设置分类:sighup +- 描述:控制触发 autovacuum 所需的最小更新或删除元组数量。 + +## checkpoint_timeout + +- 值范围:30 - 86400(整数,单位为秒) +- 默认值:300(5 分钟) +- 设置分类:local, system, reload +- 描述:指定自动 WAL 检查点之间的最大时间间隔。 + + 如果设置此参数时未指定单位,系统会默认按秒解析。允许的范围是 30 秒到 1 天。默认值为 5 分钟(即 300 秒或 5min)。增大此参数的值会增加崩溃恢复所需的时间。 + +## gp_appendonly_compaction_segfile_limit + +- 变量类型:Integer +- 默认值:10 +- 值范围:[0,127] +- 设置分类:user +- 描述:设置插入操作时必须预留的 append-only segfile 最小数量。 + +## gp_autostats_lock_wait + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:控制 autostats 自动生成的 `ANALYZE` 是否等待锁获取。 + +## gp_command_count + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:internal +- 描述:显示当前会话中客户端已发送的命令数量。 + +## gp_dynamic_partition_pruning + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:启用可动态消除分区扫描的执行计划。 + +## gp_enable_runtime_filter_pushdown + +- 值范围:Boolean +- 默认值:off +- 设置分类:user +- 描述:尝试将 hash join 的哈希表作为 bloom filter 下推到顺序扫描或访问方法(AM)。 + +## gp_enable_statement_trigger + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:允许创建语句级触发器。 + +## gp_max_partition_level + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:superuser +- 描述:设置使用经典语法创建分区表时允许的最大分区层级数。 + +## gp_resource_manager + +- 值范围:none, group, group-v2, queue +- 默认值:none +- 设置分类:local, system, restart +- 描述:指定当前在 Apache Cloudberry 数据库集群中启用的资源管理方案。 + - `none`:不使用任何资源管理器(默认值)。 + - `group`:使用资源组,并基于 Linux cgroup v1 功能启用资源组行为。 + - `group-v2`:使用资源组,并基于 Linux cgroup v2 功能启用资源组行为。 + - `queue`:使用资源队列进行资源管理。 + +## gp_role + +- 值范围:dispatch, execute, utility +- 默认值:未定义(取决于进程类型) +- 设置分类:read only(仅在后台自动设置) +- 描述:该参数用于标识当前服务器进程的角色。 +- Coordinator 进程的角色为 `dispatch`,表示它负责查询调度。Segment 进程的角色为 `execute`,表示它负责执行查询计划。`utility` 用于一些特殊的维护或管理会话。该参数在后台由系统自动设置,主要用于标识不同类型的内部工作进程。 + +## gp_session_id + +- 变量类型:Integer +- 默认值:-1 +- 值范围:[-2147483648,2147483647] +- 设置分类:backend +- 描述:在 Apache Cloudberry 集群中用于唯一标识某个会话的全局 ID。 + +## krb_server_keyfile + +- 变量类型:string +- 默认值:`FILE:/workspace/dist/database/etc/postgresql/krb5.keytab` +- 设置分类:sighup +- 描述:设置 Kerberos 服务器密钥文件的位置。 + +## log_checkpoints + +- 值范围:Boolean +- 默认值:on +- 设置分类:local, system, reload +- 描述:将检查点 (checkpoint) 和重启点 (restartpoint) 写入服务器日志。日志消息中包含一些统计信息,例如写入的缓冲区数量和写入所花费的时间。 + +## max_connections + +- 值范围:10 - 262143 +- 默认值:Coordinator 上为 250,Segment 上为 750 +- 设置分类:local, system, restart +- 描述:数据库服务器允许的最大并发连接数。 + + 在 Apache Cloudberry 系统中,客户端连接仅通过 Coordinator 实例进入。Segment 实例应允许为 Coordinator 的 3 到 10 倍的连接数。增加此参数的值时,必须相应地增加 `max_prepared_transactions` 的值。 + + 此参数值越大,Apache Cloudberry 需要的共享内存也越多。 + +## max_replication_slots + +- 变量类型:Integer +- 默认值:10 +- 值范围:[0,262143] +- 设置分类:postmaster +- 描述:设置可同时定义的复制槽的最大数量。 + +## optimizer_array_constraints + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:允许优化器的约束推导框架识别数组类型的约束条件。 + +## optimizer_array_expansion_threshold + +- 值范围:大于 `0` 的整数 +- 默认值:20 +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)并执行包含常量数组谓词的查询时,`optimizer_array_expansion_threshold` 参数会根据数组中常量的数量限制优化过程。 + + 如果查询谓词中的数组元素数量超过该参数指定的值,GPORCA 在查询优化期间将不会将该谓词转换为析取范式 (disjunctive normal form),从而缩短优化时间并减少内存消耗。例如,当 GPORCA 处理一个 `IN` 子句元素超过 20 个的查询时,为了优化性能,它不会将该子句转换为析取范式。在执行计划中,这种行为差异可以从 `IN` 条件的过滤方式中看出。 + + 修改该参数的值会影响优化时间和内存使用之间的权衡,同时也可能影响到基于约束推导带来的优化收益,例如冲突检测和分区裁剪。此参数可在数据库系统级、单个数据库级,或 session 和 query 级别进行设置。 + +## optimizer_cost_model + +- 变量类型:Enum +- 默认值:calibrated +- 设置分类:user +- 描述:设置优化器使用的成本模型。 + +## optimizer_cost_threshold + +- 变量类型:Real +- 默认值:0 +- 值范围:[0,2147480000] +- 设置分类:user +- 描述:设置与最佳执行计划成本相关的采样阈值,0 表示不设上限。 + +## optimizer_cte_inlining_bound + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:user +- 描述:设置优化器决定是否内联 CTE(公用表表达式)的大小界限。 + +## optimizer_damping_factor_filter + +- 变量类型:Real +- 默认值:0.75 +- 值范围:[0,1] +- 设置分类:user +- 描述:设置优化器中用于选择谓词的抑制因子,`1.0` 表示不进行抑制。 + +## optimizer_damping_factor_groupby + +- 变量类型:Real +- 默认值:0.75 +- 值范围:[0,1] +- 设置分类:user +- 描述:设置优化器中 `group by` 操作的抑制因子,`1.0` 表示不进行抑制。 + +## optimizer_damping_factor_join + +- 变量类型:Real +- 默认值:0 +- 值范围:[0,1] +- 设置分类:user +- 描述:设置优化器中连接谓词的抑制因子,`1.0` 表示不抑制,`0.0` 表示使用平方根抑制法。 + +## optimizer_discard_redistribute_hashjoin + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:控制优化器是否丢弃包含 redistribute 动作的哈希连接计划。 + +## optimizer_dpe_stats + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:启用针对动态分区消除场景下分区表的统计信息推导。 + +## optimizer_enable_derive_stats_all_groups + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:在完成搜索空间探索后,启用对所有分组的统计信息推导。 + +## optimizer_enable_dynamicbitmapscan + +- 值范围:Boolean +- 默认值:on +- 设置分类:user +- 描述:启用后,优化器会使用 dynamic bitmap scan(动态位图扫描)计划。 + + 如果将该参数设为 `off`,GPORCA 将不会生成 dynamic bitmap scan 计划,而是退回使用 dynamic sequential scan(动态顺序扫描)作为替代。 + +## optimizer_enable_dynamicindexonlyscan + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,`optimizer_enable_dynamicindexonlyscan` 参数用于控制是否生成动态仅索引 (dynamic index-only) 扫描计划。 + + 默认值为 `on`,在对分区表进行查询计划时,如果查询中不包含单行易变(SIRV)函数,GPORCA 可能会生成动态仅索引扫描作为替代方案。如果设置为 `off`,GPORCA 不会生成动态仅索引扫描计划。此参数可在数据库系统级、单个数据库级,或在 session 和 query 级别进行设置。 + +## optimizer_enable_dynamicindexscan + +- 值范围:Boolean +- 默认值:on +- 设置分类:user +- 描述:该参数用于控制是否在查询计划中启用动态索引扫描。 + + 启用后,优化器会使用 dynamic index scan(动态索引扫描)计划。如果将该参数设为 `off`,GPORCA 将不会生成 dynamic index scan 计划,而是退回使用 dynamic sequential scan(动态顺序扫描)作为替代。 + +## optimizer_enable_foreign_table + +- 参数类型:Boolean +- 默认值:true +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)且该参数设置为 `true`(默认值)时,GPORCA 会为涉及外部表 (foreign table) 的查询生成执行计划。 + + 如果设置为 `false`,包含外部表的查询将回退由基于 PostgreSQL 的优化器生成执行计划。 + +## optimizer_enable_indexonlyscan + +- 参数类型:Boolean +- 默认值:true +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)且此参数设置为 `true`(默认值)时,GPORCA 可以为 B-tree 索引和包含查询中所有列的任意类型索引生成仅索引 (index-only) 扫描计划。(GiST 索引仅支持部分操作类的 index-only 扫描。) + + GPORCA 只访问索引中的值,不访问表的实际数据块。这可以提高查询执行性能,尤其是在表经过 vacuum、包含宽列,且索引中已包含所有可见列的情况下,无需读取任何数据块。如果将此参数设置为 `false`,GPORCA 将不会生成 index-only 扫描计划。此参数可在数据库系统级、单个数据库级,或在 session 和 query 级别进行设置。 + +## optimizer_enable_orderedagg + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,此参数用于控制是否为有序聚合 (ordered aggregates) 生成查询计划。 + + 当设置为 `on`(默认值)时,GPORCA 会为包含有序聚合的查询生成执行计划。当设置为 `off` 时,该类查询将回退由基于 PostgreSQL 的优化器进行计划生成。 + + 此参数可设置在数据库系统级、单个数据库级,或 session 和 query 级别。 + +## optimizer_enable_push_join_below_union_all + +- 参数类型:Boolean +- 默认值:off +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,`optimizer_enable_push_join_below_union_all` 参数用于控制 GPORCA 在遇到包含 `JOIN UNION ALL` 的查询时的行为。 + + 默认值为 `off`,GPORCA 在查询中包含 `JOIN UNION ALL` 时不会进行任何变换。 + + 如果设置为 `on` 且计划成本符合要求,GPORCA 会将 `JOIN UNION ALL` 转换为多个子查询各自进行 `JOIN` 的 `UNION ALL`。当 `UNION ALL` 的子查询能够受益于连接操作(但在原始计划中不符合条件)时,该变换可能提升连接性能。 + + 例如,对于一个索引嵌套循环连接 (indexed nested loop join) 非常高效的场景,即内表较大且有索引,而外表较小时;或者多个有索引的大表与一个小表进行 `UNION ALL` 后再连接的情况,此变换可将连接条件下推为索引条件,从而比使用哈希连接性能更好。 + + 启用此变换可能会增加查询规划时间,因此建议使用 `EXPLAIN` 分析开启和关闭此参数时的查询执行计划。此参数可在数据库系统级、单个数据库级,或在 session 和 query 级别进行设置。 + +## optimizer_enable_query_parameter + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:允许 GPORCA 优化器理查询参数。 + +## optimizer_enable_right_outer_join + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,此参数用于控制 GPORCA 是否生成 right outer join(右外连接)。 + + 如果设置为默认值 `on`,GPORCA 既可以直接生成 right outer join,也可以将 left outer join 转换为 right outer join(当优化器认为合适时)。如果设置为 `off`,GPORCA 会将传入的 right outer join 转换为等价的 left outer join,并完全避免生成任何 right outer join。 + + 如果在使用 right outer join 的查询中遇到性能问题,可以通过将此参数设置为 `off` 来禁止使用 right outer join。 + + 此参数可以在数据库系统级、单个数据库级,或 session 和 query 级别进行设置。不过更推荐在查询级别进行控制,因为某些场景下 right outer join 是更合适的查询计划选择。 + +## optimizer_force_three_stage_scalar_dqa + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:强制优化器始终为标量形式的去重聚合(Distinct Qualified Aggregate)选择三阶段聚合计划。 + +## optimizer_nestloop_factor + +- 变量类型:Real +- 默认值:1024 +- 值范围:[1,1.79769e+308] +- 设置分类:user +- 描述:设置优化器中嵌套循环连接(NestLoop Join)的成本因子。 + +## optimizer_penalize_broadcast_threshold + +- 参数类型:Integer +- 值范围:`[0,2147483647]` +- 默认值:100000 +- 设置分类:user +- 描述:指定在不受惩罚的情况下,最多可广播的关系(relation)行数。 + + 如果广播的行数超过该阈值,优化器会增加其执行成本以降低选择该计划的可能性。 + + 将该参数设置为 `0` 表示禁用该惩罚机制,即对任何广播都不加惩罚。 + +## optimizer_push_group_by_below_setop_threshold + +- 变量类型:Integer +- 默认值:10 +- 值范围:[0,2147483647] +- 设置分类:user +- 描述:设置在集合操作(SetOp)节点下尝试下推 `GROUP BY` 操作的子节点最大数量。 + +## optimizer_replicated_table_insert + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:向复制表插入数据时省略广播操作。 + +## optimizer_skew_factor + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,100] +- 设置分类:user +- 描述:设置倾斜系数的来源及权重。`0` 表示禁用基于样本统计的倾斜推导,`1`–`100` 表示启用并根据样本计算倾斜比;实际用于成本估算的倾斜度为该参数与倾斜比的乘积。 + +## optimizer_sort_factor + +- 变量类型:Real +- 默认值:1 +- 值范围:[0,1.79769e+308] +- 设置分类:user +- 描述:设置优化器中排序操作的成本因子;`1.0` 表示与默认成本相同,大于 1 表示排序代价更高,小于 1 表示代价更低。 + +## optimizer_trace_fallback + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:当 GPORCA 回退使用传统优化器时,在 `INFO` 日志级别打印提示信息。 + +## optimizer_use_gpdb_allocators + +- 变量类型:Bool +- 默认值:on +- 设置分类:postmaster +- 描述:允许 GPORCA 优化器使用数据库的内存上下文管理机制(Memory Contexts)。 + +## optimizer_xform_bind_threshold + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:user +- 描述:限制每个转换规则(xform)在每个分组表达式上最多生成的绑定数量。设置为 `0` 表示不启用该限制。 + +## superuser_reserved_connections + +- 值范围:小于 `max_connections` 的整数 +- 默认值:10 +- 设置分类:local, system, restart +- 描述:指定为 Apache Cloudberry 数据库超级用户保留的连接槽数量。 + +## track_io_timing + +- 参数类型:Boolean +- 默认值:off +- 设置分类:superuser +- 描述:用于收集数据库 I/O 活动的时间统计信息。启用此参数后,系统会记录语句执行期间的 I/O 操作耗时,这对于性能分析和瓶颈定位非常有用。该参数默认关闭,仅可由超级用户设置。 + +## wal_compression + +- 变量类型:Bool +- 默认值:on +- 设置分类:superuser +- 描述:启用对 WAL 文件中完整页面写入内容的压缩。 + +## wal_keep_size + +- 参数类型:integer +- 值范围:0 - 2147483647(以 MB 为单位) +- 默认值:320 +- 设置分类:sighup +- 描述:指定为备用服务器 (standby servers) 保留的 WAL 文件大小上限。 + +## work_mem + +- 值范围:以 kilobyte 为单位的整数 +- 默认值:32MB +- 设置分类:coordinator, session, reload +- 描述:指定每个查询操作(例如排序或哈希表)在写入临时磁盘文件前最多可以使用的内存量。如果该参数没有指定单位,则默认为 kilobyte。默认值为 32MB。 + + 在复杂查询中,可能会并行执行多个排序或哈希操作,每个操作在写入临时文件前都可以使用 `work_mem` 指定的内存量。此外,多个会话可能同时进行这些操作,因此总内存消耗可能远高于 `work_mem` 本身的数值。在选择该参数值时需特别注意这一点。 + + `work_mem` 会影响这些操作:用于 `ORDER BY`、`DISTINCT` 和合并连接的排序操作;用于哈希连接、哈希聚合,以及处理 `IN` 子查询的哈希表;位图索引扫描;基于 tuple store 的操作,例如函数扫描、CTE、PL/pgSQL 和管理类 UDF 等。 + + 除了为特定执行算子分配内存外,`work_mem` 还会影响 PostgreSQL 优化器选择某些查询计划的偏好。需要注意的是,`work_mem` 是独立于资源队列和资源组的内存管理机制,它在查询级别生效,不会受到资源队列或资源组内存限制的影响。 + +## writable_external_table_bufsize + +- 变量类型:Integer +- 默认值:1024 +- 单位:kB +- 值范围:[32,131072] +- 设置分类:user +- 描述:为可写外部表写入 gpfdist 之前设置的缓冲区大小(单位:kB)。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/sys-catalogs/sys-views/pg-stat-progress-create-index.md b/i18n/zh/docusaurus-plugin-content-docs/current/sys-catalogs/sys-views/pg-stat-progress-create-index.md index f118bbf8a..d3b66a7ea 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/sys-catalogs/sys-views/pg-stat-progress-create-index.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/sys-catalogs/sys-views/pg-stat-progress-create-index.md @@ -4,24 +4,63 @@ title: pg_stat_progress_create_index # pg_stat_progress_create_index -`pg_stat_progress_create_index` 视图提供了 `CREATE INDEX` 和 `REINDEX` 操作进度的实时信息。每一行代表一个当前正在构建索引的后端进程,包含诸如命令类型、当前阶段、已处理的块数和元组数以及分区信息等详细信息。此视图有助于监控和管理长时间运行的索引创建任务。 - -| 字段 | 类型 | 描述 | -|---|---|---| -| `gp_segment_id` | integer | Segment(或 Coordinator)实例的唯一标识符。(此字段不在 `gp_stat_progress_create_index_summary` 视图中。) | -| `pid` | integer | 后端进程的 ID,如果是在 `gp_stat_progress_create_index_summary` 视图中,则是 Coordinator 进程的 ID。 | -| `datid` | oid | 此后端连接的数据库的对象标识符。 | -| `datname` | name | 此后端连接的数据库名称。 | -| `relid` | oid | 正在创建索引的表的对象标识符。 | -| `index_relid` | oid | 正在创建或重建的索引的对象标识符。由于 Apache Cloudberry 不支持并发(重)索引,此值始终为 `0`。 | -| `command` | text | 正在运行的命令名称:`CREATE INDEX` 或 `REINDEX`。 | -| `phase` | text | 索引创建的当前处理阶段。 | -| `lockers_total` | bigint | 适用的情况下,需要等待的锁总数。 | -| `lockers_done` | bigint | 已经等待的锁数量。 | -| `current_locker_pid` | bigint | 当前正在等待的锁的进程 ID。 | -| `blocks_total` | bigint | 当前阶段要处理的块总数。 | -| `blocks_done` | bigint | 当前阶段已处理的块数。 | -| `tuples_total` | bigint | 当前阶段要处理的元组总数。 | -| `tuples_done` | bigint | 当前阶段已处理的元组数。 | -| `partitions_total` | bigint | 在分区表上创建索引时,此列设置为要创建索引的分区总数。在 `REINDEX` 期间此字段为 0。 | -| `partitions_done` | bigint | 在分区表上创建索引时,此列设置为已完成索引的分区数量。在 `REINDEX` 期间此字段为 0。 | +`pg_stat_progress_create_index` 是一个系统视图,用于显示当前数据库中正在执行的 `CREATE INDEX` 或 `REINDEX` 操作的实时执行进度。 + +该视图展示的是每个正在构建索引的后台进程 (Backend) 的详细状态,包括执行阶段(phase)、处理的元组数量、已扫描的数据块数等信息。在执行完成后,该视图对应记录将被自动清除。 + +在 Apache Cloudberry 中,该视图也适用于 AO 表(Append-Optimized 表),可用于观察其索引构建过程的阶段性信息。 + +该视图的主要用途包括: + +- 实时监控索引创建或重建进程。 +- 分析长时间运行的 `CREATE INDEX`/`REINDEX` 的瓶颈。 +- 判断系统当前是否存在资源占用的索引任务。 +- 配合 `pg_stat_activity` 分析执行索引任务的进程信息。 + +示例如下: + +```sql +-- 查询所有正在执行的索引构建任务 +SELECT * FROM pg_stat_progress_create_index; + +-- 查询指定表上的索引进度 +SELECT * FROM pg_stat_progress_create_index +WHERE relid = 'ao_test'::regclass; +``` + +### 字段说明 + +| 字段名 | 说明 | +|----------------------|------| +| `gp_segment_id` | 当前记录所属的 Segment ID,仅在分布式环境中有效。 | +| `pid` | 后台进程的进程号,可与 `pg_stat_activity` 联合查询获取会话详情。 | +| `datid` | 数据库的 OID,对应 `pg_database.oid`。 | +| `datname` | 数据库名称。 | +| `relid` | 正在被索引的表的 OID,对应 `pg_class.oid`。 | +| `index_relid` | 正在构建的索引对象的 OID。 | +| `command` | 当前执行的命令类型:`CREATE INDEX` 或 `REINDEX`。 | +| `phase` | 当前执行阶段,例如: <br /> `initializing`:初始化阶段 <br /> `scanning heap`:扫描表数据 <br /> `sorting`:排序阶段 <br /> `building index: loading tuples in tree`:构建索引结构 <br /> `waiting for locks`:等待表锁或元数据锁 | +| `lockers_total` | 等待释放锁的会话总数(如有)。 | +| `lockers_done` | 已完成锁释放的会话数量。 | +| `current_locker_pid` | 当前持有锁的进程号(如果正在等待)。 | +| `blocks_total` | 要扫描的总数据块数(可能为 0 表示不可用或未开始)。 | +| `blocks_done` | 已经扫描的数据块数。 | +| `tuples_total` | 预计需要处理的元组总数(如可计算)。 | +| `tuples_done` | 已处理的元组数。 | +| `partitions_total` | (如适用)分区表中总分区数。 | +| `partitions_done` | (如适用)已处理的分区数。 | + +:::note 注意 +- 此视图仅展示正在执行中的索引操作。索引创建完成后,记录会从视图中消失。 +- 对于数据量小的表,索引可能瞬间完成,导致查询视图时无任何行。 +- 为便于观察,可以在大表上执行索引构建,或使用复杂字段(如大文本)放缓过程。 +- 在 AO 表上执行 `CREATE INDEX` 同样会上报进度信息。 +- 该视图可与 ``pg_stat_activity`` 联合使用,通过 ``pid`` 字段关联: + + ```sql + SELECT a.usename, a.query, p.phase, p.blocks_done, p.blocks_total + FROM pg_stat_activity a + JOIN pg_stat_progress_create_index p ON a.pid = p.pid; + ``` + +::: diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-2.x.json b/i18n/zh/docusaurus-plugin-content-docs/version-2.x.json index b8623fcbc..dbd14ea33 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-2.x.json +++ b/i18n/zh/docusaurus-plugin-content-docs/version-2.x.json @@ -94,5 +94,9 @@ "sidebar.docsbars.category.GPORCA Query Optimizer": { "message": "GPORCA 查询优化器", "description": "The label for category GPORCA query optimizer in sidebar docs" + }, + "sidebar.docsbars.category.System Views": { + "message": "系统视图", + "description": "The label for category system views in sidebar docs" } } diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-2.x/config-params-guc-list.md b/i18n/zh/docusaurus-plugin-content-docs/version-2.x/config-params-guc-list.md new file mode 100644 index 000000000..31f4f73e0 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/version-2.x/config-params-guc-list.md @@ -0,0 +1,469 @@ +--- +title: 配置参数 +--- + +# 配置参数 + +本文档按照字母顺序列出 Apache Cloudberry 数据库的配置参数 (GUC)。 + +## autovacuum_freeze_max_age + +- 变量类型:Integer +- 默认值:200000000 +- 值范围:[100000,2000000000] +- 设置分类:postmaster +- 描述:设置表中事务 ID 的“最大使用年龄”。当事务 ID 自分配以来累计经过的事务数达到该值时,系统会自动对表执行 autovacuum,以防止事务 ID 回卷。即使关闭了 autovacuum,系统也会强制执行此操作以保障数据安全。 + +## autovacuum_vacuum_cost_delay + +- 变量类型:Real +- 默认值:2 +- 单位:ms +- 值范围:[-1,100] +- 设置分类:sighup +- 描述:设置 autovacuum 操作中 vacuum 成本延迟的时间(单位:毫秒)。 + +## autovacuum_vacuum_scale_factor + +- 变量类型:Real +- 默认值:0.2 +- 值范围:[0,100] +- 设置分类:sighup +- 描述:控制在执行 autovacuum 之前,表中已更新或删除元组数量相对于总元组数的比例阈值。 + +## autovacuum_vacuum_threshold + +- 变量类型:Integer +- 默认值:50 +- 值范围:[0,2147483647] +- 设置分类:sighup +- 描述:控制触发 autovacuum 所需的最小更新或删除元组数量。 + +## checkpoint_timeout + +- 值范围:30 - 86400(整数,单位为秒) +- 默认值:300(5 分钟) +- 设置分类:local, system, reload +- 描述:指定自动 WAL 检查点之间的最大时间间隔。 + + 如果设置此参数时未指定单位,系统会默认按秒解析。允许的范围是 30 秒到 1 天。默认值为 5 分钟(即 300 秒或 5min)。增大此参数的值会增加崩溃恢复所需的时间。 + +## gp_appendonly_compaction_segfile_limit + +- 变量类型:Integer +- 默认值:10 +- 值范围:[0,127] +- 设置分类:user +- 描述:设置插入操作时必须预留的 append-only segfile 最小数量。 + +## gp_autostats_lock_wait + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:控制 autostats 自动生成的 `ANALYZE` 是否等待锁获取。 + +## gp_command_count + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:internal +- 描述:显示当前会话中客户端已发送的命令数量。 + +## gp_dynamic_partition_pruning + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:启用可动态消除分区扫描的执行计划。 + +## gp_enable_runtime_filter_pushdown + +- 值范围:Boolean +- 默认值:off +- 设置分类:user +- 描述:尝试将 hash join 的哈希表作为 bloom filter 下推到顺序扫描或访问方法(AM)。 + +## gp_enable_statement_trigger + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:允许创建语句级触发器。 + +## gp_max_partition_level + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:superuser +- 描述:设置使用经典语法创建分区表时允许的最大分区层级数。 + +## gp_resource_manager + +- 值范围:none, group, group-v2, queue +- 默认值:none +- 设置分类:local, system, restart +- 描述:指定当前在 Apache Cloudberry 数据库集群中启用的资源管理方案。 + - `none`:不使用任何资源管理器(默认值)。 + - `group`:使用资源组,并基于 Linux cgroup v1 功能启用资源组行为。 + - `group-v2`:使用资源组,并基于 Linux cgroup v2 功能启用资源组行为。 + - `queue`:使用资源队列进行资源管理。 + +## gp_role + +- 值范围:dispatch, execute, utility +- 默认值:未定义(取决于进程类型) +- 设置分类:read only(仅在后台自动设置) +- 描述:该参数用于标识当前服务器进程的角色。 +- Coordinator 进程的角色为 `dispatch`,表示它负责查询调度。Segment 进程的角色为 `execute`,表示它负责执行查询计划。`utility` 用于一些特殊的维护或管理会话。该参数在后台由系统自动设置,主要用于标识不同类型的内部工作进程。 + +## gp_session_id + +- 变量类型:Integer +- 默认值:-1 +- 值范围:[-2147483648,2147483647] +- 设置分类:backend +- 描述:在 Apache Cloudberry 集群中用于唯一标识某个会话的全局 ID。 + +## krb_server_keyfile + +- 变量类型:string +- 默认值:`FILE:/workspace/dist/database/etc/postgresql/krb5.keytab` +- 设置分类:sighup +- 描述:设置 Kerberos 服务器密钥文件的位置。 + +## log_checkpoints + +- 值范围:Boolean +- 默认值:on +- 设置分类:local, system, reload +- 描述:将检查点 (checkpoint) 和重启点 (restartpoint) 写入服务器日志。日志消息中包含一些统计信息,例如写入的缓冲区数量和写入所花费的时间。 + +## max_connections + +- 值范围:10 - 262143 +- 默认值:Coordinator 上为 250,Segment 上为 750 +- 设置分类:local, system, restart +- 描述:数据库服务器允许的最大并发连接数。 + + 在 Apache Cloudberry 系统中,客户端连接仅通过 Coordinator 实例进入。Segment 实例应允许为 Coordinator 的 3 到 10 倍的连接数。增加此参数的值时,必须相应地增加 `max_prepared_transactions` 的值。 + + 此参数值越大,Apache Cloudberry 需要的共享内存也越多。 + +## max_replication_slots + +- 变量类型:Integer +- 默认值:10 +- 值范围:[0,262143] +- 设置分类:postmaster +- 描述:设置可同时定义的复制槽的最大数量。 + +## optimizer_array_constraints + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:允许优化器的约束推导框架识别数组类型的约束条件。 + +## optimizer_array_expansion_threshold + +- 值范围:大于 `0` 的整数 +- 默认值:20 +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)并执行包含常量数组谓词的查询时,`optimizer_array_expansion_threshold` 参数会根据数组中常量的数量限制优化过程。 + + 如果查询谓词中的数组元素数量超过该参数指定的值,GPORCA 在查询优化期间将不会将该谓词转换为析取范式 (disjunctive normal form),从而缩短优化时间并减少内存消耗。例如,当 GPORCA 处理一个 `IN` 子句元素超过 20 个的查询时,为了优化性能,它不会将该子句转换为析取范式。在执行计划中,这种行为差异可以从 `IN` 条件的过滤方式中看出。 + + 修改该参数的值会影响优化时间和内存使用之间的权衡,同时也可能影响到基于约束推导带来的优化收益,例如冲突检测和分区裁剪。此参数可在数据库系统级、单个数据库级,或 session 和 query 级别进行设置。 + +## optimizer_cost_model + +- 变量类型:Enum +- 默认值:calibrated +- 设置分类:user +- 描述:设置优化器使用的成本模型。 + +## optimizer_cost_threshold + +- 变量类型:Real +- 默认值:0 +- 值范围:[0,2147480000] +- 设置分类:user +- 描述:设置与最佳执行计划成本相关的采样阈值,0 表示不设上限。 + +## optimizer_cte_inlining_bound + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:user +- 描述:设置优化器决定是否内联 CTE(公用表表达式)的大小界限。 + +## optimizer_damping_factor_filter + +- 变量类型:Real +- 默认值:0.75 +- 值范围:[0,1] +- 设置分类:user +- 描述:设置优化器中用于选择谓词的抑制因子,`1.0` 表示不进行抑制。 + +## optimizer_damping_factor_groupby + +- 变量类型:Real +- 默认值:0.75 +- 值范围:[0,1] +- 设置分类:user +- 描述:设置优化器中 `group by` 操作的抑制因子,`1.0` 表示不进行抑制。 + +## optimizer_damping_factor_join + +- 变量类型:Real +- 默认值:0 +- 值范围:[0,1] +- 设置分类:user +- 描述:设置优化器中连接谓词的抑制因子,`1.0` 表示不抑制,`0.0` 表示使用平方根抑制法。 + +## optimizer_discard_redistribute_hashjoin + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:控制优化器是否丢弃包含 redistribute 动作的哈希连接计划。 + +## optimizer_dpe_stats + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:启用针对动态分区消除场景下分区表的统计信息推导。 + +## optimizer_enable_derive_stats_all_groups + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:在完成搜索空间探索后,启用对所有分组的统计信息推导。 + +## optimizer_enable_dynamicbitmapscan + +- 值范围:Boolean +- 默认值:on +- 设置分类:user +- 描述:启用后,优化器会使用 dynamic bitmap scan(动态位图扫描)计划。 + + 如果将该参数设为 `off`,GPORCA 将不会生成 dynamic bitmap scan 计划,而是退回使用 dynamic sequential scan(动态顺序扫描)作为替代。 + +## optimizer_enable_dynamicindexonlyscan + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,`optimizer_enable_dynamicindexonlyscan` 参数用于控制是否生成动态仅索引 (dynamic index-only) 扫描计划。 + + 默认值为 `on`,在对分区表进行查询计划时,如果查询中不包含单行易变(SIRV)函数,GPORCA 可能会生成动态仅索引扫描作为替代方案。如果设置为 `off`,GPORCA 不会生成动态仅索引扫描计划。此参数可在数据库系统级、单个数据库级,或在 session 和 query 级别进行设置。 + +## optimizer_enable_dynamicindexscan + +- 值范围:Boolean +- 默认值:on +- 设置分类:user +- 描述:该参数用于控制是否在查询计划中启用动态索引扫描。 + + 启用后,优化器会使用 dynamic index scan(动态索引扫描)计划。如果将该参数设为 `off`,GPORCA 将不会生成 dynamic index scan 计划,而是退回使用 dynamic sequential scan(动态顺序扫描)作为替代。 + +## optimizer_enable_foreign_table + +- 参数类型:Boolean +- 默认值:true +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)且该参数设置为 `true`(默认值)时,GPORCA 会为涉及外部表 (foreign table) 的查询生成执行计划。 + + 如果设置为 `false`,包含外部表的查询将回退由基于 PostgreSQL 的优化器生成执行计划。 + +## optimizer_enable_indexonlyscan + +- 参数类型:Boolean +- 默认值:true +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)且此参数设置为 `true`(默认值)时,GPORCA 可以为 B-tree 索引和包含查询中所有列的任意类型索引生成仅索引 (index-only) 扫描计划。(GiST 索引仅支持部分操作类的 index-only 扫描。) + + GPORCA 只访问索引中的值,不访问表的实际数据块。这可以提高查询执行性能,尤其是在表经过 vacuum、包含宽列,且索引中已包含所有可见列的情况下,无需读取任何数据块。如果将此参数设置为 `false`,GPORCA 将不会生成 index-only 扫描计划。此参数可在数据库系统级、单个数据库级,或在 session 和 query 级别进行设置。 + +## optimizer_enable_orderedagg + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,此参数用于控制是否为有序聚合 (ordered aggregates) 生成查询计划。 + + 当设置为 `on`(默认值)时,GPORCA 会为包含有序聚合的查询生成执行计划。当设置为 `off` 时,该类查询将回退由基于 PostgreSQL 的优化器进行计划生成。 + + 此参数可设置在数据库系统级、单个数据库级,或 session 和 query 级别。 + +## optimizer_enable_push_join_below_union_all + +- 参数类型:Boolean +- 默认值:off +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,`optimizer_enable_push_join_below_union_all` 参数用于控制 GPORCA 在遇到包含 `JOIN UNION ALL` 的查询时的行为。 + + 默认值为 `off`,GPORCA 在查询中包含 `JOIN UNION ALL` 时不会进行任何变换。 + + 如果设置为 `on` 且计划成本符合要求,GPORCA 会将 `JOIN UNION ALL` 转换为多个子查询各自进行 `JOIN` 的 `UNION ALL`。当 `UNION ALL` 的子查询能够受益于连接操作(但在原始计划中不符合条件)时,该变换可能提升连接性能。 + + 例如,对于一个索引嵌套循环连接 (indexed nested loop join) 非常高效的场景,即内表较大且有索引,而外表较小时;或者多个有索引的大表与一个小表进行 `UNION ALL` 后再连接的情况,此变换可将连接条件下推为索引条件,从而比使用哈希连接性能更好。 + + 启用此变换可能会增加查询规划时间,因此建议使用 `EXPLAIN` 分析开启和关闭此参数时的查询执行计划。此参数可在数据库系统级、单个数据库级,或在 session 和 query 级别进行设置。 + +## optimizer_enable_query_parameter + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:允许 GPORCA 优化器理查询参数。 + +## optimizer_enable_right_outer_join + +- 参数类型:Boolean +- 默认值:on +- 设置分类:coordinator, session, reload +- 描述:当启用 GPORCA(默认启用)时,此参数用于控制 GPORCA 是否生成 right outer join(右外连接)。 + + 如果设置为默认值 `on`,GPORCA 既可以直接生成 right outer join,也可以将 left outer join 转换为 right outer join(当优化器认为合适时)。如果设置为 `off`,GPORCA 会将传入的 right outer join 转换为等价的 left outer join,并完全避免生成任何 right outer join。 + + 如果在使用 right outer join 的查询中遇到性能问题,可以通过将此参数设置为 `off` 来禁止使用 right outer join。 + + 此参数可以在数据库系统级、单个数据库级,或 session 和 query 级别进行设置。不过更推荐在查询级别进行控制,因为某些场景下 right outer join 是更合适的查询计划选择。 + +## optimizer_force_three_stage_scalar_dqa + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:强制优化器始终为标量形式的去重聚合(Distinct Qualified Aggregate)选择三阶段聚合计划。 + +## optimizer_nestloop_factor + +- 变量类型:Real +- 默认值:1024 +- 值范围:[1,1.79769e+308] +- 设置分类:user +- 描述:设置优化器中嵌套循环连接(NestLoop Join)的成本因子。 + +## optimizer_penalize_broadcast_threshold + +- 参数类型:Integer +- 值范围:`[0,2147483647]` +- 默认值:100000 +- 设置分类:user +- 描述:指定在不受惩罚的情况下,最多可广播的关系(relation)行数。 + + 如果广播的行数超过该阈值,优化器会增加其执行成本以降低选择该计划的可能性。 + + 将该参数设置为 `0` 表示禁用该惩罚机制,即对任何广播都不加惩罚。 + +## optimizer_push_group_by_below_setop_threshold + +- 变量类型:Integer +- 默认值:10 +- 值范围:[0,2147483647] +- 设置分类:user +- 描述:设置在集合操作(SetOp)节点下尝试下推 `GROUP BY` 操作的子节点最大数量。 + +## optimizer_replicated_table_insert + +- 变量类型:Bool +- 默认值:on +- 设置分类:user +- 描述:向复制表插入数据时省略广播操作。 + +## optimizer_skew_factor + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,100] +- 设置分类:user +- 描述:设置倾斜系数的来源及权重。`0` 表示禁用基于样本统计的倾斜推导,`1`–`100` 表示启用并根据样本计算倾斜比;实际用于成本估算的倾斜度为该参数与倾斜比的乘积。 + +## optimizer_sort_factor + +- 变量类型:Real +- 默认值:1 +- 值范围:[0,1.79769e+308] +- 设置分类:user +- 描述:设置优化器中排序操作的成本因子;`1.0` 表示与默认成本相同,大于 1 表示排序代价更高,小于 1 表示代价更低。 + +## optimizer_trace_fallback + +- 变量类型:Bool +- 默认值:off +- 设置分类:user +- 描述:当 GPORCA 回退使用传统优化器时,在 `INFO` 日志级别打印提示信息。 + +## optimizer_use_gpdb_allocators + +- 变量类型:Bool +- 默认值:on +- 设置分类:postmaster +- 描述:允许 GPORCA 优化器使用数据库的内存上下文管理机制(Memory Contexts)。 + +## optimizer_xform_bind_threshold + +- 变量类型:Integer +- 默认值:0 +- 值范围:[0,2147483647] +- 设置分类:user +- 描述:限制每个转换规则(xform)在每个分组表达式上最多生成的绑定数量。设置为 `0` 表示不启用该限制。 + +## superuser_reserved_connections + +- 值范围:小于 `max_connections` 的整数 +- 默认值:10 +- 设置分类:local, system, restart +- 描述:指定为 Apache Cloudberry 数据库超级用户保留的连接槽数量。 + +## track_io_timing + +- 参数类型:Boolean +- 默认值:off +- 设置分类:superuser +- 描述:用于收集数据库 I/O 活动的时间统计信息。启用此参数后,系统会记录语句执行期间的 I/O 操作耗时,这对于性能分析和瓶颈定位非常有用。该参数默认关闭,仅可由超级用户设置。 + +## wal_compression + +- 变量类型:Bool +- 默认值:on +- 设置分类:superuser +- 描述:启用对 WAL 文件中完整页面写入内容的压缩。 + +## wal_keep_size + +- 参数类型:integer +- 值范围:0 - 2147483647(以 MB 为单位) +- 默认值:320 +- 设置分类:sighup +- 描述:指定为备用服务器 (standby servers) 保留的 WAL 文件大小上限。 + +## work_mem + +- 值范围:以 kilobyte 为单位的整数 +- 默认值:32MB +- 设置分类:coordinator, session, reload +- 描述:指定每个查询操作(例如排序或哈希表)在写入临时磁盘文件前最多可以使用的内存量。如果该参数没有指定单位,则默认为 kilobyte。默认值为 32MB。 + + 在复杂查询中,可能会并行执行多个排序或哈希操作,每个操作在写入临时文件前都可以使用 `work_mem` 指定的内存量。此外,多个会话可能同时进行这些操作,因此总内存消耗可能远高于 `work_mem` 本身的数值。在选择该参数值时需特别注意这一点。 + + `work_mem` 会影响这些操作:用于 `ORDER BY`、`DISTINCT` 和合并连接的排序操作;用于哈希连接、哈希聚合,以及处理 `IN` 子查询的哈希表;位图索引扫描;基于 tuple store 的操作,例如函数扫描、CTE、PL/pgSQL 和管理类 UDF 等。 + + 除了为特定执行算子分配内存外,`work_mem` 还会影响 PostgreSQL 优化器选择某些查询计划的偏好。需要注意的是,`work_mem` 是独立于资源队列和资源组的内存管理机制,它在查询级别生效,不会受到资源队列或资源组内存限制的影响。 + +## writable_external_table_bufsize + +- 变量类型:Integer +- 默认值:1024 +- 单位:kB +- 值范围:[32,131072] +- 设置分类:user +- 描述:为可写外部表写入 gpfdist 之前设置的缓冲区大小(单位:kB)。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md b/i18n/zh/docusaurus-plugin-content-docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md index f118bbf8a..d3b66a7ea 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md @@ -4,24 +4,63 @@ title: pg_stat_progress_create_index # pg_stat_progress_create_index -`pg_stat_progress_create_index` 视图提供了 `CREATE INDEX` 和 `REINDEX` 操作进度的实时信息。每一行代表一个当前正在构建索引的后端进程,包含诸如命令类型、当前阶段、已处理的块数和元组数以及分区信息等详细信息。此视图有助于监控和管理长时间运行的索引创建任务。 - -| 字段 | 类型 | 描述 | -|---|---|---| -| `gp_segment_id` | integer | Segment(或 Coordinator)实例的唯一标识符。(此字段不在 `gp_stat_progress_create_index_summary` 视图中。) | -| `pid` | integer | 后端进程的 ID,如果是在 `gp_stat_progress_create_index_summary` 视图中,则是 Coordinator 进程的 ID。 | -| `datid` | oid | 此后端连接的数据库的对象标识符。 | -| `datname` | name | 此后端连接的数据库名称。 | -| `relid` | oid | 正在创建索引的表的对象标识符。 | -| `index_relid` | oid | 正在创建或重建的索引的对象标识符。由于 Apache Cloudberry 不支持并发(重)索引,此值始终为 `0`。 | -| `command` | text | 正在运行的命令名称:`CREATE INDEX` 或 `REINDEX`。 | -| `phase` | text | 索引创建的当前处理阶段。 | -| `lockers_total` | bigint | 适用的情况下,需要等待的锁总数。 | -| `lockers_done` | bigint | 已经等待的锁数量。 | -| `current_locker_pid` | bigint | 当前正在等待的锁的进程 ID。 | -| `blocks_total` | bigint | 当前阶段要处理的块总数。 | -| `blocks_done` | bigint | 当前阶段已处理的块数。 | -| `tuples_total` | bigint | 当前阶段要处理的元组总数。 | -| `tuples_done` | bigint | 当前阶段已处理的元组数。 | -| `partitions_total` | bigint | 在分区表上创建索引时,此列设置为要创建索引的分区总数。在 `REINDEX` 期间此字段为 0。 | -| `partitions_done` | bigint | 在分区表上创建索引时,此列设置为已完成索引的分区数量。在 `REINDEX` 期间此字段为 0。 | +`pg_stat_progress_create_index` 是一个系统视图,用于显示当前数据库中正在执行的 `CREATE INDEX` 或 `REINDEX` 操作的实时执行进度。 + +该视图展示的是每个正在构建索引的后台进程 (Backend) 的详细状态,包括执行阶段(phase)、处理的元组数量、已扫描的数据块数等信息。在执行完成后,该视图对应记录将被自动清除。 + +在 Apache Cloudberry 中,该视图也适用于 AO 表(Append-Optimized 表),可用于观察其索引构建过程的阶段性信息。 + +该视图的主要用途包括: + +- 实时监控索引创建或重建进程。 +- 分析长时间运行的 `CREATE INDEX`/`REINDEX` 的瓶颈。 +- 判断系统当前是否存在资源占用的索引任务。 +- 配合 `pg_stat_activity` 分析执行索引任务的进程信息。 + +示例如下: + +```sql +-- 查询所有正在执行的索引构建任务 +SELECT * FROM pg_stat_progress_create_index; + +-- 查询指定表上的索引进度 +SELECT * FROM pg_stat_progress_create_index +WHERE relid = 'ao_test'::regclass; +``` + +### 字段说明 + +| 字段名 | 说明 | +|----------------------|------| +| `gp_segment_id` | 当前记录所属的 Segment ID,仅在分布式环境中有效。 | +| `pid` | 后台进程的进程号,可与 `pg_stat_activity` 联合查询获取会话详情。 | +| `datid` | 数据库的 OID,对应 `pg_database.oid`。 | +| `datname` | 数据库名称。 | +| `relid` | 正在被索引的表的 OID,对应 `pg_class.oid`。 | +| `index_relid` | 正在构建的索引对象的 OID。 | +| `command` | 当前执行的命令类型:`CREATE INDEX` 或 `REINDEX`。 | +| `phase` | 当前执行阶段,例如: <br /> `initializing`:初始化阶段 <br /> `scanning heap`:扫描表数据 <br /> `sorting`:排序阶段 <br /> `building index: loading tuples in tree`:构建索引结构 <br /> `waiting for locks`:等待表锁或元数据锁 | +| `lockers_total` | 等待释放锁的会话总数(如有)。 | +| `lockers_done` | 已完成锁释放的会话数量。 | +| `current_locker_pid` | 当前持有锁的进程号(如果正在等待)。 | +| `blocks_total` | 要扫描的总数据块数(可能为 0 表示不可用或未开始)。 | +| `blocks_done` | 已经扫描的数据块数。 | +| `tuples_total` | 预计需要处理的元组总数(如可计算)。 | +| `tuples_done` | 已处理的元组数。 | +| `partitions_total` | (如适用)分区表中总分区数。 | +| `partitions_done` | (如适用)已处理的分区数。 | + +:::note 注意 +- 此视图仅展示正在执行中的索引操作。索引创建完成后,记录会从视图中消失。 +- 对于数据量小的表,索引可能瞬间完成,导致查询视图时无任何行。 +- 为便于观察,可以在大表上执行索引构建,或使用复杂字段(如大文本)放缓过程。 +- 在 AO 表上执行 `CREATE INDEX` 同样会上报进度信息。 +- 该视图可与 ``pg_stat_activity`` 联合使用,通过 ``pid`` 字段关联: + + ```sql + SELECT a.usename, a.query, p.phase, p.blocks_done, p.blocks_total + FROM pg_stat_activity a + JOIN pg_stat_progress_create_index p ON a.pid = p.pid; + ``` + +::: diff --git a/sidebars.ts b/sidebars.ts index 684239e75..fcffadcbd 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -350,6 +350,7 @@ const sidebars: SidebarsConfig = { }, 'data-types', + 'config-params-guc-list', { type: 'category', @@ -474,7 +475,6 @@ const sidebars: SidebarsConfig = { 'sys-catalogs/sys-tables/pg-user-mapping', ] }, - { type: 'category', label: 'System Views', diff --git a/versioned_docs/version-2.x/config-params-guc-list.md b/versioned_docs/version-2.x/config-params-guc-list.md new file mode 100644 index 000000000..f4a00acde --- /dev/null +++ b/versioned_docs/version-2.x/config-params-guc-list.md @@ -0,0 +1,470 @@ +--- +title: Configuration Parameters +--- + +# Configuration Parameters + +This document lists the configuration parameters (GUC) of Apache Cloudberry database in alphabetical order. + +## autovacuum_freeze_max_age + +- Variable Type: Integer +- Default Value: 200000000 +- Value Range: [100000,2000000000] +- Setting Category: postmaster +- Description: Sets the "maximum age" of transaction IDs in a table. When the number of transactions accumulated since the transaction ID was allocated reaches this value, the system automatically performs autovacuum on the table to prevent transaction ID wraparound. This operation is enforced even if autovacuum is disabled to ensure data safety. + +## autovacuum_vacuum_cost_delay + +- Variable Type: Real +- Default Value: 2 +- Unit: ms +- Value Range: [-1,100] +- Setting Category: sighup +- Description: Sets the vacuum cost delay time (in milliseconds) for autovacuum operations. + +## autovacuum_vacuum_scale_factor + +- Variable Type: Real +- Default Value: 0.2 +- Value Range: [0,100] +- Setting Category: sighup +- Description: Controls the threshold ratio of updated or deleted tuples to total tuples before autovacuum is performed. + +## autovacuum_vacuum_threshold + +- Variable Type: Integer +- Default Value: 50 +- Value Range: [0,2147483647] +- Setting Category: sighup +- Description: Controls the minimum number of updated or deleted tuples required to trigger autovacuum. + +## checkpoint_timeout + +- Value Range: 30 - 86400 (integer, in seconds) +- Default Value: 300 (5 minutes) +- Setting Category: local, system, reload +- Description: Specifies the maximum time interval between automatic WAL checkpoints. + + If no unit is specified when setting this parameter, the system defaults to seconds. The allowed range is 30 seconds to 1 day. The default value is 5 minutes (300 seconds or 5min). Increasing this parameter's value will increase the time required for crash recovery. + +## gp_appendonly_compaction_segfile_limit + +- Variable Type: Integer +- Default Value: 10 +- Value Range: [0,127] +- Setting Category: user +- Description: Sets the minimum number of append-only segfiles that must be reserved for insert operations. + +## gp_autostats_lock_wait + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Controls whether autostats automatically generated `ANALYZE` waits for lock acquisition. + +## gp_command_count + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: internal +- Description: Displays the number of commands sent by the client in the current session. + +## gp_dynamic_partition_pruning + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: Enables execution plans that can dynamically eliminate partition scans. + +## gp_enable_runtime_filter_pushdown + +- Value Range: Boolean +- Default Value: off +- Setting Category: user +- Description: Attempts to push down the hash table of hash join as a bloom filter to sequential scan or access methods (AM). + +## gp_enable_statement_trigger + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Allows creation of statement-level triggers. + +## gp_max_partition_level + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: superuser +- Description: Sets the maximum allowed partition level when creating partitioned tables using classic syntax. + +## gp_resource_manager + +- Value Range: none, group, group-v2, queue +- Default Value: none +- Setting Category: local, system, restart +- Description: Specifies the resource management scheme currently enabled in the Apache Cloudberry database cluster. + - `none`: No resource manager is used (default). + - `group`: Uses resource groups and enables resource group behavior based on Linux cgroup v1 functionality. + - `group-v2`: Uses resource groups and enables resource group behavior based on Linux cgroup v2 functionality. + - `queue`: Uses resource queues for resource management. + +## gp_role + +- Value Range: dispatch, execute, utility +- Default Value: Undefined (depends on process type) +- Setting Category: read only (automatically set in background) +- Description: This parameter is used to identify the role of the current server process. + + The role of the Coordinator process is `dispatch`, indicating it is responsible for query dispatch. The role of the Segment process is `execute`, indicating it is responsible for executing query plans. `utility` is used for special maintenance or management sessions. This parameter is automatically set by the system in the background and is mainly used to identify different types of internal worker processes. + +## gp_session_id + +- Variable Type: Integer +- Default Value: -1 +- Value Range: [-2147483648,2147483647] +- Setting Category: backend +- Description: Used to uniquely identify a session in the Apache Cloudberry cluster. + +## krb_server_keyfile + +- Variable Type: string +- Default Value: `<FILE:/workspace/dist/database/etc/postgresql/krb5.keytab>` +- Setting Category: sighup +- Description: Sets the location of the Kerberos server key file. + +## log_checkpoints + +- Value Range: Boolean +- Default Value: on +- Setting Category: local, system, reload +- Description: Writes checkpoint and restartpoint information to the server log. The log messages include statistics such as the number of buffers written and the time taken to write them. + +## max_connections + +- Value Range: 10 - 262143 +- Default Value: 250 on Coordinator, 750 on Segment +- Setting Category: local, system, restart +- Description: The maximum number of concurrent connections allowed by the database server. + + In the Apache Cloudberry system, client connections enter only through the Coordinator instance. Segment instances should allow 3 to 10 times the number of connections as the Coordinator. When increasing this parameter's value, the value of `max_prepared_transactions` must be increased accordingly. + + The larger this parameter value, the more shared memory Apache Cloudberry requires. + +## max_replication_slots + +- Variable Type: Integer +- Default Value: 10 +- Value Range: [0,262143] +- Setting Category: postmaster +- Description: Sets the maximum number of replication slots that can be defined simultaneously. + +## optimizer_array_constraints + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Allows the optimizer's constraint derivation framework to recognize array-type constraints. + +## optimizer_array_expansion_threshold + +- Value Range: Integer greater than `0` +- Default Value: 20 +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default) and executing queries containing constant array predicates, the `optimizer_array_expansion_threshold` parameter limits the optimization process based on the number of constants in the array. + + If the number of array elements in the query predicate exceeds the value specified by this parameter, GPORCA will not convert the predicate to disjunctive normal form during query optimization, thereby reducing optimization time and memory consumption. For example, when GPORCA processes a query with an `IN` clause containing more than 20 elements, it will not convert the clause to disjunctive normal form for optimization performance. This behavioral difference can be observed in the [...] + + Modifying this parameter's value affects the trade-off between optimization time and memory usage, as well as optimization benefits from constraint derivation, such as conflict detection and partition pruning. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_cost_model + +- Variable Type: Enum +- Default Value: calibrated +- Setting Category: user +- Description: Sets the cost model used by the optimizer. + +## optimizer_cost_threshold + +- Variable Type: Real +- Default Value: 0 +- Value Range: [0,2.15E+09] +- Setting Category: user +- Description: Sets the sampling threshold related to the optimal execution plan cost, where 0 means no upper limit. + +## optimizer_cte_inlining_bound + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: user +- Description: Sets the size boundary for the optimizer to decide whether to inline CTEs (Common Table Expressions). + +## optimizer_damping_factor_filter + +- Variable Type: Real +- Default Value: 0.75 +- Value Range: [0,1] +- Setting Category: user +- Description: Sets the damping factor used for selection predicates in the optimizer, where `1.0` means no damping. + +## optimizer_damping_factor_groupby + +- Variable Type: Real +- Default Value: 0.75 +- Value Range: [0,1] +- Setting Category: user +- Description: Sets the damping factor for `group by` operations in the optimizer, where `1.0` means no damping. + +## optimizer_damping_factor_join + +- Variable Type: Real +- Default Value: 0 +- Value Range: [0,1] +- Setting Category: user +- Description: Sets the damping factor for join predicates in the optimizer, where `1.0` means no damping and `0.0` means using square root damping. + +## optimizer_discard_redistribute_hashjoin + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Controls whether the optimizer discards hash join plans containing redistribute operations. + +## optimizer_dpe_stats + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Enables statistics derivation for partitioned tables in dynamic partition elimination scenarios. + +## optimizer_enable_derive_stats_all_groups + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Enables statistics derivation for all groups after completing search space exploration. + +## optimizer_enable_dynamicbitmapscan + +- Value Range: Boolean +- Default Value: on +- Setting Category: user +- Description: When enabled, the optimizer uses dynamic bitmap scan plans. + + If this parameter is set to `off`, GPORCA will not generate dynamic bitmap scan plans and will fall back to using dynamic sequential scan as an alternative. + +## optimizer_enable_dynamicindexonlyscan + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), the `optimizer_enable_dynamicindexonlyscan` parameter controls whether to generate dynamic index-only scan plans. + + The default value is `on`. When planning queries on partitioned tables, if the query does not contain single-row volatile (SIRV) functions, GPORCA may generate dynamic index-only scans as an alternative. If set to `off`, GPORCA will not generate dynamic index-only scan plans. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_dynamicindexscan + +- Value Range: Boolean +- Default Value: on +- Setting Category: user +- Description: This parameter controls whether to enable dynamic index scan in query plans. + + When enabled, the optimizer uses dynamic index scan plans. If this parameter is set to `off`, GPORCA will not generate dynamic index scan plans and will fall back to using dynamic sequential scan as an alternative. + +## optimizer_enable_foreign_table + +- Parameter Type: Boolean +- Default Value: true +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default) and this parameter is set to `true` (default), GPORCA generates execution plans for queries involving foreign tables. + + If set to `false`, queries containing foreign tables will fall back to being planned by the PostgreSQL-based optimizer. + +## optimizer_enable_indexonlyscan + +- Parameter Type: Boolean +- Default Value: true +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default) and this parameter is set to `true` (default), GPORCA can generate index-only scan plans for B-tree indexes and any type of index that contains all columns in the query. (GiST indexes only support index-only scans for certain operator classes.) + + GPORCA only accesses values in the index and not the actual data blocks of the table. This can improve query execution performance, especially when the table has been vacuumed, contains wide columns, and all visible columns are already in the index, eliminating the need to read any data blocks. If this parameter is set to `false`, GPORCA will not generate index-only scan plans. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_orderedagg + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), this parameter controls whether to generate query plans for ordered aggregates. + + When set to `on` (default), GPORCA generates execution plans for queries containing ordered aggregates. When set to `off`, such queries will fall back to being planned by the PostgreSQL-based optimizer. + + This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_push_join_below_union_all + +- Parameter Type: Boolean +- Default Value: off +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), the `optimizer_enable_push_join_below_union_all` parameter controls GPORCA's behavior when encountering queries containing `JOIN UNION ALL`. + + The default value is `off`. GPORCA will not perform any transformation when the query contains `JOIN UNION ALL`. + + If set to `on` and the plan cost meets requirements, GPORCA will transform `JOIN UNION ALL` into multiple subqueries each performing `JOIN` followed by `UNION ALL`. This transformation may improve join performance when the subqueries of `UNION ALL` can benefit from join operations (but don't qualify in the original plan). + + For example, in scenarios where indexed nested loop joins are highly efficient, such as when the inner table is large with an index and the outer table is small, or when multiple large tables with indexes are `UNION ALL`'ed with a small table, this transformation can push join conditions down as index conditions, potentially performing better than using hash joins. + + Enabling this transformation may increase query planning time, so it's recommended to use `EXPLAIN` to analyze query execution plans with this parameter both enabled and disabled. This parameter can be set at the database system level, individual database level, or session and query level. + +## optimizer_enable_query_parameter + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Allows the GPORCA optimizer to handle query parameters. + +## optimizer_enable_right_outer_join + +- Parameter Type: Boolean +- Default Value: on +- Setting Category: coordinator, session, reload +- Description: When GPORCA is enabled (default), this parameter controls whether GPORCA generates right outer joins. + + If set to the default value `on`, GPORCA can either directly generate right outer joins or convert left outer joins to right outer joins (when the optimizer deems it appropriate). If set to `off`, GPORCA will convert incoming right outer joins to equivalent left outer joins and completely avoid generating any right outer joins. + + If you encounter performance issues with queries using right outer joins, you can disable right outer joins by setting this parameter to `off`. + + This parameter can be set at the database system level, individual database level, or session and query level. However, it's more recommended to control it at the query level, as right outer joins may be more appropriate query plan choices in certain scenarios. + +## optimizer_force_three_stage_scalar_dqa + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Forces the optimizer to always choose three-stage aggregation plans for scalar distinct qualified aggregates. + +## optimizer_nestloop_factor + +- Variable Type: Real +- Default Value: 1024 +- Value Range: [1,1.79769e+308] +- Setting Category: user +- Description: Sets the cost factor for nested loop joins in the optimizer. + +## optimizer_penalize_broadcast_threshold + +- Parameter Type: Integer +- Value Range: `[0,2147483647]` +- Default Value: 100000 +- Setting Category: user +- Description: Specifies the maximum number of relation rows that can be broadcast without penalty. + + If the number of broadcast rows exceeds this threshold, the optimizer will increase its execution cost to reduce the likelihood of selecting that plan. + + Setting this parameter to `0` disables the penalty mechanism, meaning no penalty is applied to any broadcast. + +## optimizer_push_group_by_below_setop_threshold + +- Variable Type: Integer +- Default Value: 10 +- Value Range: [0,2147483647] +- Setting Category: user +- Description: Sets the maximum number of child nodes to attempt pushing `GROUP BY` operations below SetOp nodes. + +## optimizer_replicated_table_insert + +- Variable Type: Bool +- Default Value: on +- Setting Category: user +- Description: Omits broadcast operations when inserting data into replicated tables. + +## optimizer_skew_factor + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,100] +- Setting Category: user +- Description: Sets the source and weight of the skew factor. `0` means disabling skew derivation based on sample statistics, `1`–`100` means enabling and calculating skew ratio based on samples; the actual skew used for cost estimation is the product of this parameter and the skew ratio. + +## optimizer_sort_factor + +- Variable Type: Real +- Default Value: 1 +- Value Range: [0,1.79769e+308] +- Setting Category: user +- Description: Sets the cost factor for sort operations in the optimizer; `1.0` means the same as default cost, greater than 1 means higher sort cost, less than 1 means lower cost. + +## optimizer_trace_fallback + +- Variable Type: Bool +- Default Value: off +- Setting Category: user +- Description: Prints information at the `INFO` log level when GPORCA falls back to using the traditional optimizer. + +## optimizer_use_gpdb_allocators + +- Variable Type: Bool +- Default Value: on +- Setting Category: postmaster +- Description: Allows the GPORCA optimizer to use the database's memory context management mechanism (Memory Contexts). + +## optimizer_xform_bind_threshold + +- Variable Type: Integer +- Default Value: 0 +- Value Range: [0,2147483647] +- Setting Category: user +- Description: Limits the maximum number of bindings that can be generated for each transformation rule (xform) on each group expression. Setting to `0` means this limit is not enabled. + +## superuser_reserved_connections + +- Value Range: Integer less than `max_connections` +- Default Value: 10 +- Setting Category: local, system, restart +- Description: Specifies the number of connection slots reserved for Apache Cloudberry database superusers. + +## track_io_timing + +- Parameter Type: Boolean +- Default Value: off +- Setting Category: superuser +- Description: Used to collect timing statistics for database I/O activities. When enabled, the system records I/O operation durations during statement execution, which is useful for performance analysis and bottleneck identification. This parameter is off by default and can only be set by superusers. + +## wal_compression + +- Variable Type: Bool +- Default Value: on +- Setting Category: superuser +- Description: Enables compression of full page writes in WAL files. + +## wal_keep_size + +- Parameter Type: integer +- Value Range: 0 - 2147483647 (in MB) +- Default Value: 320 +- Setting Category: sighup +- Description: Specifies the maximum size of WAL files to retain for standby servers. + +## work_mem + +- Value Range: Integer in kilobytes +- Default Value: 32MB +- Setting Category: coordinator, session, reload +- Description: Specifies the maximum amount of memory that can be used by each query operation (such as sort or hash table) before writing to temporary disk files. If no unit is specified for this parameter, it defaults to kilobytes. The default value is 32MB. + + In complex queries, multiple sort or hash operations may be executed in parallel, each of which can use the amount of memory specified by `work_mem` before writing to temporary files. Additionally, multiple sessions may perform these operations simultaneously, so the total memory consumption may be much higher than the value of `work_mem` itself. This should be taken into special consideration when choosing the value for this parameter. + + `work_mem` affects these operations: sort operations for `ORDER BY`, `DISTINCT`, and merge joins; hash tables for hash joins, hash aggregates, and processing `IN` subqueries; bitmap index scans; and tuple store-based operations such as function scans, CTEs, PL/pgSQL, and management UDFs. + + In addition to allocating memory for specific execution operators, `work_mem` also affects the PostgreSQL optimizer's preference for certain query plans. Note that `work_mem` is independent of the resource queue and resource group memory management mechanisms. It takes effect at the query level and is not affected by resource queue or resource group memory limits. + +## writable_external_table_bufsize + +- Variable Type: Integer +- Default Value: 1024 +- Unit: kB +- Value Range: [32,131072] +- Setting Category: user +- Description: Sets the buffer size (in kB) for writing to gpfdist before writing to writable external tables. diff --git a/versioned_docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md b/versioned_docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md index dc8b3e40a..89064f873 100644 --- a/versioned_docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md +++ b/versioned_docs/version-2.x/sys-catalogs/sys-views/pg-stat-progress-create-index.md @@ -2,26 +2,65 @@ title: pg_stat_progress_create_index --- -# pg_stat_progress_create_index - -The `pg_stat_progress_create_index` view provides real-time information about the progress of `CREATE INDEX` and `REINDEX` operations. Each row represents a backend process currently building an index and includes details such as the command type, current phase, number of blocks and tuples processed, and partitioning information. This view is useful for monitoring and managing long-running index creation tasks. - -|Column|Type|Description| -|------|----|-----------| -|`gp_segment_id`|integer| Unique identifier of a segment (or coordinator) instance. (This column is not present in the `gp_stat_progress_create_index_summary` view.)| -| `pid` | integer | Process identifier of the backend, or the coordinator process identifier if the `gp_stat_progress_create_index_summary` view. | -| `datid` | oid | The object identifer of the database to which this backend is connected. | -| `datname` | name | Name of the database to which this backend is connected. | -| `relid` | oid | The object identifer of the table on which the index is being created. | -| `index_relid` | oid | The object identifer of the index being created or reindexed. Because Apache Cloudberry does not support concurrent (re)indexing, this value is always `0`. | -| `command` | text | The name of the command that is running: `CREATE INDEX` or `REINDEX`. | -| `phase` | text | Current processing phase of index creation. | -| `lockers_total` | bigint | Total number of lockers to wait for, when applicable. | -| `lockers_done` | bigint | Number of lockers already waited for. | -| `current_locker_pid` | bigint | The process identifier of the locker currently being waited for. | -| `blocks_total` | bigint | Total number of blocks to be processed in the current phase. | -| `blocks_done` | bigint | Number of blocks already processed in the current phase. | -| `tuples_total` | bigint | Total number of tuples to be processed in the current phase. | -| `tuples_done` | bigint | Number of tuples already processed in the current phase. | -| `partitions_total` | bigint | When creating an index on a partitioned table, this column is set to the total number of partitions on which the index is to be created. This field is 0 during a `REINDEX`. | -| `partitions_done` | bigint | When creating an index on a partitioned table, this column is set to the number of partitions on which the index has been completed. This field is 0 during a `REINDEX`. | +## pg_stat_progress_create_index + +`pg_stat_progress_create_index` is a system view that shows real-time progress for ongoing `CREATE INDEX` or `REINDEX` operations in the current database. + +This view displays detailed status information for each backend process building an index, including the current execution phase, the number of tuples processed, blocks scanned, and more. Once the operation completes, the corresponding entries are automatically removed from the view. + +In Apache Cloudberry, this view also supports AO (Append-Optimized) tables and can be used to observe the phase-wise progress of index creation on such tables. + +Typical use cases include: + +- Monitoring index creation or rebuild operations in real time. +- Analyzing performance bottlenecks of long-running `CREATE INDEX` or `REINDEX` commands. +- Checking if any index operations are currently consuming system resources. +- Correlating with `pg_stat_activity` to trace backend process details. + +Example queries: + +```sql +-- Views all ongoing index creation tasks +SELECT * FROM pg_stat_progress_create_index; + +-- Views index progress for a specific table +SELECT * FROM pg_stat_progress_create_index +WHERE relid = 'ao_test'::regclass; +``` + +### Field descriptions + +| Field | Description | +|-----------------------|-------------| +| `gp_segment_id` | ID of the segment where this entry resides. Only applicable in a distributed environment. | +| `pid` | Process ID of the backend. Can be joined with `pg_stat_activity` for session details. | +| `datid` | OID of the database, corresponding to `pg_database.oid`. | +| `datname` | Name of the database. | +| `relid` | OID of the table being indexed, corresponding to `pg_class.oid`. | +| `index_relid` | OID of the index being built. | +| `command` | Command being executed: `CREATE INDEX` or `REINDEX`. | +| `phase` | Current phase of execution, for example: <br /> `initializing`: Initialization phase <br /> `scanning heap`: Scanning table data <br /> `sorting`: Sorting phase <br /> `building index: loading tuples in tree`: Building index structure <br /> `waiting for locks`: Waiting for table or metadata locks | +| `lockers_total` | Total number of sessions holding conflicting locks (if any). | +| `lockers_done` | Number of sessions that have released their locks. | +| `current_locker_pid` | Process ID of the session currently holding the lock (if waiting). | +| `blocks_total` | Total number of data blocks to scan (may be 0 if unknown or not started). | +| `blocks_done` | Number of data blocks already scanned. | +| `tuples_total` | Estimated total number of tuples to process (if available). | +| `tuples_done` | Number of tuples already processed. | +| `partitions_total` | Total number of partitions (if applicable). | +| `partitions_done` | Number of partitions already processed (if applicable). | + +:::note +- This view only shows currently running index operations. Entries disappear once the operation completes. +- For small tables, index creation may complete instantly, and the view might return no rows. +- To better observe progress, try creating an index on a large table or use complex columns (for example, large text) to slow down execution. +- Progress is also reported when building indexes on AO (Append-Optimized) tables. +- You can join this view with ``pg_stat_activity`` using the ``pid`` field. + + ```sql + SELECT a.usename, a.query, p.phase, p.blocks_done, p.blocks_total + FROM pg_stat_activity a + JOIN pg_stat_progress_create_index p ON a.pid = p.pid; + ``` + +::: diff --git a/versioned_sidebars/version-2.x-sidebars.json b/versioned_sidebars/version-2.x-sidebars.json index 7dc6c8724..94e466a70 100644 --- a/versioned_sidebars/version-2.x-sidebars.json +++ b/versioned_sidebars/version-2.x-sidebars.json @@ -363,6 +363,7 @@ ] }, "data-types", + "config-params-guc-list", { "type": "category", "label": "Built-in Functions", @@ -486,7 +487,6 @@ "sys-catalogs/sys-tables/pg-user-mapping" ] }, - { "type": "category", "label": "System Views", --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cloudberry.apache.org For additional commands, e-mail: commits-h...@cloudberry.apache.org