tuhaihe commented on code in PR #277: URL: https://github.com/apache/cloudberry-site/pull/277#discussion_r2123384404
########## docs/operate-with-data/manage-spill-files.md: ########## Review Comment: Seems this file is not listed in the `sidebar.ts` file. ########## docs/performance/optimize-queries/use-orca/gporca-usage-considerations.md: ########## Review Comment: This file should be in English. ########## docs/performance/optimize-queries/analyze-query-performance.md: ########## @@ -0,0 +1,196 @@ +--- +title: Analyze Query Performance +--- + +# Analyze Query Performance + +Examine the query plans of poorly performing queries to identify possible performance tuning opportunities. + +Apache Cloudberry devises a query plan for each query. Choosing the right query plan to match the query and data structure is necessary for good performance. A query plan defines how Apache Cloudberry will run the query in the parallel execution environment. + +The query optimizer uses data statistics maintained by the database to choose a query plan with the lowest possible cost. Cost is measured in disk I/O, shown as units of disk page fetches. The goal is to minimize the total execution cost for the plan. + +View the plan for a given query with the `EXPLAIN` command. `EXPLAIN` shows the query optimizer's estimated cost for the query plan. For example: + +```sql +EXPLAIN SELECT * FROM names WHERE id=22; +``` + +`EXPLAIN ANALYZE` runs the statement in addition to displaying its plan. This is useful for determining how close the optimizer's estimates are to reality. For example: + +```sql +EXPLAIN ANALYZE SELECT * FROM names WHERE id=22; +``` + +:::note +In Apache Cloudberry, the default GPORCA optimizer co-exists with the Postgres-based planner. The EXPLAIN output generated by GPORCA is different than the output generated by the Postgres-based planner. + +By default, Apache Cloudberry uses GPORCA to generate an execution plan for a query when possible. + +When the `EXPLAIN ANALYZE` command uses GPORCA, the `EXPLAIN` plan shows only the number of partitions that are being eliminated. The scanned partitions are not shown. To show name of the scanned partitions in the segment logs set the server configuration parameter `gp_log_dynamic_partition_pruning` to `on`. This example `SET` command enables the parameter. + +```sql +SET gp_log_dynamic_partition_pruning = on; +``` + +For information about GPORCA, see [Querying Data](query.html). Review Comment: The `Querying Data` link cannot work. ########## docs/operate-with-data/sql-queries/value-expressions.md: ########## @@ -0,0 +1,270 @@ +--- +title: Value Expressions +--- + +# Value Expressions + +Value expressions are the core components that allow you to calculate, transform, and reference data within a query. When defining queries, understanding these expressions is key to writing flexible and effective SQL. + +This document explains types of expressions such as column references, positional parameters, array and row constructors, function calls, and type casts. Each type plays a specific role in how data is processed and represented in a query result. + +## Column references + +A column reference has the form: + +```sql +<correlation>.<columnname> +``` + +Here, `<correlation>` is the name of a table (possibly qualified with a schema name) or an alias for a table defined with a `FROM` clause or one of the keywords `NEW` or `OLD`. `NEW` and `OLD` can appear only in rewrite rules, but you can use other correlation names in any SQL statement. If the column name is unique across all tables in the query, you can omit the `<correlation>` part of the column reference. + +## Positional parameters + +Positional parameters are arguments to SQL statements or functions that you reference by their positions in a series of arguments. For example, `$1` refers to the first argument, `$2` to the second argument, and so on. The values of positional parameters are set from arguments external to the SQL statement or supplied when SQL functions are invoked. Some client libraries support specifying data values separately from the SQL command, in which case parameters refer to the out-of-line data values. A parameter reference has the form: + +```sql +$number +``` + +For example: + +```sql +CREATE FUNCTION dept(text) RETURNS dept + AS $$ SELECT * FROM dept WHERE name = $1 $$ + LANGUAGE SQL; +``` + +Here, the `$1` references the value of the first function argument whenever the function is invoked. + +## Subscripts + +If an expression yields a value of an array type, you can extract a specific element of the array value as follows: + +```sql +<expression>[<subscript>] + +``` + +You can extract multiple adjacent elements, called an array slice, as follows (including the brackets): + +```sql +<expression>[<lower_subscript>:<upper_subscript>] + +``` + +Each subscript is an expression and yields an integer value. + +Array expressions usually must be in parentheses, but you can omit the parentheses when the expression to be subscripted is a column reference or positional parameter. You can concatenate multiple subscripts when the original array is multidimensional. For example (including the parentheses): + +```sql +mytable.arraycolumn[4] +``` + +```sql +mytable.two_d_column[17][34] +``` + +```sql +$1[10:42] +``` + +```sql +(arrayfunction(a,b))[42] +``` + +## Field selection + +If an expression yields a value of a composite type (row type), you can extract a specific field of the row as follows: + +```sql +<expression>.<fieldname> +``` + +The row expression usually must be in parentheses, but you can omit these parentheses when the expression to be selected from is a table reference or positional parameter. For example: + +```sql +mytable.mycolumn +``` + +```sql +$1.somecolumn +``` + +```sql +(rowfunction(a,b)).col3 +``` + +A qualified column reference is a special case of field selection syntax. + +## Operator invocations + +Operator invocations have the following possible syntaxes: + +```sql +<expression operator expression>(binary infix operator) +``` + +```sql +<operator expression>(unary prefix operator) +``` + +```sql +<expression operator>(unary postfix operator) +``` + +Where `operator` is an operator token, one of the key words `AND`, `OR`, or `NOT`, or qualified operator name in the form: + +```sql +OPERATOR(<schema>.<operatorname>) +``` + +Available operators and whether they are unary or binary depends on the operators that the system or user defines. For more information about built-in operators, see [Built-in Functions and Operators](functions-operators.html). + +## Function calls + +The syntax for a function call is the name of a function (possibly qualified with a schema name), followed by its argument list enclosed in parentheses: + +```sql +function ([expression [, expression ... ]]) +``` + +For example, the following function call computes the square root of 2: + +```sql +sqrt(2) +``` + +## Type casts + +A type cast specifies a conversion from one data type to another. A cast applied to a value expression of a known type is a run-time type conversion. The cast succeeds only if a suitable type conversion is defined. This differs from the use of casts with constants. A cast applied to a string literal represents the initial assignment of a type to a literal constant value, so it succeeds for any type if the contents of the string literal are acceptable input syntax for the data type. + +Apache Cloudberry supports three types of casts applied to a value expression: + +- *Explicit cast* - Apache Cloudberry applies a cast when you explicitly specify a cast between two data types. Apache Cloudberry accepts two equivalent syntaxes for explicit type casts: + + ```sql + CAST ( expression AS type ) + expression::type + ``` + + The `CAST` syntax conforms to SQL; the syntax using `::` is historical PostgreSQL usage. + +- *Assignment cast* - Apache Cloudberry implicitly invokes a cast in assignment contexts, when assigning a value to a column of the target data type. For example, a [`CREATE CAST`](../../sql-stmt/create-cast.md) command with the `AS ASSIGNMENT` clause creates a cast that is applied implicitly in the assignment context. This example assignment cast assumes that `tbl1.f1` is a column of type `text`. The `INSERT` command is allowed because the value is implicitly cast from the `integer` to `text` type. Review Comment: Need to fix this link too. ########## docs/operate-with-data/sql-queries/cte-queries.md: ########## @@ -0,0 +1,290 @@ +--- +title: Common Table Expressions +--- + +# WITH Queries (Common Table Expressions) + +The `WITH` clause provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. + +:::info +Limitations when using a `WITH` clause in Apache Cloudberry include: + +- For a `SELECT` command that includes a `WITH` clause, the clause can contain at most a single clause that modifies table data (`INSERT`, `UPDATE`, or `DELETE` command). +- For a data-modifying command (`INSERT`, `UPDATE`, or `DELETE`) that includes a `WITH` clause, the clause can only contain a `SELECT` command; it cannot contain a data-modifying command. +::: + +By default, Apache Cloudberry enables the `RECURSIVE` keyword for the `WITH` clause. `RECURSIVE` can be deactivated by setting the server configuration parameter `gp_recursive_cte` to `false`. + +## SELECT in a WITH clause + +One use of CTEs is to break down complicated queries into simpler parts. The examples in this section show the `WITH` clause being used with a `SELECT` command. The example `WITH` clauses can be used the same manner with `INSERT`, `UPDATE`, or `DELETE`. + +A `SELECT` command in the `WITH` clause is evaluated only once per execution of the parent query, even if it is referred to more than once by the parent query or sibling `WITH` clauses. Thus, expensive calculations that are needed in multiple places can be placed within a `WITH` clause to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a `WITH` query than an ordinary sub-query. The `WITH` query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. However, evaluation might stop early if the references to the query demand only a limited number of rows. + +The following example query displays per-product sales totals in only the top sales regions: + +```sql +WITH regional_sales AS ( + SELECT region, SUM(amount) AS total_sales + FROM orders + GROUP BY region + ), top_regions AS ( + SELECT region + FROM regional_sales + WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) + ) +SELECT region, + product, + SUM(quantity) AS product_units, + SUM(amount) AS product_sales +FROM orders +WHERE region IN (SELECT region FROM top_regions) +GROUP BY region, product; +``` + +The `WITH` clause defines two auxiliary statements named `regional_sales` and `top_regions`, where the output of `regional_sales` is used in `top_regions` and the output of `top_regions` is used in the primary `SELECT` query. The query could have been written without the `WITH` clause, but would have required two levels of nested sub-`SELECT`s. + +When you specify the optional `RECURSIVE` keyword, the `WITH` clause can accomplish operations not otherwise possible in standard SQL. Using `RECURSIVE`, a `WITH` query can refer to its own output. This simple example computes the sum of integers from 1 through 100: + +```sql +WITH RECURSIVE t(n) AS ( + VALUES (1) + UNION ALL + SELECT n+1 FROM t WHERE n < 100 +) +SELECT sum(n) FROM t; +``` + +The general form of a recursive `WITH` query always follows the pattern of: a *non-recursive term*, followed by a `UNION` (or `UNION ALL`), and then a *recursive term*, where only the *recursive term* can contain a reference to the query output. + +```sql +<non_recursive_term> UNION [ ALL ] <recursive_term> +``` + +A recursive `WITH` query that contains a `UNION [ ALL ]` is evaluated as follows: + +- Evaluate the non-recursive term. For `UNION` (but not `UNION ALL`), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary *working table*. +- As long as the working table is not empty, repeat these steps: + 1. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For `UNION` (but not `UNION ALL`), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary *intermediate table*. + 2. Replace the contents of the *working table* with the contents of the *intermediate table*, then empty the *intermediate table*. + + :::note + While `RECURSIVE` allows queries to be specified recursively, Apache Cloudberry evaluates such queries iteratively internally. + ::: + +In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the `WHERE` clause, and so the query terminates. + +Recursive `WITH` queries are typically used to deal with hierarchical or tree-structured data. For example, this query locates all of the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions: + +```sql +WITH RECURSIVE included_parts(sub_part, part, quantity) AS ( + SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product' + UNION ALL + SELECT p.sub_part, p.part, p.quantity * pr.quantity + FROM included_parts pr, parts p + WHERE p.part = pr.sub_part +) +SELECT sub_part, SUM(quantity) as total_quantity +FROM included_parts +GROUP BY sub_part; +``` + +When working with recursive `WITH` queries, you must ensure that the recursive part of the query eventually returns no tuples, or else the query loops indefinitely. + +For some queries, using `UNION` instead of `UNION ALL` can ensure that the recursive part of the query eventually returns no tuples by discarding rows that duplicate previous output rows. However, often a cycle does not involve output rows that are complete duplicates: it might be sufficient to check just one or a few fields to see if the same point has been reached before. The standard method for handling such situations is to compute an array of the visited values. For example, consider the following query that searches a table `graph` using a `link` field: + +```sql +WITH RECURSIVE search_graph(id, link, data, depth) AS ( + SELECT g.id, g.link, g.data, 1 + FROM graph g + UNION ALL + SELECT g.id, g.link, g.data, sg.depth + 1 + FROM graph g, search_graph sg + WHERE g.id = sg.link +) +SELECT * FROM search_graph; +``` + +This query loops if the `link` relationships contain cycles. Because the query requires a `depth` output, changing `UNION ALL` to `UNION` does not eliminate the looping. Instead the query needs to recognize whether it has reached the same row again while following a particular path of links. This modified query adds two columns, `path` and `cycle`, to the loop-prone query: + +```sql +WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( + SELECT g.id, g.link, g.data, 1, + ARRAY[g.id], + false + FROM graph g + UNION ALL + SELECT g.id, g.link, g.data, sg.depth + 1, + path || g.id, + g.id = ANY(path) + FROM graph g, search_graph sg + WHERE g.id = sg.link AND NOT cycle +) +SELECT * FROM search_graph; +``` + +Aside from detecting cycles, the array value is useful in its own right because it represents the "path" taken to reach any particular row. + +In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows. For example, if we needed to compare fields `f1` and `f2`: + +```sql +WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( + SELECT g.id, g.link, g.data, 1, + ARRAY[ROW(g.f1, g.f2)], + false + FROM graph g + UNION ALL + SELECT g.id, g.link, g.data, sg.depth + 1, + path || ROW(g.f1, g.f2), + ROW(g.f1, g.f2) = ANY(path) + FROM graph g, search_graph sg + WHERE g.id = sg.link AND NOT cycle +) +SELECT * FROM search_graph; +``` + +**Tip:** Omit the `ROW()` syntax in the case where only one field must be checked to recognize a cycle. This uses a simple array rather than a composite-type array, gaining efficiency. + +**Tip:** The recursive query evaluation algorithm produces its output in breadth-first search order. You can display the results in depth-first search order by making the outer query `ORDER BY` a "path" column constructed in this way. + +A helpful technique for testing a query when you are not certain if it might loop indefinitely is to place a `LIMIT` in the parent query. For example, the following query would loop forever without the `LIMIT` clause: + +```sql +WITH RECURSIVE t(n) AS ( + SELECT 1 + UNION ALL + SELECT n+1 FROM t +) +SELECT n FROM t LIMIT 100; +``` + +This technique works because Apache Cloudberry evaluates only as many rows of a `WITH` query as are actually fetched by the parent query. *Using this technique in production environments is not recommended*, because other systems might work differently. Also, the technique might not work if the outer query sorts the recursive `WITH` results or joins the results to another table, because in such cases the outer query will usually try to fetch all of the `WITH` query's output anyway. + +A useful property of `WITH` queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling `WITH` queries. Thus, expensive calculations that are needed in multiple places can be placed within a `WITH` query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced `WITH` query, because that might affect all uses of the `WITH` query's output when it should affect only one. Apache Cloudberry evalues the multiply-referenced `WITH` query as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.) + +If a `WITH` query is non-recursive and side-effect-free (that is, it is a `SELECT` containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the `WITH` query just once, but not if it references the `WITH` query more than once. You can override that decision by specifying `MATERIALIZED` to force separate calculation of the `WITH` query, or by specifying `NOT MATERIALIZED` to force it to be merged into the parent query. The latter choice risks duplicate computation of the `WITH` query, but it can still give a net savings if each usage of the `WITH` query needs only a small part of the `WITH` query's full output. + +A simple example of these rules follows: + +```sql +WITH w AS ( + SELECT * FROM big_table +) +SELECT * FROM w WHERE key = 123; +``` + +This `WITH` query will be folded, producing the same execution plan as: + +```sql +SELECT * FROM big_table WHERE key = 123; +``` + +In particular, if there's an index on `key`, Apache Cloudberry uses it to fetch just the rows having `key = 123`. On the other hand, in: + +```sql +WITH w AS ( + SELECT * FROM big_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref +WHERE w2.key = 123; +``` + +the `WITH` query will be materialized, producing a temporary copy of `big_table` that is then joined with itself — without benefit of any index. This query will run much more efficiently if written as: + +```sql +WITH w AS NOT MATERIALIZED ( + SELECT * FROM big_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref +WHERE w2.key = 123; +``` + +so that the parent query's restrictions can be applied directly to scans of `big_table`. + +An example where `NOT MATERIALIZED` could be undesirable is: + +```sql +WITH w AS ( + SELECT key, very_expensive_function(val) as f FROM some_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f; +``` + +Here, materialization of the `WITH` query ensures that Apache Cloudberry evaluations `very_expensive_function` only once per table row, not twice. + +## Data-modifying statements in a WITH clause + +For a `SELECT` command, you can use the data-modifying commands `INSERT`, `UPDATE`, or `DELETE` in a `WITH` clause. This allows you to perform several different operations in the same query. + +A data-modifying statement in a `WITH` clause is run exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of the output. This is different from the rule when using `SELECT` in a `WITH` clause, the execution of a `SELECT` continues only as long as the primary query demands its output. + +This simple CTE query deletes rows from `products`. The `DELETE` in the `WITH` clause deletes the specified rows from products, returning their contents by means of its `RETURNING` clause. + +```sql +WITH deleted_rows AS ( + DELETE FROM products + WHERE + "date" >= '2010-10-01' AND + "date" < '2010-11-01' + RETURNING * +) +SELECT * FROM deleted_rows; +``` + +Data-modifying statements in a `WITH` clause must have `RETURNING` clauses, as shown in the previous example. It is the output of the `RETURNING` clause, *not* the target table of the data-modifying statement, that forms the temporary table that can be referred to by the rest of the query. If a data-modifying statement in a `WITH` lacks a `RETURNING` clause, then it forms no temporary table and cannot be referred to in the rest of the query. Apache Cloudberry runs such a statement nonetheless. + +If the optional `RECURSIVE` keyword is enabled, recursive self-references in data-modifying statements are not allowed. In some cases it is possible to work around this limitation by referring to the output of a recursive `WITH`. For example, this query would remove all direct and indirect subparts of a product: + +```sql +WITH RECURSIVE included_parts(sub_part, part) AS ( + SELECT sub_part, part FROM parts WHERE part = 'our_product' + UNION ALL + SELECT p.sub_part, p.part + FROM included_parts pr, parts p + WHERE p.part = pr.sub_part + ) +DELETE FROM parts + WHERE part IN (SELECT part FROM included_parts); +``` + +The sub-statements in a `WITH` clause are run concurrently with each other and with the main query. Therefore, when using a data-modifying statement in a `WITH`, the order in which the specified updates actually happen is unpredictable. All of the statements are run wth the same *snapshot*. The effects of the statement are not visible on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means thats the `RETURNING` data is the only way to communicate changes between different `WITH` sub-statements and the main query. + +In this example, the outer `SELECT` returns the original prices before the action of the `UPDATE` in the `WITH` clause: + +```sql +WITH t AS ( + UPDATE products SET price = price * 1.05 + RETURNING * +) +SELECT * FROM products; +``` + +In this example, the outer `SELECT` returns the updated data: + +```sql +WITH t AS ( + UPDATE products SET price = price * 1.05 + RETURNING * +) +SELECT * FROM t; +``` + +Updating the same row twice in a single statement is not supported. The effects of such a statement will not be predictable. Only one of the modifications takes place, but it is not easy (and sometimes not possible) to predict which modification occurs. + +Any table used as the target of a data-modifying statement in a `WITH` clause must not have a conditional rule, nor an `ALSO` rule, nor an `INSTEAD` rule that expands to multiple statements. + +## Considerations + +When constructing `WITH` queries, keep the following in mind: + +- `SELECT FOR UPDATE` cannot be inlined. +- Apache Cloudberry inlines multiply-referenced CTEs only when requested (by specifying `NOT MATERIALIZED`). +- Multiply-referenced CTEs cannot be inlined if they contain outer self-references. +- Apache Cloudberry does not inline when the CTE includes a volatile function. +- An `ORDER BY` in the subquery or CTE does not force an ordering for the whole query. +- Apache Cloudberry always materializes a CTE term in a query. Due to these reasons: + Review Comment: Can delete this blank line. ########## docs/operate-with-data/sql-queries/define-queries.md: ########## @@ -0,0 +1,21 @@ +--- +title: Define Queries +--- + +# Define Queries + +Apache Cloudberry is built on top of PostgreSQL and follows the SQL standard for data definition, manipulation, and analysis. To define queries effectively, you need to understand the rules that decide SQL syntax and semantics. + +This topic introduces the key concepts for building SQL queries in Apache Cloudberry, including SQL language structure, value expressions, aggregate and window functions, subqueries, and expression evaluation rules. Each of these areas plays a specific role in enabling accurate, efficient, and expressive queries. + +The subtopics linked from this page offer detailed explanations and examples to help you construct SQL queries that work seamlessly with the Cloudberry engine. + + + + + + + + + + Review Comment: Can delete these blank lines. ########## docs/operate-with-data/sql-queries/subqueries.md: ########## @@ -0,0 +1,86 @@ +--- +title: Subqueries +--- + +# Subqueries + +Subqueries let you embed one query inside another, enabling dynamic and conditional logic when defining queries. They are useful for filtering, calculating intermediate values, or expressing correlated relationships. + +This document distinguishes between scalar and correlated subqueries and provides guidance on when and how to use them effectively. + +A scalar subquery is a `SELECT` query in parentheses that returns exactly one row with one column. Do not use a `SELECT` query that returns multiple rows or columns as a scalar subquery. The query runs and uses the returned value in the surrounding value expression. A correlated scalar subquery contains references to the outer query block. + +## Correlated subqueries + +A correlated subquery (CSQ) is a `SELECT` query with a `WHERE` clause or target list that contains references to the parent outer clause. CSQs efficiently express results in terms of results of another query. Apache Cloudberry supports correlated subqueries that provide compatibility with many existing applications. A CSQ is a scalar or table subquery, depending on whether it returns one or multiple rows. Apache Cloudberry does not support correlated subqueries with skip-level correlations. + +## Correlated subquery examples + +### Example 1 – Scalar correlated subquery + +```sql +SELECT * FROM t1 WHERE t1.x + > (SELECT MAX(t2.x) FROM t2 WHERE t2.y = t1.y); +``` + +### Example 2 – Correlated EXISTS subquery + +```sql +SELECT * FROM t1 WHERE +EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x); +``` + +Apache Cloudberry uses one of the following methods to run CSQs: + +- Unnest the CSQ into join operations – This method is most efficient, and it is how Apache Cloudberry runs most CSQs, including queries from the TPC-H benchmark. +- Run the CSQ on every row of the outer query – This method is relatively inefficient, and it is how Apache Cloudberry runs queries that contain CSQs in the `SELECT` list or are connected by `OR` conditions. + +The following examples illustrate how to rewrite some of these types of queries to improve performance. + +### Example 3 - CSQ in the select list + +*Original Query* + +```sql +SELECT T1.a, + (SELECT COUNT(DISTINCT T2.z) FROM t2 WHERE t1.x = t2.y) dt2 +FROM t1; +``` + +Rewrite this query to perform an inner join with `t1` first and then perform a left join with `t1` again. The rewrite applies for only an equijoin in the correlated condition. + +*Rewritten Query* + +```sql +SELECT t1.a, dt2 FROM t1 + LEFT JOIN + (SELECT t2.y AS csq_y, COUNT(DISTINCT t2.z) AS dt2 + FROM t1, t2 WHERE t1.x = t2.y + GROUP BY t1.x) + ON (t1.x = csq_y); +``` + +### Example 4 - CSQs connected by OR clauses + +*Original Query* + +```sql +SELECT * FROM t1 +WHERE +x > (SELECT COUNT(*) FROM t2 WHERE t1.x = t2.x) +OR x < (SELECT COUNT(*) FROM t3 WHERE t1.y = t3.y) +``` + +Rewrite this query to separate it into two parts with a union on the `OR` conditions. + +*Rewritten Query* + +```sql +SELECT * FROM t1 +WHERE x > (SELECT count(*) FROM t2 WHERE t1.x = t2.x) +UNION +SELECT * FROM t1 +WHERE x < (SELECT count(*) FROM t3 WHERE t1.y = t3.y) +``` + +To view the query plan, use `EXPLAIN SELECT` or `EXPLAIN ANALYZE SELECT`. Subplan nodes in the query plan indicate that the query will run on every row of the outer query, and the query is a candidate for rewriting. For more information about these statements, see [Query Profiling](query-profiling.html). Review Comment: The `Query Profiling` link doesn't work. ########## docs/performance/optimize-queries/analyze-query-performance.md: ########## @@ -0,0 +1,196 @@ +--- +title: Analyze Query Performance +--- + +# Analyze Query Performance + +Examine the query plans of poorly performing queries to identify possible performance tuning opportunities. + +Apache Cloudberry devises a query plan for each query. Choosing the right query plan to match the query and data structure is necessary for good performance. A query plan defines how Apache Cloudberry will run the query in the parallel execution environment. + +The query optimizer uses data statistics maintained by the database to choose a query plan with the lowest possible cost. Cost is measured in disk I/O, shown as units of disk page fetches. The goal is to minimize the total execution cost for the plan. + +View the plan for a given query with the `EXPLAIN` command. `EXPLAIN` shows the query optimizer's estimated cost for the query plan. For example: + +```sql +EXPLAIN SELECT * FROM names WHERE id=22; +``` + +`EXPLAIN ANALYZE` runs the statement in addition to displaying its plan. This is useful for determining how close the optimizer's estimates are to reality. For example: + +```sql +EXPLAIN ANALYZE SELECT * FROM names WHERE id=22; +``` + +:::note Review Comment: Will this block work for the rest texts? If not, need the `:::` to end it. ########## docs/operate-with-data/sql-queries/value-expressions.md: ########## @@ -0,0 +1,270 @@ +--- +title: Value Expressions +--- + +# Value Expressions + +Value expressions are the core components that allow you to calculate, transform, and reference data within a query. When defining queries, understanding these expressions is key to writing flexible and effective SQL. + +This document explains types of expressions such as column references, positional parameters, array and row constructors, function calls, and type casts. Each type plays a specific role in how data is processed and represented in a query result. + +## Column references + +A column reference has the form: + +```sql +<correlation>.<columnname> +``` + +Here, `<correlation>` is the name of a table (possibly qualified with a schema name) or an alias for a table defined with a `FROM` clause or one of the keywords `NEW` or `OLD`. `NEW` and `OLD` can appear only in rewrite rules, but you can use other correlation names in any SQL statement. If the column name is unique across all tables in the query, you can omit the `<correlation>` part of the column reference. + +## Positional parameters + +Positional parameters are arguments to SQL statements or functions that you reference by their positions in a series of arguments. For example, `$1` refers to the first argument, `$2` to the second argument, and so on. The values of positional parameters are set from arguments external to the SQL statement or supplied when SQL functions are invoked. Some client libraries support specifying data values separately from the SQL command, in which case parameters refer to the out-of-line data values. A parameter reference has the form: + +```sql +$number +``` + +For example: + +```sql +CREATE FUNCTION dept(text) RETURNS dept + AS $$ SELECT * FROM dept WHERE name = $1 $$ + LANGUAGE SQL; +``` + +Here, the `$1` references the value of the first function argument whenever the function is invoked. + +## Subscripts + +If an expression yields a value of an array type, you can extract a specific element of the array value as follows: + +```sql +<expression>[<subscript>] + +``` + +You can extract multiple adjacent elements, called an array slice, as follows (including the brackets): + +```sql +<expression>[<lower_subscript>:<upper_subscript>] + +``` + +Each subscript is an expression and yields an integer value. + +Array expressions usually must be in parentheses, but you can omit the parentheses when the expression to be subscripted is a column reference or positional parameter. You can concatenate multiple subscripts when the original array is multidimensional. For example (including the parentheses): + +```sql +mytable.arraycolumn[4] +``` + +```sql +mytable.two_d_column[17][34] +``` + +```sql +$1[10:42] +``` + +```sql +(arrayfunction(a,b))[42] +``` + +## Field selection + +If an expression yields a value of a composite type (row type), you can extract a specific field of the row as follows: + +```sql +<expression>.<fieldname> +``` + +The row expression usually must be in parentheses, but you can omit these parentheses when the expression to be selected from is a table reference or positional parameter. For example: + +```sql +mytable.mycolumn +``` + +```sql +$1.somecolumn +``` + +```sql +(rowfunction(a,b)).col3 +``` + +A qualified column reference is a special case of field selection syntax. + +## Operator invocations + +Operator invocations have the following possible syntaxes: + +```sql +<expression operator expression>(binary infix operator) +``` + +```sql +<operator expression>(unary prefix operator) +``` + +```sql +<expression operator>(unary postfix operator) +``` + +Where `operator` is an operator token, one of the key words `AND`, `OR`, or `NOT`, or qualified operator name in the form: + +```sql +OPERATOR(<schema>.<operatorname>) +``` + +Available operators and whether they are unary or binary depends on the operators that the system or user defines. For more information about built-in operators, see [Built-in Functions and Operators](functions-operators.html). Review Comment: Need to fix this link too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
