Re: [PR] docs: add db objects and query docs [cloudberry-site]

via GitHub Tue, 03 Jun 2025 04:04:22 -0700


tuhaihe commented on code in PR #277:
URL: https://github.com/apache/cloudberry-site/pull/277#discussion_r2123384404



##########
docs/operate-with-data/manage-spill-files.md:
##########


Review Comment:
   Seems this file is not listed in the `sidebar.ts` file.



##########
docs/performance/optimize-queries/use-orca/gporca-usage-considerations.md:
##########


Review Comment:
   This file should be in English.



##########
docs/performance/optimize-queries/analyze-query-performance.md:
##########
@@ -0,0 +1,196 @@
+---
+title: Analyze Query Performance
+---
+
+# Analyze Query Performance
+
+Examine the query plans of poorly performing queries to identify possible 
performance tuning opportunities.
+
+Apache Cloudberry devises a query plan for each query. Choosing the right 
query plan to match the query and data structure is necessary for good 
performance. A query plan defines how Apache Cloudberry will run the query in 
the parallel execution environment.
+
+The query optimizer uses data statistics maintained by the database to choose 
a query plan with the lowest possible cost. Cost is measured in disk I/O, shown 
as units of disk page fetches. The goal is to minimize the total execution cost 
for the plan.
+
+View the plan for a given query with the `EXPLAIN` command. `EXPLAIN` shows 
the query optimizer's estimated cost for the query plan. For example:
+
+```sql
+EXPLAIN SELECT * FROM names WHERE id=22;
+```
+
+`EXPLAIN ANALYZE` runs the statement in addition to displaying its plan. This 
is useful for determining how close the optimizer's estimates are to reality. 
For example:
+
+```sql
+EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;
+```
+
+:::note
+In Apache Cloudberry, the default GPORCA optimizer co-exists with the 
Postgres-based planner. The EXPLAIN output generated by GPORCA is different 
than the output generated by the Postgres-based planner.
+
+By default, Apache Cloudberry uses GPORCA to generate an execution plan for a 
query when possible.
+
+When the `EXPLAIN ANALYZE` command uses GPORCA, the `EXPLAIN` plan shows only 
the number of partitions that are being eliminated. The scanned partitions are 
not shown. To show name of the scanned partitions in the segment logs set the 
server configuration parameter `gp_log_dynamic_partition_pruning` to `on`. This 
example `SET` command enables the parameter.
+
+```sql
+SET gp_log_dynamic_partition_pruning = on;
+```
+
+For information about GPORCA, see [Querying Data](query.html).

Review Comment:
   The `Querying Data` link cannot work.



##########
docs/operate-with-data/sql-queries/value-expressions.md:
##########
@@ -0,0 +1,270 @@
+---
+title: Value Expressions
+---
+
+# Value Expressions
+
+Value expressions are the core components that allow you to calculate, 
transform, and reference data within a query. When defining queries, 
understanding these expressions is key to writing flexible and effective SQL.
+
+This document explains  types of expressions such as column references, 
positional parameters, array and row constructors, function calls, and type 
casts. Each type plays a specific role in how data is processed and represented 
in a query result.
+
+## Column references
+
+A column reference has the form:
+
+```sql
+<correlation>.<columnname>
+```
+
+Here, `<correlation>` is the name of a table (possibly qualified with a schema 
name) or an alias for a table defined with a `FROM` clause or one of the 
keywords `NEW` or `OLD`. `NEW` and `OLD` can appear only in rewrite rules, but 
you can use other correlation names in any SQL statement. If the column name is 
unique across all tables in the query, you can omit the `<correlation>` part of 
the column reference.
+
+## Positional parameters
+
+Positional parameters are arguments to SQL statements or functions that you 
reference by their positions in a series of arguments. For example, `$1` refers 
to the first argument, `$2` to the second argument, and so on. The values of 
positional parameters are set from arguments external to the SQL statement or 
supplied when SQL functions are invoked. Some client libraries support 
specifying data values separately from the SQL command, in which case 
parameters refer to the out-of-line data values. A parameter reference has the 
form:
+
+```sql
+$number
+```
+
+For example:
+
+```sql
+CREATE FUNCTION dept(text) RETURNS dept
+    AS $$ SELECT * FROM dept WHERE name = $1 $$
+    LANGUAGE SQL;
+```
+
+Here, the `$1` references the value of the first function argument whenever 
the function is invoked.
+
+## Subscripts
+
+If an expression yields a value of an array type, you can extract a specific 
element of the array value as follows:
+
+```sql
+<expression>[<subscript>]
+
+```
+
+You can extract multiple adjacent elements, called an array slice, as follows 
(including the brackets):
+
+```sql
+<expression>[<lower_subscript>:<upper_subscript>]
+
+```
+
+Each subscript is an expression and yields an integer value.
+
+Array expressions usually must be in parentheses, but you can omit the 
parentheses when the expression to be subscripted is a column reference or 
positional parameter. You can concatenate multiple subscripts when the original 
array is multidimensional. For example (including the parentheses):
+
+```sql
+mytable.arraycolumn[4]
+```
+
+```sql
+mytable.two_d_column[17][34]
+```
+
+```sql
+$1[10:42]
+```
+
+```sql
+(arrayfunction(a,b))[42]
+```
+
+## Field selection
+
+If an expression yields a value of a composite type (row type), you can 
extract a specific field of the row as follows:
+
+```sql
+<expression>.<fieldname>
+```
+
+The row expression usually must be in parentheses, but you can omit these 
parentheses when the expression to be selected from is a table reference or 
positional parameter. For example:
+
+```sql
+mytable.mycolumn
+```
+
+```sql
+$1.somecolumn
+```
+
+```sql
+(rowfunction(a,b)).col3
+```
+
+A qualified column reference is a special case of field selection syntax.
+
+## Operator invocations
+
+Operator invocations have the following possible syntaxes:
+
+```sql
+<expression operator expression>(binary infix operator)
+```
+
+```sql
+<operator expression>(unary prefix operator)
+```
+
+```sql
+<expression operator>(unary postfix operator)
+```
+
+Where `operator` is an operator token, one of the key words `AND`, `OR`, or 
`NOT`, or qualified operator name in the form:
+
+```sql
+OPERATOR(<schema>.<operatorname>)
+```
+
+Available operators and whether they are unary or binary depends on the 
operators that the system or user defines. For more information about built-in 
operators, see [Built-in Functions and Operators](functions-operators.html).
+
+## Function calls
+
+The syntax for a function call is the name of a function (possibly qualified 
with a schema name), followed by its argument list enclosed in parentheses:
+
+```sql
+function ([expression [, expression ... ]])
+```
+
+For example, the following function call computes the square root of 2:
+
+```sql
+sqrt(2)
+```
+
+## Type casts
+
+A type cast specifies a conversion from one data type to another. A cast 
applied to a value expression of a known type is a run-time type conversion. 
The cast succeeds only if a suitable type conversion is defined. This differs 
from the use of casts with constants. A cast applied to a string literal 
represents the initial assignment of a type to a literal constant value, so it 
succeeds for any type if the contents of the string literal are acceptable 
input syntax for the data type.
+
+Apache Cloudberry supports three types of casts applied to a value expression:
+
+- *Explicit cast* - Apache Cloudberry applies a cast when you explicitly 
specify a cast between two data types. Apache Cloudberry accepts two equivalent 
syntaxes for explicit type casts:
+
+    ```sql
+    CAST ( expression AS type )
+    expression::type
+    ```
+
+    The `CAST` syntax conforms to SQL; the syntax using `::` is historical 
PostgreSQL usage.
+
+- *Assignment cast* - Apache Cloudberry implicitly invokes a cast in 
assignment contexts, when assigning a value to a column of the target data 
type. For example, a [`CREATE CAST`](../../sql-stmt/create-cast.md) command 
with the `AS ASSIGNMENT` clause creates a cast that is applied implicitly in 
the assignment context. This example assignment cast assumes that `tbl1.f1` is 
a column of type `text`. The `INSERT` command is allowed because the value is 
implicitly cast from the `integer` to `text` type.

Review Comment:
   Need to fix this link too.



##########
docs/operate-with-data/sql-queries/cte-queries.md:
##########
@@ -0,0 +1,290 @@
+---
+title: Common Table Expressions
+---
+
+# WITH Queries (Common Table Expressions)
+
+The `WITH` clause provides a way to write auxiliary statements for use in a 
larger query. These statements, which are often referred to as Common Table 
Expressions or CTEs, can be thought of as defining temporary tables that exist 
just for one query.
+
+:::info
+Limitations when using a `WITH` clause in Apache Cloudberry include:
+
+- For a `SELECT` command that includes a `WITH` clause, the clause can contain 
at most a single clause that modifies table data (`INSERT`, `UPDATE`, or 
`DELETE` command).
+- For a data-modifying command (`INSERT`, `UPDATE`, or `DELETE`) that includes 
a `WITH` clause, the clause can only contain a `SELECT` command; it cannot 
contain a data-modifying command.
+:::
+
+By default, Apache Cloudberry enables the `RECURSIVE` keyword for the `WITH` 
clause. `RECURSIVE` can be deactivated by setting the server configuration 
parameter `gp_recursive_cte` to `false`.
+
+## SELECT in a WITH clause
+
+One use of CTEs is to break down complicated queries into simpler parts. The 
examples in this section show the `WITH` clause being used with a `SELECT` 
command. The example `WITH` clauses can be used the same manner with `INSERT`, 
`UPDATE`, or `DELETE`.
+
+A `SELECT` command in the `WITH` clause is evaluated only once per execution 
of the parent query, even if it is referred to more than once by the parent 
query or sibling `WITH` clauses. Thus, expensive calculations that are needed 
in multiple places can be placed within a `WITH` clause to avoid redundant 
work. Another possible application is to prevent unwanted multiple evaluations 
of functions with side-effects. However, the other side of this coin is that 
the optimizer is less able to push restrictions from the parent query down into 
a `WITH` query than an ordinary sub-query. The `WITH` query will generally be 
evaluated as written, without suppression of rows that the parent query might 
discard afterwards. However, evaluation might stop early if the references to 
the query demand only a limited number of rows.
+
+The following example query displays per-product sales totals in only the top 
sales regions:
+
+```sql
+WITH regional_sales AS (
+     SELECT region, SUM(amount) AS total_sales
+     FROM orders
+     GROUP BY region
+  ), top_regions AS (
+     SELECT region
+     FROM regional_sales
+     WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
+  )
+SELECT region,
+    product,
+    SUM(quantity) AS product_units,
+    SUM(amount) AS product_sales
+FROM orders
+WHERE region IN (SELECT region FROM top_regions)
+GROUP BY region, product;
+```
+
+The `WITH` clause defines two auxiliary statements named `regional_sales` and 
`top_regions`, where the output of `regional_sales` is used in `top_regions` 
and the output of `top_regions` is used in the primary `SELECT` query. The 
query could have been written without the `WITH` clause, but would have 
required two levels of nested sub-`SELECT`s.
+
+When you specify the optional `RECURSIVE` keyword, the `WITH` clause can 
accomplish operations not otherwise possible in standard SQL. Using 
`RECURSIVE`, a `WITH` query can refer to its own output. This simple example 
computes the sum of integers from 1 through 100:
+
+```sql
+WITH RECURSIVE t(n) AS (
+    VALUES (1)
+  UNION ALL
+    SELECT n+1 FROM t WHERE n < 100
+)
+SELECT sum(n) FROM t;
+```
+
+The general form of a recursive `WITH` query always follows the pattern of: a 
*non-recursive term*, followed by a `UNION` (or `UNION ALL`), and then a 
*recursive term*, where only the *recursive term* can contain a reference to 
the query output.
+
+```sql
+<non_recursive_term> UNION [ ALL ] <recursive_term>
+```
+
+A recursive `WITH` query that contains a `UNION [ ALL ]` is evaluated as 
follows:
+
+- Evaluate the non-recursive term. For `UNION` (but not `UNION ALL`), discard 
duplicate rows. Include all remaining rows in the result of the recursive 
query, and also place them in a temporary *working table*.
+- As long as the working table is not empty, repeat these steps:
+    1. Evaluate the recursive term, substituting the current contents of the 
working table for the recursive self-reference. For `UNION` (but not `UNION 
ALL`), discard duplicate rows and rows that duplicate any previous result row. 
Include all remaining rows in the result of the recursive query, and also place 
them in a temporary *intermediate table*.
+    2. Replace the contents of the *working table* with the contents of the 
*intermediate table*, then empty the *intermediate table*.
+
+    :::note
+    While `RECURSIVE` allows queries to be specified recursively, Apache 
Cloudberry evaluates such queries iteratively internally.
+    :::
+
+In the example above, the working table has just a single row in each step, 
and it takes on the values from 1 through 100 in successive steps. In the 100th 
step, there is no output because of the `WHERE` clause, and so the query 
terminates.
+
+Recursive `WITH` queries are typically used to deal with hierarchical or 
tree-structured data. For example, this query locates all of the direct and 
indirect sub-parts of a product, given only a table that shows immediate 
inclusions:
+
+```sql
+WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
+    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
+  UNION ALL
+    SELECT p.sub_part, p.part, p.quantity * pr.quantity
+    FROM included_parts pr, parts p
+    WHERE p.part = pr.sub_part
+)
+SELECT sub_part, SUM(quantity) as total_quantity
+FROM included_parts
+GROUP BY sub_part;
+```
+
+When working with recursive `WITH` queries, you must ensure that the recursive 
part of the query eventually returns no tuples, or else the query loops 
indefinitely.
+
+For some queries, using `UNION` instead of `UNION ALL` can ensure that the 
recursive part of the query eventually returns no tuples by discarding rows 
that duplicate previous output rows. However, often a cycle does not involve 
output rows that are complete duplicates: it might be sufficient to check just 
one or a few fields to see if the same point has been reached before. The 
standard method for handling such situations is to compute an array of the 
visited values. For example, consider the following query that searches a table 
`graph` using a `link` field:
+
+```sql
+WITH RECURSIVE search_graph(id, link, data, depth) AS (
+    SELECT g.id, g.link, g.data, 1
+    FROM graph g
+  UNION ALL
+    SELECT g.id, g.link, g.data, sg.depth + 1
+    FROM graph g, search_graph sg
+    WHERE g.id = sg.link
+)
+SELECT * FROM search_graph;
+```
+
+This query loops if the `link` relationships contain cycles. Because the query 
requires a `depth` output, changing `UNION ALL` to `UNION` does not eliminate 
the looping. Instead the query needs to recognize whether it has reached the 
same row again while following a particular path of links. This modified query 
adds two columns, `path` and `cycle`, to the loop-prone query:
+
+```sql
+WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
+    SELECT g.id, g.link, g.data, 1,
+      ARRAY[g.id],
+      false
+    FROM graph g
+  UNION ALL
+    SELECT g.id, g.link, g.data, sg.depth + 1,
+      path || g.id,
+      g.id = ANY(path)
+    FROM graph g, search_graph sg
+    WHERE g.id = sg.link AND NOT cycle
+)
+SELECT * FROM search_graph;
+```
+
+Aside from detecting cycles, the array value is useful in its own right 
because it represents the "path" taken to reach any particular row.
+
+In the general case where more than one field needs to be checked to recognize 
a cycle, use an array of rows. For example, if we needed to compare fields `f1` 
and `f2`:
+
+```sql
+WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
+    SELECT g.id, g.link, g.data, 1,
+      ARRAY[ROW(g.f1, g.f2)],
+      false
+    FROM graph g
+  UNION ALL
+    SELECT g.id, g.link, g.data, sg.depth + 1,
+      path || ROW(g.f1, g.f2),
+      ROW(g.f1, g.f2) = ANY(path)
+    FROM graph g, search_graph sg
+    WHERE g.id = sg.link AND NOT cycle
+)
+SELECT * FROM search_graph;
+```
+
+**Tip:** Omit the `ROW()` syntax in the case where only one field must be 
checked to recognize a cycle. This uses a simple array rather than a 
composite-type array, gaining efficiency.
+
+**Tip:** The recursive query evaluation algorithm produces its output in 
breadth-first search order. You can display the results in depth-first search 
order by making the outer query `ORDER BY` a "path" column constructed in this 
way.
+
+A helpful technique for testing a query when you are not certain if it might 
loop indefinitely is to place a `LIMIT` in the parent query. For example, the 
following query would loop forever without the `LIMIT` clause:
+
+```sql
+WITH RECURSIVE t(n) AS (
+    SELECT 1
+  UNION ALL
+    SELECT n+1 FROM t
+)
+SELECT n FROM t LIMIT 100;
+```
+
+This technique works because Apache Cloudberry evaluates only as many rows of 
a `WITH` query as are actually fetched by the parent query. *Using this 
technique in production environments is not recommended*, because other systems 
might work differently. Also, the technique might not work if the outer query 
sorts the recursive `WITH` results or joins the results to another table, 
because in such cases the outer query will usually try to fetch all of the 
`WITH` query's output anyway.
+
+A useful property of `WITH` queries is that they are normally evaluated only 
once per execution of the parent query, even if they are referred to more than 
once by the parent query or sibling `WITH` queries. Thus, expensive 
calculations that are needed in multiple places can be placed within a `WITH` 
query to avoid redundant work. Another possible application is to prevent 
unwanted multiple evaluations of functions with side-effects. However, the 
other side of this coin is that the optimizer is not able to push restrictions 
from the parent query down into a multiply-referenced `WITH` query, because 
that might affect all uses of the `WITH` query's output when it should affect 
only one. Apache Cloudberry evalues the multiply-referenced `WITH` query as 
written, without suppression of rows that the parent query might discard 
afterwards. (But, as mentioned above, evaluation might stop early if the 
reference(s) to the query demand only a limited number of rows.)
+
+If a `WITH` query is non-recursive and side-effect-free (that is, it is a 
`SELECT` containing no volatile functions) then it can be folded into the 
parent query, allowing joint optimization of the two query levels. By default, 
this happens if the parent query references the `WITH` query just once, but not 
if it references the `WITH` query more than once. You can override that 
decision by specifying `MATERIALIZED` to force separate calculation of the 
`WITH` query, or by specifying `NOT MATERIALIZED` to force it to be merged into 
the parent query. The latter choice risks duplicate computation of the `WITH` 
query, but it can still give a net savings if each usage of the `WITH` query 
needs only a small part of the `WITH` query's full output.
+
+A simple example of these rules follows:
+
+```sql
+WITH w AS (
+    SELECT * FROM big_table
+)
+SELECT * FROM w WHERE key = 123;
+```
+
+This `WITH` query will be folded, producing the same execution plan as:
+
+```sql
+SELECT * FROM big_table WHERE key = 123;
+```
+
+In particular, if there's an index on `key`, Apache Cloudberry uses it to 
fetch just the rows having `key = 123`. On the other hand, in:
+
+```sql
+WITH w AS (
+    SELECT * FROM big_table
+)
+SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
+WHERE w2.key = 123;
+```
+
+the `WITH` query will be materialized, producing a temporary copy of 
`big_table` that is then joined with itself — without benefit of any index. 
This query will run much more efficiently if written as:
+
+```sql
+WITH w AS NOT MATERIALIZED (
+    SELECT * FROM big_table
+)
+SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
+WHERE w2.key = 123;
+```
+
+so that the parent query's restrictions can be applied directly to scans of 
`big_table`.
+
+An example where `NOT MATERIALIZED` could be undesirable is:
+
+```sql
+WITH w AS (
+    SELECT key, very_expensive_function(val) as f FROM some_table
+)
+SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;
+```
+
+Here, materialization of the `WITH` query ensures that Apache Cloudberry 
evaluations `very_expensive_function` only once per table row, not twice.
+
+## Data-modifying statements in a WITH clause
+
+For a `SELECT` command, you can use the data-modifying commands `INSERT`, 
`UPDATE`, or `DELETE` in a `WITH` clause. This allows you to perform several 
different operations in the same query.
+
+A data-modifying statement in a `WITH` clause is run exactly once, and always 
to completion, independently of whether the primary query reads all (or indeed 
any) of the output. This is different from the rule when using `SELECT` in a 
`WITH` clause, the execution of a `SELECT` continues only as long as the 
primary query demands its output.
+
+This simple CTE query deletes rows from `products`. The `DELETE` in the `WITH` 
clause deletes the specified rows from products, returning their contents by 
means of its `RETURNING` clause.
+
+```sql
+WITH deleted_rows AS (
+    DELETE FROM products
+    WHERE
+        "date" >= '2010-10-01' AND
+        "date" < '2010-11-01'
+    RETURNING *
+)
+SELECT * FROM deleted_rows;
+```
+
+Data-modifying statements in a `WITH` clause must have `RETURNING` clauses, as 
shown in the previous example. It is the output of the `RETURNING` clause, 
*not* the target table of the data-modifying statement, that forms the 
temporary table that can be referred to by the rest of the query. If a 
data-modifying statement in a `WITH` lacks a `RETURNING` clause, then it forms 
no temporary table and cannot be referred to in the rest of the query. Apache 
Cloudberry runs such a statement nonetheless.
+
+If the optional `RECURSIVE` keyword is enabled, recursive self-references in 
data-modifying statements are not allowed. In some cases it is possible to work 
around this limitation by referring to the output of a recursive `WITH`. For 
example, this query would remove all direct and indirect subparts of a product:
+
+```sql
+WITH RECURSIVE included_parts(sub_part, part) AS (
+    SELECT sub_part, part FROM parts WHERE part = 'our_product'
+  UNION ALL
+    SELECT p.sub_part, p.part
+    FROM included_parts pr, parts p
+    WHERE p.part = pr.sub_part
+  )
+DELETE FROM parts
+  WHERE part IN (SELECT part FROM included_parts);
+```
+
+The sub-statements in a `WITH` clause are run concurrently with each other and 
with the main query. Therefore, when using a data-modifying statement in a 
`WITH`, the order in which the specified updates actually happen is 
unpredictable. All of the statements are run wth the same *snapshot*. The 
effects of the statement are not visible on the target tables. This alleviates 
the effects of the unpredictability of the actual order of row updates, and 
means thats the `RETURNING` data is the only way to communicate changes between 
different `WITH` sub-statements and the main query.
+
+In this example, the outer `SELECT` returns the original prices before the 
action of the `UPDATE` in the `WITH` clause:
+
+```sql
+WITH t AS (
+    UPDATE products SET price = price * 1.05
+    RETURNING *
+)
+SELECT * FROM products;
+```
+
+In this example, the outer `SELECT` returns the updated data:
+
+```sql
+WITH t AS (
+    UPDATE products SET price = price * 1.05
+    RETURNING *
+)
+SELECT * FROM t;
+```
+
+Updating the same row twice in a single statement is not supported. The 
effects of such a statement will not be predictable. Only one of the 
modifications takes place, but it is not easy (and sometimes not possible) to 
predict which modification occurs.
+
+Any table used as the target of a data-modifying statement in a `WITH` clause 
must not have a conditional rule, nor an `ALSO` rule, nor an `INSTEAD` rule 
that expands to multiple statements.
+
+## Considerations
+
+When constructing `WITH` queries, keep the following in mind:
+
+- `SELECT FOR UPDATE` cannot be inlined.
+- Apache Cloudberry inlines multiply-referenced CTEs only when requested (by 
specifying `NOT MATERIALIZED`).
+- Multiply-referenced CTEs cannot be inlined if they contain outer 
self-references.
+- Apache Cloudberry does not inline when the CTE includes a volatile function.
+- An `ORDER BY` in the subquery or CTE does not force an ordering for the 
whole query.
+- Apache Cloudberry always materializes a CTE term in a query. Due to these 
reasons:
+

Review Comment:
   Can delete this blank line.



##########
docs/operate-with-data/sql-queries/define-queries.md:
##########
@@ -0,0 +1,21 @@
+---
+title: Define Queries
+---
+
+# Define Queries
+
+Apache Cloudberry is built on top of PostgreSQL and follows the SQL standard 
for data definition, manipulation, and analysis. To define queries effectively, 
you need to understand the rules that decide SQL syntax and semantics.
+
+This topic introduces the key concepts for building SQL queries in Apache 
Cloudberry, including SQL language structure, value expressions, aggregate and 
window functions, subqueries, and expression evaluation rules. Each of these 
areas plays a specific role in enabling accurate, efficient, and expressive 
queries.
+
+The subtopics linked from this page offer detailed explanations and examples 
to help you construct SQL queries that work seamlessly with the Cloudberry 
engine.
+
+
+
+
+
+
+
+
+
+

Review Comment:
   Can delete these blank lines.



##########
docs/operate-with-data/sql-queries/subqueries.md:
##########
@@ -0,0 +1,86 @@
+---
+title: Subqueries
+---
+
+# Subqueries
+
+Subqueries let you embed one query inside another, enabling dynamic and 
conditional logic when defining queries. They are useful for filtering, 
calculating intermediate values, or expressing correlated relationships.
+
+This document distinguishes between scalar and correlated subqueries and 
provides guidance on when and how to use them effectively.
+
+A scalar subquery is a `SELECT` query in parentheses that returns exactly one 
row with one column. Do not use a `SELECT` query that returns multiple rows or 
columns as a scalar subquery. The query runs and uses the returned value in the 
surrounding value expression. A correlated scalar subquery contains references 
to the outer query block.
+
+## Correlated subqueries
+
+A correlated subquery (CSQ) is a `SELECT` query with a `WHERE` clause or 
target list that contains references to the parent outer clause. CSQs 
efficiently express results in terms of results of another query. Apache 
Cloudberry supports correlated subqueries that provide compatibility with many 
existing applications. A CSQ is a scalar or table subquery, depending on 
whether it returns one or multiple rows. Apache Cloudberry does not support 
correlated subqueries with skip-level correlations.
+
+## Correlated subquery examples
+
+### Example 1 – Scalar correlated subquery
+
+```sql
+SELECT * FROM t1 WHERE t1.x 
+            > (SELECT MAX(t2.x) FROM t2 WHERE t2.y = t1.y);
+```
+
+### Example 2 – Correlated EXISTS subquery
+
+```sql
+SELECT * FROM t1 WHERE 
+EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x);
+```
+
+Apache Cloudberry uses one of the following methods to run CSQs:
+
+- Unnest the CSQ into join operations – This method is most efficient, and it 
is how Apache Cloudberry runs most CSQs, including queries from the TPC-H 
benchmark.
+- Run the CSQ on every row of the outer query – This method is relatively 
inefficient, and it is how Apache Cloudberry runs queries that contain CSQs in 
the `SELECT` list or are connected by `OR` conditions.
+
+The following examples illustrate how to rewrite some of these types of 
queries to improve performance.
+
+### Example 3 - CSQ in the select list
+
+*Original Query*
+
+```sql
+SELECT T1.a,
+      (SELECT COUNT(DISTINCT T2.z) FROM t2 WHERE t1.x = t2.y) dt2 
+FROM t1;
+```
+
+Rewrite this query to perform an inner join with `t1` first and then perform a 
left join with `t1` again. The rewrite applies for only an equijoin in the 
correlated condition.
+
+*Rewritten Query*
+
+```sql
+SELECT t1.a, dt2 FROM t1 
+       LEFT JOIN 
+        (SELECT t2.y AS csq_y, COUNT(DISTINCT t2.z) AS dt2 
+              FROM t1, t2 WHERE t1.x = t2.y 
+              GROUP BY t1.x) 
+       ON (t1.x = csq_y);
+```
+
+### Example 4 - CSQs connected by OR clauses
+
+*Original Query*
+
+```sql
+SELECT * FROM t1 
+WHERE 
+x > (SELECT COUNT(*) FROM t2 WHERE t1.x = t2.x) 
+OR x < (SELECT COUNT(*) FROM t3 WHERE t1.y = t3.y)
+```
+
+Rewrite this query to separate it into two parts with a union on the `OR` 
conditions.
+
+*Rewritten Query*
+
+```sql
+SELECT * FROM t1 
+WHERE x > (SELECT count(*) FROM t2 WHERE t1.x = t2.x) 
+UNION 
+SELECT * FROM t1 
+WHERE x < (SELECT count(*) FROM t3 WHERE t1.y = t3.y)
+```
+
+To view the query plan, use `EXPLAIN SELECT` or `EXPLAIN ANALYZE SELECT`. 
Subplan nodes in the query plan indicate that the query will run on every row 
of the outer query, and the query is a candidate for rewriting. For more 
information about these statements, see [Query Profiling](query-profiling.html).

Review Comment:
   The `Query Profiling` link doesn't work.



##########
docs/performance/optimize-queries/analyze-query-performance.md:
##########
@@ -0,0 +1,196 @@
+---
+title: Analyze Query Performance
+---
+
+# Analyze Query Performance
+
+Examine the query plans of poorly performing queries to identify possible 
performance tuning opportunities.
+
+Apache Cloudberry devises a query plan for each query. Choosing the right 
query plan to match the query and data structure is necessary for good 
performance. A query plan defines how Apache Cloudberry will run the query in 
the parallel execution environment.
+
+The query optimizer uses data statistics maintained by the database to choose 
a query plan with the lowest possible cost. Cost is measured in disk I/O, shown 
as units of disk page fetches. The goal is to minimize the total execution cost 
for the plan.
+
+View the plan for a given query with the `EXPLAIN` command. `EXPLAIN` shows 
the query optimizer's estimated cost for the query plan. For example:
+
+```sql
+EXPLAIN SELECT * FROM names WHERE id=22;
+```
+
+`EXPLAIN ANALYZE` runs the statement in addition to displaying its plan. This 
is useful for determining how close the optimizer's estimates are to reality. 
For example:
+
+```sql
+EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;
+```
+
+:::note

Review Comment:
   Will this block work for the rest texts? If not, need the `:::` to end it.



##########
docs/operate-with-data/sql-queries/value-expressions.md:
##########
@@ -0,0 +1,270 @@
+---
+title: Value Expressions
+---
+
+# Value Expressions
+
+Value expressions are the core components that allow you to calculate, 
transform, and reference data within a query. When defining queries, 
understanding these expressions is key to writing flexible and effective SQL.
+
+This document explains  types of expressions such as column references, 
positional parameters, array and row constructors, function calls, and type 
casts. Each type plays a specific role in how data is processed and represented 
in a query result.
+
+## Column references
+
+A column reference has the form:
+
+```sql
+<correlation>.<columnname>
+```
+
+Here, `<correlation>` is the name of a table (possibly qualified with a schema 
name) or an alias for a table defined with a `FROM` clause or one of the 
keywords `NEW` or `OLD`. `NEW` and `OLD` can appear only in rewrite rules, but 
you can use other correlation names in any SQL statement. If the column name is 
unique across all tables in the query, you can omit the `<correlation>` part of 
the column reference.
+
+## Positional parameters
+
+Positional parameters are arguments to SQL statements or functions that you 
reference by their positions in a series of arguments. For example, `$1` refers 
to the first argument, `$2` to the second argument, and so on. The values of 
positional parameters are set from arguments external to the SQL statement or 
supplied when SQL functions are invoked. Some client libraries support 
specifying data values separately from the SQL command, in which case 
parameters refer to the out-of-line data values. A parameter reference has the 
form:
+
+```sql
+$number
+```
+
+For example:
+
+```sql
+CREATE FUNCTION dept(text) RETURNS dept
+    AS $$ SELECT * FROM dept WHERE name = $1 $$
+    LANGUAGE SQL;
+```
+
+Here, the `$1` references the value of the first function argument whenever 
the function is invoked.
+
+## Subscripts
+
+If an expression yields a value of an array type, you can extract a specific 
element of the array value as follows:
+
+```sql
+<expression>[<subscript>]
+
+```
+
+You can extract multiple adjacent elements, called an array slice, as follows 
(including the brackets):
+
+```sql
+<expression>[<lower_subscript>:<upper_subscript>]
+
+```
+
+Each subscript is an expression and yields an integer value.
+
+Array expressions usually must be in parentheses, but you can omit the 
parentheses when the expression to be subscripted is a column reference or 
positional parameter. You can concatenate multiple subscripts when the original 
array is multidimensional. For example (including the parentheses):
+
+```sql
+mytable.arraycolumn[4]
+```
+
+```sql
+mytable.two_d_column[17][34]
+```
+
+```sql
+$1[10:42]
+```
+
+```sql
+(arrayfunction(a,b))[42]
+```
+
+## Field selection
+
+If an expression yields a value of a composite type (row type), you can 
extract a specific field of the row as follows:
+
+```sql
+<expression>.<fieldname>
+```
+
+The row expression usually must be in parentheses, but you can omit these 
parentheses when the expression to be selected from is a table reference or 
positional parameter. For example:
+
+```sql
+mytable.mycolumn
+```
+
+```sql
+$1.somecolumn
+```
+
+```sql
+(rowfunction(a,b)).col3
+```
+
+A qualified column reference is a special case of field selection syntax.
+
+## Operator invocations
+
+Operator invocations have the following possible syntaxes:
+
+```sql
+<expression operator expression>(binary infix operator)
+```
+
+```sql
+<operator expression>(unary prefix operator)
+```
+
+```sql
+<expression operator>(unary postfix operator)
+```
+
+Where `operator` is an operator token, one of the key words `AND`, `OR`, or 
`NOT`, or qualified operator name in the form:
+
+```sql
+OPERATOR(<schema>.<operatorname>)
+```
+
+Available operators and whether they are unary or binary depends on the 
operators that the system or user defines. For more information about built-in 
operators, see [Built-in Functions and Operators](functions-operators.html).

Review Comment:
   Need to fix this link too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] docs: add db objects and query docs [cloudberry-site]

Reply via email to