This is an automated email from the ASF dual-hosted git repository. mergebot-role pushed a commit to branch mergebot in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit c55e35553e49231d39371a3c7c98ce56d5a34aa4 Author: Andrew Pilloud <apill...@google.com> AuthorDate: Mon Jul 23 11:24:39 2018 -0700 Pull in SELECT documentation from bigquery --- src/documentation/dsls/sql/select.md | 1855 +++++++++++++++++++++++++++++++++- 1 file changed, 1813 insertions(+), 42 deletions(-) diff --git a/src/documentation/dsls/sql/select.md b/src/documentation/dsls/sql/select.md index 24bd728..c15003e 100644 --- a/src/documentation/dsls/sql/select.md +++ b/src/documentation/dsls/sql/select.md @@ -32,59 +32,1830 @@ batch/streaming model: - [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins) - [Windowing & Triggering]({{ site.baseurl}}/documentation/dsls/sql/windowing-and-triggering/) -Below is a curated grammar of the supported syntax in Beam SQL +Query statements scan one or more tables or expressions and return the computed +result rows. This topic describes the syntax for SQL queries in BigQuery. +## SQL Syntax + + query_statement: + [ WITH with_query_name AS ( query_expr ) [, ...] ] + query_expr + + query_expr: + { select | ( query_expr ) | query_expr set_op query_expr } + [ ORDER BY expression [{ ASC | DESC }] [, ...] ] + [ LIMIT count [ OFFSET skip_rows ] ] + + select: + SELECT [{ ALL | DISTINCT }] + { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] + [ REPLACE ( expression [ AS ] column_name [, ...] ) ] + | expression [ [ AS ] alias ] } [, ...] + [ FROM from_item [, ...] ] + [ WHERE bool_expression ] + [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ] + [ HAVING bool_expression ] + [ WINDOW window_name AS ( window_definition ) [, ...] ] + + set_op: + UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT + + from_item: { + table_name [ [ AS ] alias ] [ FOR SYSTEM TIME AS OF timestamp_expression ] | + join | + ( query_expr ) [ [ AS ] alias ] | + field_path | + { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } + [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | + with_query_name [ [ AS ] alias ] + } + + join: + from_item [ join_type ] JOIN from_item + [ { ON bool_expression | USING ( join_column [, ...] ) } ] + + join_type: + { INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] } + +Notation: + +- Square brackets "\[ \]" indicate optional clauses. +- Parentheses "( )" indicate literal parentheses. +- The vertical bar "|" indicates a logical OR. +- Curly braces "{ }" enclose a set of options. +- A comma followed by an ellipsis within square brackets "\[, ... \]" + indicates that the preceding item can repeat in a comma-separated list. + +## SELECT list + +Syntax: + + SELECT [{ ALL | DISTINCT }] + { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] + [ REPLACE ( expression [ AS ] column_name [, ...] ) ] + | expression [ [ AS ] alias ] } [, ...] + +The `SELECT` list defines the columns that the query will return. Expressions in +the `SELECT` list can refer to columns in any of the `from_item`s in its +corresponding `FROM` clause. + +Each item in the `SELECT` list is one of: + +- \* +- `expression` +- `expression.*` + +### SELECT \* + +`SELECT *`, often referred to as *select star*, produces one output column for +each column that is visible after executing the full query. + +``` {.codehilite} +SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable); + ++-------+-----------+ +| fruit | vegetable | ++-------+-----------+ +| apple | carrot | ++-------+-----------+ +``` + +### SELECT `expression` + +Items in a `SELECT` list can be expressions. These expressions evaluate to a +single value and produce one output column, with an optional explicit `alias`. + +If the expression does not have an explicit alias, it receives an implicit alias +according to the rules for [implicit aliases](#implicit_aliases), if possible. +Otherwise, the column is anonymous and you cannot refer to it by name elsewhere +in the query. + +### SELECT `expression.*` {#select-expression_1} + +An item in a `SELECT` list can also take the form of `expression.*`. This +produces one output column for each column or top-level field of `expression`. +The expression must either be a table alias or evaluate to a single value of a +data type with fields, such as a STRUCT. + +The following query produces one output column for each column in the table +`groceries`, aliased as `g`. + +``` {.codehilite} +WITH groceries AS + (SELECT "milk" AS dairy, + "eggs" AS protein, + "bread" AS grain) +SELECT g.* +FROM groceries AS g; + ++-------+---------+-------+ +| dairy | protein | grain | ++-------+---------+-------+ +| milk | eggs | bread | ++-------+---------+-------+ +``` + +More examples: + +``` {.codehilite} +WITH locations AS + (SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location + UNION ALL + SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location) +SELECT l.location.* +FROM locations l; + ++---------+------------+ +| city | state | ++---------+------------+ +| Seattle | Washington | +| Phoenix | Arizona | ++---------+------------+ +``` + +``` {.codehilite} +WITH locations AS + (SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"), + ("Phoenix", "Arizona")] AS location) +SELECT l.LOCATION[offset(0)].* +FROM locations l; + ++---------+------------+ +| city | state | ++---------+------------+ +| Seattle | Washington | ++---------+------------+ +``` + +### SELECT modifiers + +You can modify the results returned from a `SELECT` query, as follows. + +#### SELECT DISTINCT + +A `SELECT DISTINCT` statement discards duplicate rows and returns only the +remaining rows. `SELECT DISTINCT` cannot return columns of the following types: + +- STRUCT +- ARRAY + +#### SELECT \* EXCEPT + +A `SELECT * EXCEPT` statement specifies the names of one or more columns to +exclude from the result. All matching column names are omitted from the output. + +``` {.codehilite} +WITH orders AS + (SELECT 5 as order_id, + "sprocket" as item_name, + 200 as quantity) +SELECT * EXCEPT (order_id) +FROM orders; + ++-----------+----------+ +| item_name | quantity | ++-----------+----------+ +| sprocket | 200 | ++-----------+----------+ +``` + +**Note:** `SELECT * EXCEPT` does not exclude columns that do not have names. + +#### SELECT \* REPLACE + +A `SELECT * REPLACE` statement specifies one or more `expression AS identifier` +clauses. Each identifier must match a column name from the `SELECT *` statement. +In the output column list, the column that matches the identifier in a `REPLACE` +clause is replaced by the expression in that `REPLACE` clause. + +A `SELECT * REPLACE` statement does not change the names or order of columns. +However, it can change the value and the value type. + +``` {.codehilite} +WITH orders AS + (SELECT 5 as order_id, + "sprocket" as item_name, + 200 as quantity) +SELECT * REPLACE ("widget" AS item_name) +FROM orders; + ++----------+-----------+----------+ +| order_id | item_name | quantity | ++----------+-----------+----------+ +| 5 | widget | 200 | ++----------+-----------+----------+ + +WITH orders AS + (SELECT 5 as order_id, + "sprocket" as item_name, + 200 as quantity) +SELECT * REPLACE (quantity/2 AS quantity) +FROM orders; + ++----------+-----------+----------+ +| order_id | item_name | quantity | ++----------+-----------+----------+ +| 5 | sprocket | 100 | ++----------+-----------+----------+ +``` + +**Note:** `SELECT * REPLACE` does not replace columns that do not have names. + +#### SELECT ALL + +A `SELECT ALL` statement returns all rows, including duplicate rows. `SELECT +ALL` is the default behavior of `SELECT`. + +### Aliases + +See [Aliases](#using_aliases) for information on syntax and visibility for +`SELECT` list aliases. + +[]{#analytic_functions} + +## Analytic functions + +Clauses related to analytic functions are documented elsewhere. + +- `OVER` Clause and `PARTITION BY`: See + [Analytic Functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions). + +- `WINDOW` Clause and Window Functions: See + [WINDOW Clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#window-clause). + +## FROM clause + +The `FROM` clause indicates the table or tables from which to retrieve rows, and +specifies how to join those rows together to produce a single stream of rows for +processing in the rest of the query. + +### Syntax + + from_item: { + table_name [ [ AS ] alias ] [ FOR SYSTEM TIME AS OF timestamp_expression ] | + join | + ( query_expr ) [ [ AS ] alias ] | + field_path | + { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } + [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | + with_query_name [ [ AS ] alias ] + } + +#### table\_name + +The name (optionally qualified) of an existing table. + + SELECT * FROM Roster; + SELECT * FROM dataset.Roster; + SELECT * FROM project.dataset.Roster; + +#### FOR SYSTEM TIME AS OF + +`FOR SYSTEM TIME AS OF` references the historical versions of the table +definition and rows that were current at `timestamp_expression`. + +Limitations: + +The source table in the `FROM` clause containing `FOR SYSTEM TIME AS OF` must +not be any of the following: + +- An `ARRAY` scan, including a + [flattened array](arrays#flattening-arrays-and-repeated-fields) or the + output of the `UNNEST` operator. +- A common table expression defined by a `WITH` clause. + +`timestamp_expression` must be a constant expression. It cannot contain the +following: + +- Subqueries. +- Correlated references (references to columns of a table that appear at a + higher level of the query statement, such as in the `SELECT` list). + +- User-defined functions (UDFs). + +The value of `timestamp_expression` cannot fall into the following ranges: + +- After the current timestamp (in the future). +- More than seven (7) days before the current timestamp. + +A single query statement cannot reference a single table at more than one point +in time, including the current time. That is, a query can reference a table +multiple times at the same timestamp, but not the current version and a +historical version, or two different historical versions. + +Examples: + +The following query returns a historical version of the table from one hour ago. + +``` {.codehilite} +SELECT * +FROM t + FOR SYSTEM TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR); +``` + +The following query returns a historical version of the table at an absolute +point in time. + +``` {.codehilite} +SELECT * +FROM t + FOR SYSTEM TIME AS OF '2017-01-01 10:00:00-07:00'; +``` + +The following query returns an error because the `timestamp_expression` contains +a correlated reference to a column in the containing query. + +``` {.codehilite} +SELECT * +FROM t1 +WHERE t1.a IN (SELECT t2.a + FROM t2 FOR SYSTEM TIME AS OF t1.timestamp_column); +``` + +#### join + +See [JOIN Types](#join_types) below. + +#### select {#select_1} + +`( select ) [ [ AS ] alias ]` is a table [subquery](#subqueries). + +#### field\_path + +In the `FROM` clause, `field_path` is any path that resolves to a field within a +data type. `field_path` can go arbitrarily deep into a nested data structure. + +Some examples of valid `field_path` values include: + + SELECT * FROM T1 t1, t1.array_column; + + SELECT * FROM T1 t1, t1.struct_column.array_field; + + SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1; + + SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a; + + SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1; + +Field paths in the FROM clause must end in an array field. In addition, field +paths cannot contain arrays before the end of the path. For example, the path +`array_column.some_array.some_array_field` is invalid because it contains an +array before the end of the path. + +Note: If a path has only one name, it is interpreted as a table. To work around +this, wrap the path using `UNNEST`, or use the fully-qualified path. + +#### UNNEST + +The `UNNEST` operator takes an `ARRAY` and returns a table, with one row for +each element in the `ARRAY`. You can also use `UNNEST` outside of the `FROM` +clause with the +[`IN` operator](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#in-operators). + +For input `ARRAY`s of most element types, the output of `UNNEST` generally has +one column. This single column has an optional `alias`, which you can use to +refer to the column elsewhere in the query. `ARRAYS` with these element types +return multiple columns: + +- STRUCT + +`UNNEST` destroys the order of elements in the input `ARRAY`. Use the optional +`WITH OFFSET` clause to return a second column with the array element indexes +(see below). + +For an input `ARRAY` of `STRUCT`s, `UNNEST` returns a row for each `STRUCT`, +with a separate column for each field in the `STRUCT`. The alias for each column +is the name of the corresponding `STRUCT` field. + +**Example** + +``` {.codehilite} +SELECT * +FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]); + ++---+-----+ +| x | y | ++---+-----+ +| 3 | bar | +| 1 | foo | ++---+-----+ +``` + +Because the `UNNEST` operator returns a +[value table](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#value-tables), +you can alias `UNNEST` to define a range variable that you can reference +elsewhere in the query. If you reference the range variable in the `SELECT` +list, the query returns a `STRUCT` containing all of the fields of the original +`STRUCT` in the input table. + +**Example** + +``` {.codehilite} +SELECT *, struct_value +FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]) + AS struct_value; + ++---+-----+--------------+ +| x | y | struct_value | ++---+-----+--------------+ +| 3 | bar | {3, bar} | +| 1 | foo | {1, foo} | ++---+-----+--------------+ +``` + +ARRAY unnesting can be either explicit or implicit. In explicit unnesting, +`array_expression` must return an ARRAY value but does not need to resolve to an +ARRAY, and the `UNNEST` keyword is required. + +Example: + + SELECT * FROM UNNEST ([1, 2, 3]); + +In implicit unnesting, `array_path` must resolve to an ARRAY and the `UNNEST` +keyword is optional. + +Example: + + SELECT x + FROM mytable AS t, + t.struct_typed_column.array_typed_field1 AS x; + +In this scenario, `array_path` can go arbitrarily deep into a data structure, +but the last field must be ARRAY-typed. No previous field in the expression can +be ARRAY-typed because it is not possible to extract a named field from an +ARRAY. + +`UNNEST` treats NULLs as follows: + +- NULL and empty ARRAYs produces zero rows. +- An ARRAY containing NULLs produces rows containing NULL values. + +The optional `WITH` `OFFSET` clause returns a separate column containing the +"offset" value (i.e. counting starts at zero) for each row produced by the +`UNNEST` operation. This column has an optional `alias`; the default alias is +offset. + +Example: + + SELECT * FROM UNNEST ( ) WITH OFFSET AS num; + +See the +[`Arrays topic`](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays) +for more ways to use `UNNEST`, including construction, flattening, and +filtering. + +#### with\_query\_name + +The query names in a `WITH` clause (see [WITH Clause](#with_clause)) act like +names of temporary tables that you can reference anywhere in the `FROM` clause. +In the example below, `subQ1` and `subQ2` are `with_query_names`. + +Example: + + WITH + subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52), + subQ2 AS (SELECT SchoolID FROM subQ1) + SELECT DISTINCT * FROM subQ2; + +The `WITH` clause hides any permanent tables with the same name for the duration +of the query, unless you qualify the table name, e.g. `dataset.Roster` or +`project.dataset.Roster`. + +[]{#subqueries} + +### Subqueries + +A subquery is a query that appears inside another statement, and is written +inside parentheses. These are also referred to as "sub-SELECTs" or "nested +SELECTs". The full `SELECT` syntax is valid in subqueries. + +There are two types of subquery: + +- [Expression Subqueries](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#expression-subqueries), + which you can use in a query wherever expressions are valid. Expression + subqueries return a single value. +- Table subqueries, which you can use only in a `FROM` clause. The outer query + treats the result of the subquery as a table. + +Note that there must be parentheses around both types of subqueries. + +Example: + +``` {.codehilite} +SELECT AVG ( PointsScored ) +FROM +( SELECT PointsScored + FROM Stats + WHERE SchoolID = 77 ) +``` + +Optionally, a table subquery can have an alias. + +Example: + +``` {.codehilite} +SELECT r.LastName +FROM +( SELECT * FROM Roster) AS r; +``` + +### Aliases {#aliases_1} + +See [Aliases](#using_aliases) for information on syntax and visibility for +`FROM` clause aliases. + +[]{#join_types} + +## JOIN types + +### Syntax {#syntax_1} + + join: + from_item [ join_type ] JOIN from_item + [ ON bool_expression | USING ( join_column [, ...] ) ] + + join_type: + { INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] } + +The `JOIN` clause merges two `from_item`s so that the `SELECT` clause can query +them as one source. The `join_type` and `ON` or `USING` clause (a "join +condition") specify how to combine and discard rows from the two `from_item`s to +form a single source. + +All `JOIN` clauses require a `join_type`. + +A `JOIN` clause requires a join condition unless one of the following conditions +is true: + +- `join_type` is `CROSS`. +- One or both of the `from_item`s is not a table, e.g. an `array_path` or + `field_path`. + +### \[INNER\] JOIN + +An `INNER JOIN`, or simply `JOIN`, effectively calculates the Cartesian product +of the two `from_item`s and discards all rows that do not meet the join +condition. "Effectively" means that it is possible to implement an `INNER JOIN` +without actually calculating the Cartesian product. + +### CROSS JOIN + +`CROSS JOIN` returns the Cartesian product of the two `from_item`s. In other +words, it retains all rows from both `from_item`s and combines each row from the +first `from_item`s with each row from the second `from_item`s. + +**Comma cross joins** + +`CROSS JOIN`s can be written explicitly (see directly above) or implicitly using +a comma to separate the `from_item`s. + +Example of an implicit "comma cross join": + +``` {.codehilite} +SELECT * FROM Roster, TeamMascot; +``` + +Here is the explicit cross join equivalent: + +``` {.codehilite} +SELECT * FROM Roster CROSS JOIN TeamMascot; +``` + +You cannot write comma cross joins inside parentheses. + +Invalid - comma cross join inside parentheses: + +``` {.codehilite} +SELECT * FROM t CROSS JOIN (Roster, TeamMascot); // INVALID. +``` + +See [Sequences of JOINs](#sequences_of_joins) for details on how a comma cross +join behaves in a sequence of JOINs. + +### FULL \[OUTER\] JOIN + +A `FULL OUTER JOIN` (or simply `FULL JOIN`) returns all fields for all rows in +both `from_item`s that meet the join condition. + +`FULL` indicates that *all rows* from both `from_item`s are returned, even if +they do not meet the join condition. + +`OUTER` indicates that if a given row from one `from_item` does not join to any +row in the other `from_item`, the row will return with NULLs for all columns +from the other `from_item`. + +### LEFT \[OUTER\] JOIN + +The result of a `LEFT OUTER JOIN` (or simply `LEFT JOIN`) for two `from_item`s +always retains all rows of the left `from_item` in the `JOIN` clause, even if no +rows in the right `from_item` satisfy the join predicate. + +`LEFT` indicates that all rows from the *left* `from_item` are returned; if a +given row from the left `from_item` does not join to any row in the *right* +`from_item`, the row will return with NULLs for all columns from the right +`from_item`. Rows from the right `from_item` that do not join to any row in the +left `from_item` are discarded. + +### RIGHT \[OUTER\] JOIN + +The result of a `RIGHT OUTER JOIN` (or simply `RIGHT JOIN`) is similar and +symmetric to that of `LEFT OUTER JOIN`. + +[]{#on_clause} + +### ON clause + +The `ON` clause contains a `bool_expression`. A combined row (the result of +joining two rows) meets the join condition if `bool_expression` returns TRUE. + +Example: + +``` {.codehilite} +SELECT * FROM Roster INNER JOIN PlayerStats +ON Roster.LastName = PlayerStats.LastName; +``` + +[]{#using_clause} + +### USING clause + +The `USING` clause requires a `column_list` of one or more columns which occur +in both input tables. It performs an equality comparison on that column, and the +rows meet the join condition if the equality comparison returns TRUE. + +In most cases, a statement with the `USING` keyword is equivalent to using the +`ON` keyword. For example, the statement: + +``` {.codehilite} +SELECT FirstName +FROM Roster INNER JOIN PlayerStats +USING (LastName); +``` + +is equivalent to: + +``` {.codehilite} +SELECT FirstName +FROM Roster INNER JOIN PlayerStats +ON Roster.LastName = PlayerStats.LastName; +``` + +The results from queries with `USING` do differ from queries that use `ON` when +you use `SELECT *`. To illustrate this, consider the query: + +``` {.codehilite} +SELECT * FROM Roster INNER JOIN PlayerStats +USING (LastName); +``` + +This statement returns the rows from `Roster` and `PlayerStats` where +`Roster.LastName` is the same as `PlayerStats.LastName`. The results include a +single `LastName` column. + +By contrast, consider the following query: + +``` {.codehilite} +SELECT * FROM Roster INNER JOIN PlayerStats +ON Roster.LastName = PlayerStats.LastName; +``` + +This statement returns the rows from `Roster` and `PlayerStats` where +`Roster.LastName` is the same as `PlayerStats.LastName`. The results include two +`LastName` columns; one from `Roster` and one from `PlayerStats`. + +[]{#sequences_of_joins} + +### Sequences of JOINs + +The `FROM` clause can contain multiple `JOIN` clauses in sequence. + +Example: + +``` {.codehilite} +SELECT * FROM a LEFT JOIN b ON TRUE LEFT JOIN c ON TRUE; +``` + +where `a`, `b`, and `c` are any `from_item`s. JOINs are bound from left to +right, but you can insert parentheses to group them in a different order. + +Consider the following queries: A (without parentheses) and B (with parentheses) +are equivalent to each other but not to C. The `FULL JOIN` in **bold** binds +first. + +A. + + SELECT * FROM Roster FULL JOIN TeamMascot USING (SchoolID) + FULL JOIN PlayerStats USING (LastName); + +B. + + SELECT * FROM ( (Roster FULL JOIN TeamMascot USING (SchoolID)) + FULL JOIN PlayerStats USING (LastName)); + +C. + + SELECT * FROM (Roster FULL JOIN (TeamMascot FULL JOIN PlayerStats USING + (LastName)) USING (SchoolID)) ; + +When comma cross joins are present in a query with a sequence of JOINs, they +group from left to right like other `JOIN` types. + +Example: + +``` {.codehilite} +SELECT * FROM a JOIN b ON TRUE, b JOIN c ON TRUE; +``` + +The query above is equivalent to + +``` {.codehilite} +SELECT * FROM ((a JOIN b ON TRUE) CROSS JOIN b) JOIN c ON TRUE); +``` + +There cannot be a `RIGHT JOIN` or `FULL JOIN` after a comma join. + +Invalid - `RIGHT JOIN` after a comma cross join: + +``` {.codehilite} +SELECT * FROM Roster, TeamMascot RIGHT JOIN PlayerStats ON TRUE; // INVALID. +``` + +[]{#where_clause} + +## WHERE clause + +### Syntax {#syntax_2} + +``` {.codehilite} +WHERE bool_expression +``` + +The `WHERE` clause filters out rows by evaluating each row against +`bool_expression`, and discards all rows that do not return TRUE (that is, rows +that return FALSE or NULL). + +Example: + +``` {.codehilite} +SELECT * FROM Roster +WHERE SchoolID = 52; +``` + +The `bool_expression` can contain multiple sub-conditions. + +Example: + +``` {.codehilite} +SELECT * FROM Roster +WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac"); +``` + +You cannot reference column aliases from the `SELECT` list in the `WHERE` +clause. + +Expressions in an `INNER JOIN` have an equivalent expression in the `WHERE` +clause. For example, a query using `INNER` `JOIN` and `ON` has an equivalent +expression using `CROSS JOIN` and `WHERE`. + +Example - this query: + +``` {.codehilite} +SELECT * FROM Roster INNER JOIN TeamMascot +ON Roster.SchoolID = TeamMascot.SchoolID; +``` + +is equivalent to: + +``` {.codehilite} +SELECT * FROM Roster CROSS JOIN TeamMascot +WHERE Roster.SchoolID = TeamMascot.SchoolID; +``` + +[]{#group_by_clause} + +## GROUP BY clause + +### Syntax {#syntax_3} + + GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } + +The `GROUP BY` clause groups together rows in a table with non-distinct values +for the `expression` in the `GROUP BY` clause. For multiple rows in the source +table with non-distinct values for `expression`, the `GROUP BY` clause produces +a single combined row. `GROUP BY` is commonly used when aggregate functions are +present in the `SELECT` list, or to eliminate redundancy in the output. The data +type of `expression` must be +[groupable](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#data-type-properties). + +Example: + +``` {.codehilite} +SELECT SUM(PointsScored), LastName +FROM PlayerStats +GROUP BY LastName; +``` + +The `GROUP BY` clause can refer to expression names in the `SELECT` list. The +`GROUP BY` clause also allows ordinal references to expressions in the `SELECT` +list using integer values. `1` refers to the first expression in the `SELECT` +list, `2` the second, and so forth. The expression list can combine ordinals and +expression names. + +Example: + +``` {.codehilite} +SELECT SUM(PointsScored), LastName, FirstName +FROM PlayerStats +GROUP BY LastName, FirstName; +``` + +The query above is equivalent to: + +``` {.codehilite} +SELECT SUM(PointsScored), LastName, FirstName +FROM PlayerStats +GROUP BY 2, FirstName; +``` + +`GROUP BY` clauses may also refer to aliases. If a query contains aliases in the +`SELECT` clause, those aliases override names in the corresponding `FROM` +clause. + +Example: + +``` {.codehilite} +SELECT SUM(PointsScored), LastName as last_name +FROM PlayerStats +GROUP BY last_name; +``` + +`GROUP BY ROLLUP` returns the results of `GROUP BY` for prefixes of the +expressions in the `ROLLUP` list, each of which is known as a *grouping set*. +For the `ROLLUP` list `(a, b, c)`, the grouping sets are `(a, b, c)`, `(a, b)`, +`(a)`, `()`. When evaluating the results of `GROUP BY` for a particular grouping +set, `GROUP BY ROLLUP` treats expressions that are not in the grouping set as +having a `NULL` value. A `SELECT` statement like this one: + +``` {.codehilite} +SELECT a, b, SUM(c) FROM Input GROUP BY ROLLUP(a, b); +``` + +uses the rollup list `(a, b)`. The result will include the results of `GROUP BY` +for the grouping sets `(a, b)`, `(a)`, and `()`, which includes all rows. This +returns the same rows as: + +``` {.codehilite} +SELECT NULL, NULL, SUM(c) FROM Input UNION ALL +SELECT a, NULL, SUM(c) FROM Input GROUP BY a UNION ALL +SELECT a, b, SUM(c) FROM Input GROUP BY a, b; +``` + +This allows the computation of aggregates for the grouping sets defined by the +expressions in the `ROLLUP` list and the prefixes of that list. + +Example: + +``` {.codehilite} +WITH Sales AS ( + SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL + SELECT 123, 1, 8.99 UNION ALL + SELECT 456, 1, 4.56 UNION ALL + SELECT 123, 2, 9.99 UNION ALL + SELECT 789, 3, 1.00 UNION ALL + SELECT 456, 3, 4.25 UNION ALL + SELECT 789, 3, 0.99 +) +SELECT + day, + SUM(price) AS total +FROM Sales +GROUP BY ROLLUP(day); +``` + +The query above outputs a row for each day in addition to the rolled up total +across all days, as indicated by a `NULL` day: + +``` {.codehilite} ++------+-------+ +| day | total | ++------+-------+ +| NULL | 39.77 | +| 1 | 23.54 | +| 2 | 9.99 | +| 3 | 6.24 | ++------+-------+ +``` + +Example: + +``` {.codehilite} +WITH Sales AS ( + SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL + SELECT 123, 1, 8.99 UNION ALL + SELECT 456, 1, 4.56 UNION ALL + SELECT 123, 2, 9.99 UNION ALL + SELECT 789, 3, 1.00 UNION ALL + SELECT 456, 3, 4.25 UNION ALL + SELECT 789, 3, 0.99 +) +SELECT + sku, + day, + SUM(price) AS total +FROM Sales +GROUP BY ROLLUP(sku, day) +ORDER BY sku, day; +``` + +The query above returns rows grouped by the following grouping sets: + +- sku and day +- sku (day is `NULL`) +- The empty grouping set (day and sku are `NULL`) + +The sums for these grouping sets correspond to the total for each distinct +sku-day combination, the total for each sku across all days, and the grand +total: + +``` {.codehilite} ++------+------+-------+ +| sku | day | total | ++------+------+-------+ +| NULL | NULL | 39.77 | +| 123 | NULL | 28.97 | +| 123 | 1 | 18.98 | +| 123 | 2 | 9.99 | +| 456 | NULL | 8.81 | +| 456 | 1 | 4.56 | +| 456 | 3 | 4.25 | +| 789 | 3 | 1.99 | +| 789 | NULL | 1.99 | ++------+------+-------+ +``` + +[]{#having_clause} + +## HAVING clause + +### Syntax {#syntax_4} + +``` {.codehilite} +HAVING bool_expression +``` + +The `HAVING` clause is similar to the `WHERE` clause: it filters out rows that +do not return TRUE when they are evaluated against the `bool_expression`. + +As with the `WHERE` clause, the `bool_expression` can be any expression that +returns a boolean, and can contain multiple sub-conditions. + +The `HAVING` clause differs from the `WHERE` clause in that: + +- The `HAVING` clause requires `GROUP BY` or aggregation to be present in the + query. +- The `HAVING` clause occurs after `GROUP BY` and aggregation, and before + `ORDER BY`. This means that the `HAVING` clause is evaluated once for every + aggregated row in the result set. This differs from the `WHERE` clause, + which is evaluated before `GROUP BY` and aggregation. + +The `HAVING` clause can reference columns available via the `FROM` clause, as +well as `SELECT` list aliases. Expressions referenced in the `HAVING` clause +must either appear in the `GROUP BY` clause or they must be the result of an +aggregate function: + +``` {.codehilite} +SELECT LastName +FROM Roster +GROUP BY LastName +HAVING SUM(PointsScored) > 15; +``` + +If a query contains aliases in the `SELECT` clause, those aliases override names +in a `FROM` clause. + +``` {.codehilite} +SELECT LastName, SUM(PointsScored) AS ps +FROM Roster +GROUP BY LastName +HAVING ps > 0; +``` + +[]{#mandatory_aggregation} + +### Mandatory aggregation + +Aggregation does not have to be present in the `HAVING` clause itself, but +aggregation must be present in at least one of the following forms: + +#### Aggregation function in the `SELECT` list. {#aggregation-function-in-the-select-list} + +``` {.codehilite} +SELECT LastName, SUM(PointsScored) AS total +FROM PlayerStats +GROUP BY LastName +HAVING total > 15; +``` + +#### Aggregation function in the 'HAVING' clause. {#aggregation-function-in-the-having-clause} + +``` {.codehilite} +SELECT LastName +FROM PlayerStats +GROUP BY LastName +HAVING SUM(PointsScored) > 15; +``` + +#### Aggregation in both the `SELECT` list and `HAVING` clause. {#aggregation-in-both-the-select-list-and-having-clause} + +When aggregation functions are present in both the `SELECT` list and `HAVING` +clause, the aggregation functions and the columns they reference do not need to +be the same. In the example below, the two aggregation functions, `COUNT()` and +`SUM()`, are different and also use different columns. + +``` {.codehilite} +SELECT LastName, COUNT(*) +FROM PlayerStats +GROUP BY LastName +HAVING SUM(PointsScored) > 15; +``` + +[]{#order_by_clause} + +## ORDER BY clause + +### Syntax {#syntax_5} + + ORDER BY expression [{ ASC | DESC }] [, ...] + +The `ORDER BY` clause specifies a column or expression as the sort criterion for +the result set. If an ORDER BY clause is not present, the order of the results +of a query is not defined. The default sort direction is `ASC`, which sorts the +results in ascending order of `expression` values. `DESC` sorts the results in +descending order. Column aliases from a `FROM` clause or `SELECT` list are +allowed. If a query contains aliases in the `SELECT` clause, those aliases +override names in the corresponding `FROM` clause. + +It is possible to order by multiple columns. In the example below, the result +set is ordered first by `SchoolID` and then by `LastName`: + +``` {.codehilite} +SELECT LastName, PointsScored, OpponentID +FROM PlayerStats +ORDER BY SchoolID, LastName; ``` -query: - { - select - | query UNION [ ALL ] query - | query MINUS [ ALL ] query - | query INTERSECT [ ALL ] query - } - [ ORDER BY orderItem [, orderItem ]* LIMIT count [OFFSET offset] ] -orderItem: - expression [ ASC | DESC ] +The following rules apply when ordering values: -select: - SELECT - { * | projectItem [, projectItem ]* } - FROM tableExpression - [ WHERE booleanExpression ] - [ GROUP BY { groupItem [, groupItem ]* } ] - [ HAVING booleanExpression ] +- NULLs: In the context of the `ORDER BY` clause, NULLs are the minimum + possible value; that is, NULLs appear first in `ASC` sorts and last in + `DESC` sorts. +- Floating point data types: see + [Floating Point Semantics](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating-point-semantics) + on ordering and grouping. -projectItem: - expression [ [ AS ] columnAlias ] - | tableAlias . * +When used in conjunction with [set operators](#set-operators), the `ORDER BY` +clause applies to the result set of the entire query; it does not apply only to +the closest `SELECT` statement. For this reason, it can be helpful (though it is +not required) to use parentheses to show the scope of the `ORDER BY`. -tableExpression: - tableReference [, tableReference ]* - | tableExpression [ ( LEFT | RIGHT ) [ OUTER ] ] JOIN tableExpression [ joinCondition ] +This query without parentheses: -booleanExpression: - expression [ IS NULL | IS NOT NULL ] - | expression [ > | >= | = | < | <= | <> ] expression - | booleanExpression [ AND | OR ] booleanExpression - | NOT booleanExpression - | '(' booleanExpression ')' +``` {.codehilite} +SELECT * FROM Roster +UNION ALL +SELECT * FROM TeamMascot +ORDER BY SchoolID; +``` + +is equivalent to this query with parentheses: + +``` {.codehilite} +( SELECT * FROM Roster + UNION ALL + SELECT * FROM TeamMascot ) +ORDER BY SchoolID; +``` + +but is not equivalent to this query, where the `ORDER BY` clause applies only to +the second `SELECT` statement: + +``` {.codehilite} +SELECT * FROM Roster +UNION ALL +( SELECT * FROM TeamMascot + ORDER BY SchoolID ); +``` + +You can also use integer literals as column references in `ORDER BY` clauses. An +integer literal becomes an ordinal (for example, counting starts at 1) into the +`SELECT` list. + +Example - the following two queries are equivalent: + +``` {.codehilite} +SELECT SUM(PointsScored), LastName +FROM PlayerStats +ORDER BY LastName; +``` + +``` {.codehilite} +SELECT SUM(PointsScored), LastName +FROM PlayerStats +ORDER BY 2; +``` + +[]{#set_operators} + +## Set operators + +### Syntax {#syntax_6} + + UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT + +Set operators combine results from two or more input queries into a single +result set. You must specify `ALL` or `DISTINCT`; if you specify `ALL`, then all +rows are retained. If `DISTINCT` is specified, duplicate rows are discarded. + +If a given row R appears exactly m times in the first input query and n times in +the second input query (m >= 0, n >= 0): + +- For `UNION ALL`, R appears exactly m + n times in the result. +- For `UNION DISTINCT`, the `DISTINCT` is computed after the `UNION` is + computed, so R appears exactly one time. +- For `INTERSECT DISTINCT`, the `DISTINCT` is computed after the result above + is computed. +- For `EXCEPT DISTINCT`, row R appears once in the output if m > 0 and + n = 0. +- If there are more than two input queries, the above operations generalize + and the output is the same as if the inputs were combined incrementally from + left to right. + +The following rules apply: + +- For set operations other than `UNION ALL`, all column types must support + equality comparison. +- The input queries on each side of the operator must return the same number + of columns. +- The operators pair the columns returned by each input query according to the + columns' positions in their respective `SELECT` lists. That is, the first + column in the first input query is paired with the first column in the + second input query. +- The result set always uses the column names from the first input query. +- The result set always uses the supertypes of input types in corresponding + columns, so paired columns must also have either the same data type or a + common supertype. +- You must use parentheses to separate different set operations; for this + purpose, set operations such as `UNION ALL` and `UNION DISTINCT` are + different. If the statement only repeats the same set operation, parentheses + are not necessary. + +Examples: + +``` {.codehilite} +query1 UNION ALL (query2 UNION DISTINCT query3) +query1 UNION ALL query2 UNION ALL query3 +``` + +Invalid: + + query1 UNION ALL query2 UNION DISTINCT query3 + query1 UNION ALL query2 INTERSECT ALL query3; // INVALID. + +[]{#union} + +### UNION + +The `UNION` operator combines the result sets of two or more input queries by +pairing columns from the result set of each query and vertically concatenating +them. + +[]{#intersect} + +### INTERSECT + +The `INTERSECT` operator returns rows that are found in the result sets of both +the left and right input queries. Unlike `EXCEPT`, the positioning of the input +queries (to the left vs. right of the `INTERSECT` operator) does not matter. + +[]{#except} + +### EXCEPT + +The `EXCEPT` operator returns rows from the left input query that are not +present in the right input query. + +[]{#limit-clause_and_offset_clause} + +## LIMIT clause and OFFSET clause + +### Syntax {#syntax_7} + +``` {.codehilite} +LIMIT count [ OFFSET skip_rows ] +``` + +`LIMIT` specifies a non-negative `count` of type INT64, and no more than `count` +rows will be returned. `LIMIT` `0` returns 0 rows. If there is a set operation, +`LIMIT` is applied after the set operation is evaluated. + +`OFFSET` specifies a non-negative `skip_rows` of type INT64, and only rows from +that offset in the table will be considered. + +These clauses accept only literal or parameter values. + +The rows that are returned by `LIMIT` and `OFFSET` is unspecified unless these +operators are used after `ORDER BY`. + +[]{#with_clause} + +## WITH clause + +The `WITH` clause contains one or more named subqueries which execute every time +a subsequent `SELECT` statement references them. Any clause or subquery can +reference subqueries you define in the `WITH` clause. This includes any `SELECT` +statements on either side of a set operator, such as `UNION`. + +The `WITH` clause is useful primarily for readability, because BigQuery does not +materialize the result of the queries inside the `WITH` clause. If a query +appears in more than one `WITH` clause, it executes in each clause. + +Example: + +``` {.codehilite} +WITH subQ1 AS (SELECT SchoolID FROM Roster), + subQ2 AS (SELECT OpponentID FROM PlayerStats) +SELECT * FROM subQ1 +UNION ALL +SELECT * FROM subQ2; +``` + +Another useful role of the `WITH` clause is to break up more complex queries +into a `WITH` `SELECT` statement and `WITH` clauses, where the less desirable +alternative is writing nested table subqueries. If a `WITH` clause contains +multiple subqueries, the subquery names cannot repeat. + +BigQuery supports `WITH` clauses in subqueries, such as table subqueries, +expression subqueries, and so on. + +``` {.codehilite} +WITH q1 AS (my_query) +SELECT * +FROM + (WITH q2 AS (SELECT * FROM q1) SELECT * FROM q2) +``` + +The following are scoping rules for `WITH` clauses: + +- Aliases are scoped so that the aliases introduced in a `WITH` clause are + visible only in the later subqueries in the same `WITH` clause, and in the + query under the `WITH` clause. +- Aliases introduced in the same `WITH` clause must be unique, but the same + alias can be used in multiple `WITH` clauses in the same query. The local + alias overrides any outer aliases anywhere that the local alias is visible. +- Aliased subqueries in a `WITH` clause can never be correlated. No columns + from outside the query are visible. The only names from outside that are + visible are other `WITH` aliases that were introduced earlier in the same + `WITH` clause. + +Here's an example of a statement that uses aliases in `WITH` subqueries: + +``` {.codehilite} +WITH q1 AS (my_query) +SELECT * +FROM + (WITH q2 AS (SELECT * FROM q1), # q1 resolves to my_query + q3 AS (SELECT * FROM q1), # q1 resolves to my_query + q1 AS (SELECT * FROM q1), # q1 (in the query) resolves to my_query + q4 AS (SELECT * FROM q1) # q1 resolves to the WITH subquery + # on the previous line. + SELECT * FROM q1) # q1 resolves to the third inner WITH subquery. +``` + +BigQuery does not support `WITH RECURSIVE`. + +[]{#using_aliases} + +## Aliases {#aliases_2} + +An alias is a temporary name given to a table, column, or expression present in +a query. You can introduce explicit aliases in the `SELECT` list or `FROM` +clause, or BigQuery will infer an implicit alias for some expressions. +Expressions with neither an explicit nor implicit alias are anonymous and the +query cannot reference them by name. + +[]{#explicit_alias_syntax} + +### Explicit alias syntax + +You can introduce explicit aliases in either the `FROM` clause or the `SELECT` +list. + +In a `FROM` clause, you can introduce explicit aliases for any item, including +tables, arrays, subqueries, and `UNNEST` clauses, using `[AS] alias`. The `AS` +keyword is optional. + +Example: + +``` {.codehilite} +SELECT s.FirstName, s2.SongName +FROM Singers AS s, (SELECT * FROM Songs) AS s2; +``` + +You can introduce explicit aliases for any expression in the `SELECT` list using +`[AS] alias`. The `AS` keyword is optional. + +Example: + +``` {.codehilite} +SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname +FROM Singers s; +``` + +[]{#alias_visibility} + +### Explicit alias visibility + +After you introduce an explicit alias in a query, there are restrictions on +where else in the query you can reference that alias. These restrictions on +alias visibility are the result of BigQuery's name scoping rules. + +[]{#from_clause_aliases} + +#### FROM clause aliases + +BigQuery processes aliases in a `FROM` clause from left to right, and aliases +are visible only to subsequent path expressions in a `FROM` clause. + +Example: + +Assume the `Singers` table had a `Concerts` column of `ARRAY` type. + +``` {.codehilite} +SELECT FirstName +FROM Singers AS s, s.Concerts; +``` + +Invalid: + +``` {.codehilite} +SELECT FirstName +FROM s.Concerts, Singers AS s; // INVALID. +``` + +`FROM` clause aliases are **not** visible to subqueries in the same `FROM` +clause. Subqueries in a `FROM` clause cannot contain correlated references to +other tables in the same `FROM` clause. + +Invalid: + +``` {.codehilite} +SELECT FirstName +FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s) // INVALID. +``` + +You can use any column name from a table in the `FROM` as an alias anywhere in +the query, with or without qualification with the table name. + +Example: + +``` {.codehilite} +SELECT FirstName, s.ReleaseDate +FROM Singers s WHERE ReleaseDate = 1975; +``` + +If the `FROM` clause contains an explicit alias, you must use the explicit alias +instead of the implicit alias for the remainder of the query (see +[Implicit Aliases](#implicit_aliases)). A table alias is useful for brevity or +to eliminate ambiguity in cases such as self-joins, where the same table is +scanned multiple times during query processing. + +Example: + +``` {.codehilite} +SELECT * FROM Singers as s, Songs as s2 +ORDER BY s.LastName +``` + +Invalid — `ORDER BY` does not use the table alias: + +``` {.codehilite} +SELECT * FROM Singers as s, Songs as s2 +ORDER BY Singers.LastName; // INVALID. +``` + +[]{#select-list_aliases} + +#### SELECT list aliases + +Aliases in the `SELECT` list are **visible only** to the following clauses: + +- `GROUP BY` clause +- `ORDER BY` clause +- `HAVING` clause + +Example: + +``` {.codehilite} +SELECT LastName AS last, SingerID +FROM Singers +ORDER BY last; +``` + +[]{#aliases_clauses} + +### Explicit aliases in GROUP BY, ORDER BY, and HAVING clauses + +These three clauses, `GROUP BY`, `ORDER BY`, and `HAVING`, can refer to only the +following values: + +- Tables in the `FROM` clause and any of their columns. +- Aliases from the `SELECT` list. + +`GROUP BY` and `ORDER BY` can also refer to a third group: + +- Integer literals, which refer to items in the `SELECT` list. The integer `1` + refers to the first item in the `SELECT` list, `2` refers to the second + item, etc. + +Example: + +``` {.codehilite} +SELECT SingerID AS sid, COUNT(Songid) AS s2id +FROM Songs +GROUP BY 1 +ORDER BY 2 DESC; +``` + +The query above is equivalent to: + +``` {.codehilite} +SELECT SingerID AS sid, COUNT(Songid) AS s2id +FROM Songs +GROUP BY sid +ORDER BY s2id DESC; +``` + +[]{#ambiguous_aliases} + +### Ambiguous aliases + +BigQuery provides an error if a name is ambiguous, meaning it can resolve to +more than one unique object. + +Examples: + +This query contains column names that conflict between tables, since both +`Singers` and `Songs` have a column named `SingerID`: -joinCondition: - ON booleanExpression +``` {.codehilite} +SELECT SingerID +FROM Singers, Songs; +``` + +This query contains aliases that are ambiguous in the `GROUP BY` clause because +they are duplicated in the `SELECT` list: + +``` {.codehilite} +SELECT FirstName AS name, LastName AS name, +FROM Singers +GROUP BY name; +``` + +Ambiguity between a `FROM` clause column name and a `SELECT` list alias in +`GROUP BY`: + +``` {.codehilite} +SELECT UPPER(LastName) AS LastName +FROM Singers +GROUP BY LastName; +``` + +The query above is ambiguous and will produce an error because `LastName` in the +`GROUP BY` clause could refer to the original column `LastName` in `Singers`, or +it could refer to the alias `AS LastName`, whose value is `UPPER(LastName)`. + +The same rules for ambiguity apply to path expressions. Consider the following +query where `table` has columns `x` and `y`, and column `z` is of type STRUCT +and has fields `v`, `w`, and `x`. + +Example: + +``` {.codehilite} +SELECT x, z AS T +FROM table T +GROUP BY T.x; +``` + +The alias `T` is ambiguous and will produce an error because `T.x` in the `GROUP +BY` clause could refer to either `table.x` or `table.z.x`. + +A name is **not** ambiguous in `GROUP BY`, `ORDER BY` or `HAVING` if it is both +a column name and a `SELECT` list alias, as long as the name resolves to the +same underlying object. + +Example: + +``` {.codehilite} +SELECT LastName, BirthYear AS BirthYear +FROM Singers +GROUP BY BirthYear; +``` + +The alias `BirthYear` is not ambiguous because it resolves to the same +underlying column, `Singers.BirthYear`. + +[]{#implicit_aliases} + +### Implicit aliases + +In the `SELECT` list, if there is an expression that does not have an explicit +alias, BigQuery assigns an implicit alias according to the following rules. +There can be multiple columns with the same alias in the `SELECT` list. + +- For identifiers, the alias is the identifier. For example, `SELECT abc` + implies `AS abc`. +- For path expressions, the alias is the last identifier in the path. For + example, `SELECT abc.def.ghi` implies `AS ghi`. +- For field access using the "dot" member field access operator, the alias is + the field name. For example, `SELECT (struct_function()).fname` implies `AS + fname`. + +In all other cases, there is no implicit alias, so the column is anonymous and +cannot be referenced by name. The data from that column will still be returned +and the displayed query results may have a generated label for that column, but +the label cannot be used like an alias. + +In a `FROM` clause, `from_item`s are not required to have an alias. The +following rules apply: + +If there is an expression that does not have an explicit alias, BigQuery assigns +an implicit alias in these cases: + +- For identifiers, the alias is the identifier. For example, `FROM abc` + implies `AS abc`. +- For path expressions, the alias is the last identifier in the path. For + example, `FROM abc.def.ghi` implies `AS ghi` +- The column produced using `WITH OFFSET` has the implicit alias `offset`. + +Table subqueries do not have implicit aliases. + +`FROM UNNEST(x)` does not have an implicit alias. + +[]{#appendix_a_examples_with_sample_data} + +## Appendix A: examples with sample data + +[]{#sample_tables} + +### Sample tables + +The following three tables contain sample data about athletes, their schools, +and the points they score during the season. These tables will be used to +illustrate the behavior of different query clauses. + +Table Roster: + +LastName SchoolID + +-------------------------------------------------------------------------------- + +Adams 50 Buchanan 52 Coolidge 52 Davis 51 Eisenhower 77 + +The Roster table includes a list of player names (LastName) and the unique ID +assigned to their school (SchoolID). + +Table PlayerStats: + +LastName OpponentID PointsScored + +-------------------------------------------------------------------------------- + +Adams 51 3 Buchanan 77 0 Coolidge 77 1 Adams 52 4 Buchanan 50 13 + +The PlayerStats table includes a list of player names (LastName) and the unique +ID assigned to the opponent they played in a given game (OpponentID) and the +number of points scored by the athlete in that game (PointsScored). + +Table TeamMascot: + +SchoolId Mascot + +-------------------------------------------------------------------------------- + +50 Jaguars 51 Knights 52 Lakers 53 Mustangs + +The TeamMascot table includes a list of unique school IDs (SchoolID) and the +mascot for that school (Mascot). + +[]{#join_types_examples} + +### JOIN types {#join-types_1} + +1\) \[INNER\] JOIN + +Example: + +``` {.codehilite} +SELECT * FROM Roster JOIN TeamMascot +ON Roster.SchoolID = TeamMascot.SchoolID; +``` + +Results: + +LastName Roster.SchoolId TeamMascot.SchoolId Mascot + +-------------------------------------------------------------------------------- + +Adams 50 50 Jaguars Buchanan 52 52 Lakers Coolidge 52 52 Lakers Davis 51 51 +Knights + +2\) CROSS JOIN + +Example: + +``` {.codehilite} +SELECT * FROM Roster CROSS JOIN TeamMascot +ON Roster.SchoolID = TeamMascot.SchoolID; +``` + +Results: + +LastName Roster.SchoolId TeamMascot.SchoolId Mascot -tableReference: - tableName [ [ AS ] alias ] +-------------------------------------------------------------------------------- -values: - VALUES expression [, expression ]* +Adams 50 50 Jaguars Adams 50 51 Knights Adams 50 52 Lakers Adams 50 53 Mustangs +Buchanan 52 50 Jaguars Buchanan 52 51 Knights Buchanan 52 52 Lakers Buchanan 52 +53 Mustangs Coolidge 52 50 Jaguars Coolidge 52 51 Knights Coolidge 52 52 Lakers +Coolidge 52 53 Mustangs Davis 51 50 Jaguars Davis 51 51 Knights Davis 51 52 +Lakers Davis 51 53 Mustangs Eisenhower 77 50 Jaguars Eisenhower 77 51 Knights +Eisenhower 77 52 Lakers Eisenhower 77 53 Mustangs + +3\) FULL \[OUTER\] JOIN + +Example: + +``` {.codehilite} +SELECT * FROM Roster FULL JOIN TeamMascot +ON Roster.SchoolID = TeamMascot.SchoolID; +``` + +LastName Roster.SchoolId TeamMascot.SchoolId Mascot + +-------------------------------------------------------------------------------- + +Adams 50 50 Jaguars Buchanan 52 52 Lakers Coolidge 52 52 Lakers Davis 51 51 +Knights Eisenhower 77 NULL NULL NULL NULL 53 Mustangs + +4\) LEFT \[OUTER\] JOIN + +Example: + +``` {.codehilite} +SELECT * FROM Roster LEFT JOIN TeamMascot +ON Roster.SchoolID = TeamMascot.SchoolID; +``` + +Results: + +LastName Roster.SchoolId TeamMascot.SchoolId Mascot + +-------------------------------------------------------------------------------- + +Adams 50 50 Jaguars Buchanan 52 52 Lakers Coolidge 52 52 Lakers Davis 51 51 +Knights Eisenhower 77 NULL NULL + +5\) RIGHT \[OUTER\] JOIN + +Example: + +``` {.codehilite} +SELECT * FROM Roster RIGHT JOIN TeamMascot +ON Roster.SchoolID = TeamMascot.SchoolID; +``` + +Results: + +LastName Roster.SchoolId TeamMascot.SchoolId Mascot + +-------------------------------------------------------------------------------- + +Adams 50 50 Jaguars Davis 51 51 Knights Coolidge 52 52 Lakers Buchanan 52 52 +Lakers NULL NULL 53 Mustangs + +[]{#group_by_clause} + +### GROUP BY clause {#group-by-clause_1} + +Example: + +``` {.codehilite} +SELECT LastName, SUM(PointsScored) +FROM PlayerStats +GROUP BY LastName; +``` + +LastName SUM + +-------------------------------------------------------------------------------- + +Adams 7 Buchanan 13 Coolidge 1 + +[]{#set_operators} + +### Set operators {#set-operators_1} + +[]{#union} + +#### UNION {#union_1} + +The `UNION` operator combines the result sets of two or more `SELECT` statements +by pairing columns from the result set of each `SELECT` statement and vertically +concatenating them. + +Example: + +``` {.codehilite} +SELECT Mascot AS X, SchoolID AS Y +FROM TeamMascot +UNION ALL +SELECT LastName, PointsScored +FROM PlayerStats; +``` + +Results: + +X Y + +-------------------------------------------------------------------------------- + +Mustangs 50 Knights 51 Lakers 52 Mustangs 53 Adams 3 Buchanan 0 Coolidge 1 Adams +4 Buchanan 13 + +[]{#intersect} + +#### INTERSECT {#intersect_1} + +This query returns the last names that are present in both Roster and +PlayerStats. + +``` {.codehilite} +SELECT LastName +FROM Roster +INTERSECT DISTINCT +SELECT LastName +FROM PlayerStats; +``` + +Results: + +LastName + +-------------------------------------------------------------------------------- + +Adams Coolidge Buchanan + +[]{#except} + +#### EXCEPT {#except_1} + +The query below returns last names in Roster that are **not** present in +PlayerStats. + +``` {.codehilite} +SELECT LastName +FROM Roster +EXCEPT DISTINCT +SELECT LastName +FROM PlayerStats; +``` + +Results: + +LastName + +-------------------------------------------------------------------------------- + +Eisenhower Davis + +Reversing the order of the `SELECT` statements will return last names in +PlayerStats that are **not** present in Roster: + +``` {.codehilite} +SELECT LastName +FROM PlayerStats +EXCEPT DISTINCT +SELECT LastName +FROM Roster; +``` -groupItem: - expression - | '(' expression [, expression ]* ')' - | HOP '(' expression [, expression ]* ')' - | TUMBLE '(' expression [, expression ]* ')' - | SESSION '(' expression [, expression ]* ')' +Results: +``` {.codehilite} +(empty) ``` +> Portions of this page are modifications based on work created and +> [shared by Google](https://developers.google.com/terms/site-policies) +> and used according to terms described in the [Creative Commons 3.0 +> Attribution License](http://creativecommons.org/licenses/by/3.0/).