This is an automated email from the ASF dual-hosted git repository. mergebot-role pushed a commit to branch mergebot in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit 495dc33304989b45f8493c4e270ab82a5a11f7c9 Author: Andrew Pilloud <[email protected]> AuthorDate: Mon Sep 24 15:11:40 2018 -0700 Update SELECT doc for Beam --- src/documentation/dsls/sql/select.md | 1286 ++-------------------------------- 1 file changed, 70 insertions(+), 1216 deletions(-) diff --git a/src/documentation/dsls/sql/select.md b/src/documentation/dsls/sql/select.md index c15003e..f3a135f 100644 --- a/src/documentation/dsls/sql/select.md +++ b/src/documentation/dsls/sql/select.md @@ -33,7 +33,7 @@ batch/streaming model: - [Windowing & Triggering]({{ site.baseurl}}/documentation/dsls/sql/windowing-and-triggering/) Query statements scan one or more tables or expressions and return the computed -result rows. This topic describes the syntax for SQL queries in BigQuery. +result rows. This topic describes the syntax for SQL queries in Beam. ## SQL Syntax @@ -43,7 +43,6 @@ result rows. This topic describes the syntax for SQL queries in BigQuery. query_expr: { select | ( query_expr ) | query_expr set_op query_expr } - [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ LIMIT count [ OFFSET skip_rows ] ] select: @@ -55,18 +54,14 @@ result rows. This topic describes the syntax for SQL queries in BigQuery. [ WHERE bool_expression ] [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ] [ HAVING bool_expression ] - [ WINDOW window_name AS ( window_definition ) [, ...] ] set_op: UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT from_item: { - table_name [ [ AS ] alias ] [ FOR SYSTEM TIME AS OF timestamp_expression ] | + table_name [ [ AS ] alias ] | join | - ( query_expr ) [ [ AS ] alias ] | - field_path | - { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } - [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | + ( query_expr ) [ [ AS ] alias ] with_query_name [ [ AS ] alias ] } @@ -91,8 +86,7 @@ Notation: Syntax: SELECT [{ ALL | DISTINCT }] - { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] - [ REPLACE ( expression [ AS ] column_name [, ...] ) ] + { [ expression. ]* | expression [ [ AS ] alias ] } [, ...] The `SELECT` list defines the columns that the query will return. Expressions in @@ -110,8 +104,8 @@ Each item in the `SELECT` list is one of: `SELECT *`, often referred to as *select star*, produces one output column for each column that is visible after executing the full query. -``` {.codehilite} -SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable); +``` +SELECT * FROM (SELECT 'apple' AS fruit, 'carrot' AS vegetable); +-------+-----------+ | fruit | vegetable | @@ -126,7 +120,7 @@ Items in a `SELECT` list can be expressions. These expressions evaluate to a single value and produce one output column, with an optional explicit `alias`. If the expression does not have an explicit alias, it receives an implicit alias -according to the rules for [implicit aliases](#implicit_aliases), if possible. +according to the rules for [implicit aliases](#implicit-aliases), if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query. @@ -134,17 +128,16 @@ in the query. An item in a `SELECT` list can also take the form of `expression.*`. This produces one output column for each column or top-level field of `expression`. -The expression must either be a table alias or evaluate to a single value of a -data type with fields, such as a STRUCT. +The expression must be a table alias. The following query produces one output column for each column in the table `groceries`, aliased as `g`. -``` {.codehilite} +``` WITH groceries AS - (SELECT "milk" AS dairy, - "eggs" AS protein, - "bread" AS grain) + (SELECT 'milk' AS dairy, + 'eggs' AS protein, + 'bread' AS grain) SELECT g.* FROM groceries AS g; @@ -155,38 +148,6 @@ FROM groceries AS g; +-------+---------+-------+ ``` -More examples: - -``` {.codehilite} -WITH locations AS - (SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location - UNION ALL - SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location) -SELECT l.location.* -FROM locations l; - -+---------+------------+ -| city | state | -+---------+------------+ -| Seattle | Washington | -| Phoenix | Arizona | -+---------+------------+ -``` - -``` {.codehilite} -WITH locations AS - (SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"), - ("Phoenix", "Arizona")] AS location) -SELECT l.LOCATION[offset(0)].* -FROM locations l; - -+---------+------------+ -| city | state | -+---------+------------+ -| Seattle | Washington | -+---------+------------+ -``` - ### SELECT modifiers You can modify the results returned from a `SELECT` query, as follows. @@ -199,68 +160,6 @@ remaining rows. `SELECT DISTINCT` cannot return columns of the following types: - STRUCT - ARRAY -#### SELECT \* EXCEPT - -A `SELECT * EXCEPT` statement specifies the names of one or more columns to -exclude from the result. All matching column names are omitted from the output. - -``` {.codehilite} -WITH orders AS - (SELECT 5 as order_id, - "sprocket" as item_name, - 200 as quantity) -SELECT * EXCEPT (order_id) -FROM orders; - -+-----------+----------+ -| item_name | quantity | -+-----------+----------+ -| sprocket | 200 | -+-----------+----------+ -``` - -**Note:** `SELECT * EXCEPT` does not exclude columns that do not have names. - -#### SELECT \* REPLACE - -A `SELECT * REPLACE` statement specifies one or more `expression AS identifier` -clauses. Each identifier must match a column name from the `SELECT *` statement. -In the output column list, the column that matches the identifier in a `REPLACE` -clause is replaced by the expression in that `REPLACE` clause. - -A `SELECT * REPLACE` statement does not change the names or order of columns. -However, it can change the value and the value type. - -``` {.codehilite} -WITH orders AS - (SELECT 5 as order_id, - "sprocket" as item_name, - 200 as quantity) -SELECT * REPLACE ("widget" AS item_name) -FROM orders; - -+----------+-----------+----------+ -| order_id | item_name | quantity | -+----------+-----------+----------+ -| 5 | widget | 200 | -+----------+-----------+----------+ - -WITH orders AS - (SELECT 5 as order_id, - "sprocket" as item_name, - 200 as quantity) -SELECT * REPLACE (quantity/2 AS quantity) -FROM orders; - -+----------+-----------+----------+ -| order_id | item_name | quantity | -+----------+-----------+----------+ -| 5 | sprocket | 100 | -+----------+-----------+----------+ -``` - -**Note:** `SELECT * REPLACE` does not replace columns that do not have names. - #### SELECT ALL A `SELECT ALL` statement returns all rows, including duplicate rows. `SELECT @@ -268,21 +167,9 @@ ALL` is the default behavior of `SELECT`. ### Aliases -See [Aliases](#using_aliases) for information on syntax and visibility for +See [Aliases](#aliases_2) for information on syntax and visibility for `SELECT` list aliases. -[]{#analytic_functions} - -## Analytic functions - -Clauses related to analytic functions are documented elsewhere. - -- `OVER` Clause and `PARTITION BY`: See - [Analytic Functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions). - -- `WINDOW` Clause and Window Functions: See - [WINDOW Clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#window-clause). - ## FROM clause The `FROM` clause indicates the table or tables from which to retrieve rows, and @@ -292,12 +179,9 @@ processing in the rest of the query. ### Syntax from_item: { - table_name [ [ AS ] alias ] [ FOR SYSTEM TIME AS OF timestamp_expression ] | + table_name [ [ AS ] alias ] | join | ( query_expr ) [ [ AS ] alias ] | - field_path | - { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } - [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | with_query_name [ [ AS ] alias ] } @@ -306,207 +190,19 @@ processing in the rest of the query. The name (optionally qualified) of an existing table. SELECT * FROM Roster; - SELECT * FROM dataset.Roster; - SELECT * FROM project.dataset.Roster; - -#### FOR SYSTEM TIME AS OF - -`FOR SYSTEM TIME AS OF` references the historical versions of the table -definition and rows that were current at `timestamp_expression`. - -Limitations: - -The source table in the `FROM` clause containing `FOR SYSTEM TIME AS OF` must -not be any of the following: - -- An `ARRAY` scan, including a - [flattened array](arrays#flattening-arrays-and-repeated-fields) or the - output of the `UNNEST` operator. -- A common table expression defined by a `WITH` clause. - -`timestamp_expression` must be a constant expression. It cannot contain the -following: - -- Subqueries. -- Correlated references (references to columns of a table that appear at a - higher level of the query statement, such as in the `SELECT` list). - -- User-defined functions (UDFs). - -The value of `timestamp_expression` cannot fall into the following ranges: - -- After the current timestamp (in the future). -- More than seven (7) days before the current timestamp. - -A single query statement cannot reference a single table at more than one point -in time, including the current time. That is, a query can reference a table -multiple times at the same timestamp, but not the current version and a -historical version, or two different historical versions. - -Examples: - -The following query returns a historical version of the table from one hour ago. - -``` {.codehilite} -SELECT * -FROM t - FOR SYSTEM TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR); -``` - -The following query returns a historical version of the table at an absolute -point in time. - -``` {.codehilite} -SELECT * -FROM t - FOR SYSTEM TIME AS OF '2017-01-01 10:00:00-07:00'; -``` - -The following query returns an error because the `timestamp_expression` contains -a correlated reference to a column in the containing query. - -``` {.codehilite} -SELECT * -FROM t1 -WHERE t1.a IN (SELECT t2.a - FROM t2 FOR SYSTEM TIME AS OF t1.timestamp_column); -``` + SELECT * FROM beam.Roster; #### join -See [JOIN Types](#join_types) below. +See [JOIN Types](#join-types) below and [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins). #### select {#select_1} `( select ) [ [ AS ] alias ]` is a table [subquery](#subqueries). -#### field\_path - -In the `FROM` clause, `field_path` is any path that resolves to a field within a -data type. `field_path` can go arbitrarily deep into a nested data structure. - -Some examples of valid `field_path` values include: - - SELECT * FROM T1 t1, t1.array_column; - - SELECT * FROM T1 t1, t1.struct_column.array_field; - - SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1; - - SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a; - - SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1; - -Field paths in the FROM clause must end in an array field. In addition, field -paths cannot contain arrays before the end of the path. For example, the path -`array_column.some_array.some_array_field` is invalid because it contains an -array before the end of the path. - -Note: If a path has only one name, it is interpreted as a table. To work around -this, wrap the path using `UNNEST`, or use the fully-qualified path. - -#### UNNEST - -The `UNNEST` operator takes an `ARRAY` and returns a table, with one row for -each element in the `ARRAY`. You can also use `UNNEST` outside of the `FROM` -clause with the -[`IN` operator](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#in-operators). - -For input `ARRAY`s of most element types, the output of `UNNEST` generally has -one column. This single column has an optional `alias`, which you can use to -refer to the column elsewhere in the query. `ARRAYS` with these element types -return multiple columns: - -- STRUCT - -`UNNEST` destroys the order of elements in the input `ARRAY`. Use the optional -`WITH OFFSET` clause to return a second column with the array element indexes -(see below). - -For an input `ARRAY` of `STRUCT`s, `UNNEST` returns a row for each `STRUCT`, -with a separate column for each field in the `STRUCT`. The alias for each column -is the name of the corresponding `STRUCT` field. - -**Example** - -``` {.codehilite} -SELECT * -FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]); - -+---+-----+ -| x | y | -+---+-----+ -| 3 | bar | -| 1 | foo | -+---+-----+ -``` - -Because the `UNNEST` operator returns a -[value table](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#value-tables), -you can alias `UNNEST` to define a range variable that you can reference -elsewhere in the query. If you reference the range variable in the `SELECT` -list, the query returns a `STRUCT` containing all of the fields of the original -`STRUCT` in the input table. - -**Example** - -``` {.codehilite} -SELECT *, struct_value -FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]) - AS struct_value; - -+---+-----+--------------+ -| x | y | struct_value | -+---+-----+--------------+ -| 3 | bar | {3, bar} | -| 1 | foo | {1, foo} | -+---+-----+--------------+ -``` - -ARRAY unnesting can be either explicit or implicit. In explicit unnesting, -`array_expression` must return an ARRAY value but does not need to resolve to an -ARRAY, and the `UNNEST` keyword is required. - -Example: - - SELECT * FROM UNNEST ([1, 2, 3]); - -In implicit unnesting, `array_path` must resolve to an ARRAY and the `UNNEST` -keyword is optional. - -Example: - - SELECT x - FROM mytable AS t, - t.struct_typed_column.array_typed_field1 AS x; - -In this scenario, `array_path` can go arbitrarily deep into a data structure, -but the last field must be ARRAY-typed. No previous field in the expression can -be ARRAY-typed because it is not possible to extract a named field from an -ARRAY. - -`UNNEST` treats NULLs as follows: - -- NULL and empty ARRAYs produces zero rows. -- An ARRAY containing NULLs produces rows containing NULL values. - -The optional `WITH` `OFFSET` clause returns a separate column containing the -"offset" value (i.e. counting starts at zero) for each row produced by the -`UNNEST` operation. This column has an optional `alias`; the default alias is -offset. - -Example: - - SELECT * FROM UNNEST ( ) WITH OFFSET AS num; - -See the -[`Arrays topic`](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays) -for more ways to use `UNNEST`, including construction, flattening, and -filtering. - #### with\_query\_name -The query names in a `WITH` clause (see [WITH Clause](#with_clause)) act like +The query names in a `WITH` clause (see [WITH Clause](#with-clause)) act like names of temporary tables that you can reference anywhere in the `FROM` clause. In the example below, `subQ1` and `subQ2` are `with_query_names`. @@ -518,10 +214,7 @@ Example: SELECT DISTINCT * FROM subQ2; The `WITH` clause hides any permanent tables with the same name for the duration -of the query, unless you qualify the table name, e.g. `dataset.Roster` or -`project.dataset.Roster`. - -[]{#subqueries} +of the query, unless you qualify the table name, e.g. `beam.Roster`. ### Subqueries @@ -531,7 +224,7 @@ SELECTs". The full `SELECT` syntax is valid in subqueries. There are two types of subquery: -- [Expression Subqueries](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#expression-subqueries), +- Expression Subqueries which you can use in a query wherever expressions are valid. Expression subqueries return a single value. - Table subqueries, which you can use only in a `FROM` clause. The outer query @@ -541,7 +234,7 @@ Note that there must be parentheses around both types of subqueries. Example: -``` {.codehilite} +``` SELECT AVG ( PointsScored ) FROM ( SELECT PointsScored @@ -553,7 +246,7 @@ Optionally, a table subquery can have an alias. Example: -``` {.codehilite} +``` SELECT r.LastName FROM ( SELECT * FROM Roster) AS r; @@ -561,13 +254,13 @@ FROM ### Aliases {#aliases_1} -See [Aliases](#using_aliases) for information on syntax and visibility for +See [Aliases](#aliases_2) for information on syntax and visibility for `FROM` clause aliases. -[]{#join_types} - ## JOIN types +Also see [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins). + ### Syntax {#syntax_1} join: @@ -600,37 +293,7 @@ without actually calculating the Cartesian product. ### CROSS JOIN -`CROSS JOIN` returns the Cartesian product of the two `from_item`s. In other -words, it retains all rows from both `from_item`s and combines each row from the -first `from_item`s with each row from the second `from_item`s. - -**Comma cross joins** - -`CROSS JOIN`s can be written explicitly (see directly above) or implicitly using -a comma to separate the `from_item`s. - -Example of an implicit "comma cross join": - -``` {.codehilite} -SELECT * FROM Roster, TeamMascot; -``` - -Here is the explicit cross join equivalent: - -``` {.codehilite} -SELECT * FROM Roster CROSS JOIN TeamMascot; -``` - -You cannot write comma cross joins inside parentheses. - -Invalid - comma cross join inside parentheses: - -``` {.codehilite} -SELECT * FROM t CROSS JOIN (Roster, TeamMascot); // INVALID. -``` - -See [Sequences of JOINs](#sequences_of_joins) for details on how a comma cross -join behaves in a sequence of JOINs. +`CROSS JOIN` is generally not yet supported. ### FULL \[OUTER\] JOIN @@ -638,12 +301,16 @@ A `FULL OUTER JOIN` (or simply `FULL JOIN`) returns all fields for all rows in both `from_item`s that meet the join condition. `FULL` indicates that *all rows* from both `from_item`s are returned, even if -they do not meet the join condition. +they do not meet the join condition. For streaming jobs, all rows that are +not late according to default trigger and belonging to the same window +if there's non-global window applied. `OUTER` indicates that if a given row from one `from_item` does not join to any row in the other `from_item`, the row will return with NULLs for all columns from the other `from_item`. +Also see [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins). + ### LEFT \[OUTER\] JOIN The result of a `LEFT OUTER JOIN` (or simply `LEFT JOIN`) for two `from_item`s @@ -661,8 +328,6 @@ left `from_item` are discarded. The result of a `RIGHT OUTER JOIN` (or simply `RIGHT JOIN`) is similar and symmetric to that of `LEFT OUTER JOIN`. -[]{#on_clause} - ### ON clause The `ON` clause contains a `bool_expression`. A combined row (the result of @@ -670,13 +335,11 @@ joining two rows) meets the join condition if `bool_expression` returns TRUE. Example: -``` {.codehilite} +``` SELECT * FROM Roster INNER JOIN PlayerStats ON Roster.LastName = PlayerStats.LastName; ``` -[]{#using_clause} - ### USING clause The `USING` clause requires a `column_list` of one or more columns which occur @@ -686,7 +349,7 @@ rows meet the join condition if the equality comparison returns TRUE. In most cases, a statement with the `USING` keyword is equivalent to using the `ON` keyword. For example, the statement: -``` {.codehilite} +``` SELECT FirstName FROM Roster INNER JOIN PlayerStats USING (LastName); @@ -694,7 +357,7 @@ USING (LastName); is equivalent to: -``` {.codehilite} +``` SELECT FirstName FROM Roster INNER JOIN PlayerStats ON Roster.LastName = PlayerStats.LastName; @@ -703,7 +366,7 @@ ON Roster.LastName = PlayerStats.LastName; The results from queries with `USING` do differ from queries that use `ON` when you use `SELECT *`. To illustrate this, consider the query: -``` {.codehilite} +``` SELECT * FROM Roster INNER JOIN PlayerStats USING (LastName); ``` @@ -714,7 +377,7 @@ single `LastName` column. By contrast, consider the following query: -``` {.codehilite} +``` SELECT * FROM Roster INNER JOIN PlayerStats ON Roster.LastName = PlayerStats.LastName; ``` @@ -723,70 +386,24 @@ This statement returns the rows from `Roster` and `PlayerStats` where `Roster.LastName` is the same as `PlayerStats.LastName`. The results include two `LastName` columns; one from `Roster` and one from `PlayerStats`. -[]{#sequences_of_joins} - ### Sequences of JOINs The `FROM` clause can contain multiple `JOIN` clauses in sequence. Example: -``` {.codehilite} +``` SELECT * FROM a LEFT JOIN b ON TRUE LEFT JOIN c ON TRUE; ``` where `a`, `b`, and `c` are any `from_item`s. JOINs are bound from left to right, but you can insert parentheses to group them in a different order. -Consider the following queries: A (without parentheses) and B (with parentheses) -are equivalent to each other but not to C. The `FULL JOIN` in **bold** binds -first. - -A. - - SELECT * FROM Roster FULL JOIN TeamMascot USING (SchoolID) - FULL JOIN PlayerStats USING (LastName); - -B. - - SELECT * FROM ( (Roster FULL JOIN TeamMascot USING (SchoolID)) - FULL JOIN PlayerStats USING (LastName)); - -C. - - SELECT * FROM (Roster FULL JOIN (TeamMascot FULL JOIN PlayerStats USING - (LastName)) USING (SchoolID)) ; - -When comma cross joins are present in a query with a sequence of JOINs, they -group from left to right like other `JOIN` types. - -Example: - -``` {.codehilite} -SELECT * FROM a JOIN b ON TRUE, b JOIN c ON TRUE; -``` - -The query above is equivalent to - -``` {.codehilite} -SELECT * FROM ((a JOIN b ON TRUE) CROSS JOIN b) JOIN c ON TRUE); -``` - -There cannot be a `RIGHT JOIN` or `FULL JOIN` after a comma join. - -Invalid - `RIGHT JOIN` after a comma cross join: - -``` {.codehilite} -SELECT * FROM Roster, TeamMascot RIGHT JOIN PlayerStats ON TRUE; // INVALID. -``` - -[]{#where_clause} - ## WHERE clause ### Syntax {#syntax_2} -``` {.codehilite} +``` WHERE bool_expression ``` @@ -796,7 +413,7 @@ that return FALSE or NULL). Example: -``` {.codehilite} +``` SELECT * FROM Roster WHERE SchoolID = 52; ``` @@ -805,9 +422,9 @@ The `bool_expression` can contain multiple sub-conditions. Example: -``` {.codehilite} +``` SELECT * FROM Roster -WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac"); +WHERE LastName LIKE 'Mc%' OR LastName LIKE 'Mac%'; ``` You cannot reference column aliases from the `SELECT` list in the `WHERE` @@ -819,22 +436,22 @@ expression using `CROSS JOIN` and `WHERE`. Example - this query: -``` {.codehilite} +``` SELECT * FROM Roster INNER JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID; ``` is equivalent to: -``` {.codehilite} +``` SELECT * FROM Roster CROSS JOIN TeamMascot WHERE Roster.SchoolID = TeamMascot.SchoolID; ``` -[]{#group_by_clause} - ## GROUP BY clause +Also see [Windowing & Triggering]({{ site.baseurl}}/documentation/dsls/sql/windowing-and-triggering/) + ### Syntax {#syntax_3} GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } @@ -843,163 +460,21 @@ The `GROUP BY` clause groups together rows in a table with non-distinct values for the `expression` in the `GROUP BY` clause. For multiple rows in the source table with non-distinct values for `expression`, the `GROUP BY` clause produces a single combined row. `GROUP BY` is commonly used when aggregate functions are -present in the `SELECT` list, or to eliminate redundancy in the output. The data -type of `expression` must be -[groupable](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#data-type-properties). +present in the `SELECT` list, or to eliminate redundancy in the output. Example: -``` {.codehilite} +``` SELECT SUM(PointsScored), LastName FROM PlayerStats GROUP BY LastName; ``` -The `GROUP BY` clause can refer to expression names in the `SELECT` list. The -`GROUP BY` clause also allows ordinal references to expressions in the `SELECT` -list using integer values. `1` refers to the first expression in the `SELECT` -list, `2` the second, and so forth. The expression list can combine ordinals and -expression names. - -Example: - -``` {.codehilite} -SELECT SUM(PointsScored), LastName, FirstName -FROM PlayerStats -GROUP BY LastName, FirstName; -``` - -The query above is equivalent to: - -``` {.codehilite} -SELECT SUM(PointsScored), LastName, FirstName -FROM PlayerStats -GROUP BY 2, FirstName; -``` - -`GROUP BY` clauses may also refer to aliases. If a query contains aliases in the -`SELECT` clause, those aliases override names in the corresponding `FROM` -clause. - -Example: - -``` {.codehilite} -SELECT SUM(PointsScored), LastName as last_name -FROM PlayerStats -GROUP BY last_name; -``` - -`GROUP BY ROLLUP` returns the results of `GROUP BY` for prefixes of the -expressions in the `ROLLUP` list, each of which is known as a *grouping set*. -For the `ROLLUP` list `(a, b, c)`, the grouping sets are `(a, b, c)`, `(a, b)`, -`(a)`, `()`. When evaluating the results of `GROUP BY` for a particular grouping -set, `GROUP BY ROLLUP` treats expressions that are not in the grouping set as -having a `NULL` value. A `SELECT` statement like this one: - -``` {.codehilite} -SELECT a, b, SUM(c) FROM Input GROUP BY ROLLUP(a, b); -``` - -uses the rollup list `(a, b)`. The result will include the results of `GROUP BY` -for the grouping sets `(a, b)`, `(a)`, and `()`, which includes all rows. This -returns the same rows as: - -``` {.codehilite} -SELECT NULL, NULL, SUM(c) FROM Input UNION ALL -SELECT a, NULL, SUM(c) FROM Input GROUP BY a UNION ALL -SELECT a, b, SUM(c) FROM Input GROUP BY a, b; -``` - -This allows the computation of aggregates for the grouping sets defined by the -expressions in the `ROLLUP` list and the prefixes of that list. - -Example: - -``` {.codehilite} -WITH Sales AS ( - SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL - SELECT 123, 1, 8.99 UNION ALL - SELECT 456, 1, 4.56 UNION ALL - SELECT 123, 2, 9.99 UNION ALL - SELECT 789, 3, 1.00 UNION ALL - SELECT 456, 3, 4.25 UNION ALL - SELECT 789, 3, 0.99 -) -SELECT - day, - SUM(price) AS total -FROM Sales -GROUP BY ROLLUP(day); -``` - -The query above outputs a row for each day in addition to the rolled up total -across all days, as indicated by a `NULL` day: - -``` {.codehilite} -+------+-------+ -| day | total | -+------+-------+ -| NULL | 39.77 | -| 1 | 23.54 | -| 2 | 9.99 | -| 3 | 6.24 | -+------+-------+ -``` - -Example: - -``` {.codehilite} -WITH Sales AS ( - SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL - SELECT 123, 1, 8.99 UNION ALL - SELECT 456, 1, 4.56 UNION ALL - SELECT 123, 2, 9.99 UNION ALL - SELECT 789, 3, 1.00 UNION ALL - SELECT 456, 3, 4.25 UNION ALL - SELECT 789, 3, 0.99 -) -SELECT - sku, - day, - SUM(price) AS total -FROM Sales -GROUP BY ROLLUP(sku, day) -ORDER BY sku, day; -``` - -The query above returns rows grouped by the following grouping sets: - -- sku and day -- sku (day is `NULL`) -- The empty grouping set (day and sku are `NULL`) - -The sums for these grouping sets correspond to the total for each distinct -sku-day combination, the total for each sku across all days, and the grand -total: - -``` {.codehilite} -+------+------+-------+ -| sku | day | total | -+------+------+-------+ -| NULL | NULL | 39.77 | -| 123 | NULL | 28.97 | -| 123 | 1 | 18.98 | -| 123 | 2 | 9.99 | -| 456 | NULL | 8.81 | -| 456 | 1 | 4.56 | -| 456 | 3 | 4.25 | -| 789 | 3 | 1.99 | -| 789 | NULL | 1.99 | -+------+------+-------+ -``` - -[]{#having_clause} - ## HAVING clause ### Syntax {#syntax_4} -``` {.codehilite} +``` HAVING bool_expression ``` @@ -1013,8 +488,8 @@ The `HAVING` clause differs from the `WHERE` clause in that: - The `HAVING` clause requires `GROUP BY` or aggregation to be present in the query. -- The `HAVING` clause occurs after `GROUP BY` and aggregation, and before - `ORDER BY`. This means that the `HAVING` clause is evaluated once for every +- The `HAVING` clause occurs after `GROUP BY` and aggregation. + This means that the `HAVING` clause is evaluated once for every aggregated row in the result set. This differs from the `WHERE` clause, which is evaluated before `GROUP BY` and aggregation. @@ -1023,149 +498,13 @@ well as `SELECT` list aliases. Expressions referenced in the `HAVING` clause must either appear in the `GROUP BY` clause or they must be the result of an aggregate function: -``` {.codehilite} -SELECT LastName -FROM Roster -GROUP BY LastName -HAVING SUM(PointsScored) > 15; -``` - -If a query contains aliases in the `SELECT` clause, those aliases override names -in a `FROM` clause. - -``` {.codehilite} -SELECT LastName, SUM(PointsScored) AS ps -FROM Roster -GROUP BY LastName -HAVING ps > 0; -``` - -[]{#mandatory_aggregation} - -### Mandatory aggregation - -Aggregation does not have to be present in the `HAVING` clause itself, but -aggregation must be present in at least one of the following forms: - -#### Aggregation function in the `SELECT` list. {#aggregation-function-in-the-select-list} - -``` {.codehilite} -SELECT LastName, SUM(PointsScored) AS total -FROM PlayerStats -GROUP BY LastName -HAVING total > 15; ``` - -#### Aggregation function in the 'HAVING' clause. {#aggregation-function-in-the-having-clause} - -``` {.codehilite} SELECT LastName -FROM PlayerStats -GROUP BY LastName -HAVING SUM(PointsScored) > 15; -``` - -#### Aggregation in both the `SELECT` list and `HAVING` clause. {#aggregation-in-both-the-select-list-and-having-clause} - -When aggregation functions are present in both the `SELECT` list and `HAVING` -clause, the aggregation functions and the columns they reference do not need to -be the same. In the example below, the two aggregation functions, `COUNT()` and -`SUM()`, are different and also use different columns. - -``` {.codehilite} -SELECT LastName, COUNT(*) -FROM PlayerStats +FROM Roster GROUP BY LastName HAVING SUM(PointsScored) > 15; ``` -[]{#order_by_clause} - -## ORDER BY clause - -### Syntax {#syntax_5} - - ORDER BY expression [{ ASC | DESC }] [, ...] - -The `ORDER BY` clause specifies a column or expression as the sort criterion for -the result set. If an ORDER BY clause is not present, the order of the results -of a query is not defined. The default sort direction is `ASC`, which sorts the -results in ascending order of `expression` values. `DESC` sorts the results in -descending order. Column aliases from a `FROM` clause or `SELECT` list are -allowed. If a query contains aliases in the `SELECT` clause, those aliases -override names in the corresponding `FROM` clause. - -It is possible to order by multiple columns. In the example below, the result -set is ordered first by `SchoolID` and then by `LastName`: - -``` {.codehilite} -SELECT LastName, PointsScored, OpponentID -FROM PlayerStats -ORDER BY SchoolID, LastName; -``` - -The following rules apply when ordering values: - -- NULLs: In the context of the `ORDER BY` clause, NULLs are the minimum - possible value; that is, NULLs appear first in `ASC` sorts and last in - `DESC` sorts. -- Floating point data types: see - [Floating Point Semantics](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating-point-semantics) - on ordering and grouping. - -When used in conjunction with [set operators](#set-operators), the `ORDER BY` -clause applies to the result set of the entire query; it does not apply only to -the closest `SELECT` statement. For this reason, it can be helpful (though it is -not required) to use parentheses to show the scope of the `ORDER BY`. - -This query without parentheses: - -``` {.codehilite} -SELECT * FROM Roster -UNION ALL -SELECT * FROM TeamMascot -ORDER BY SchoolID; -``` - -is equivalent to this query with parentheses: - -``` {.codehilite} -( SELECT * FROM Roster - UNION ALL - SELECT * FROM TeamMascot ) -ORDER BY SchoolID; -``` - -but is not equivalent to this query, where the `ORDER BY` clause applies only to -the second `SELECT` statement: - -``` {.codehilite} -SELECT * FROM Roster -UNION ALL -( SELECT * FROM TeamMascot - ORDER BY SchoolID ); -``` - -You can also use integer literals as column references in `ORDER BY` clauses. An -integer literal becomes an ordinal (for example, counting starts at 1) into the -`SELECT` list. - -Example - the following two queries are equivalent: - -``` {.codehilite} -SELECT SUM(PointsScored), LastName -FROM PlayerStats -ORDER BY LastName; -``` - -``` {.codehilite} -SELECT SUM(PointsScored), LastName -FROM PlayerStats -ORDER BY 2; -``` - -[]{#set_operators} - ## Set operators ### Syntax {#syntax_6} @@ -1211,7 +550,7 @@ The following rules apply: Examples: -``` {.codehilite} +``` query1 UNION ALL (query2 UNION DISTINCT query3) query1 UNION ALL query2 UNION ALL query3 ``` @@ -1221,52 +560,41 @@ Invalid: query1 UNION ALL query2 UNION DISTINCT query3 query1 UNION ALL query2 INTERSECT ALL query3; // INVALID. -[]{#union} - ### UNION The `UNION` operator combines the result sets of two or more input queries by pairing columns from the result set of each query and vertically concatenating them. -[]{#intersect} - ### INTERSECT The `INTERSECT` operator returns rows that are found in the result sets of both the left and right input queries. Unlike `EXCEPT`, the positioning of the input queries (to the left vs. right of the `INTERSECT` operator) does not matter. -[]{#except} - ### EXCEPT The `EXCEPT` operator returns rows from the left input query that are not present in the right input query. -[]{#limit-clause_and_offset_clause} - ## LIMIT clause and OFFSET clause ### Syntax {#syntax_7} -``` {.codehilite} +``` LIMIT count [ OFFSET skip_rows ] ``` -`LIMIT` specifies a non-negative `count` of type INT64, and no more than `count` +`LIMIT` specifies a non-negative `count` of type INTEGER, and no more than `count` rows will be returned. `LIMIT` `0` returns 0 rows. If there is a set operation, `LIMIT` is applied after the set operation is evaluated. -`OFFSET` specifies a non-negative `skip_rows` of type INT64, and only rows from +`OFFSET` specifies a non-negative `skip_rows` of type INTEGER, and only rows from that offset in the table will be considered. These clauses accept only literal or parameter values. -The rows that are returned by `LIMIT` and `OFFSET` is unspecified unless these -operators are used after `ORDER BY`. - -[]{#with_clause} +The rows that are returned by `LIMIT` and `OFFSET` is unspecified. ## WITH clause @@ -1275,13 +603,9 @@ a subsequent `SELECT` statement references them. Any clause or subquery can reference subqueries you define in the `WITH` clause. This includes any `SELECT` statements on either side of a set operator, such as `UNION`. -The `WITH` clause is useful primarily for readability, because BigQuery does not -materialize the result of the queries inside the `WITH` clause. If a query -appears in more than one `WITH` clause, it executes in each clause. - Example: -``` {.codehilite} +``` WITH subQ1 AS (SELECT SchoolID FROM Roster), subQ2 AS (SELECT OpponentID FROM PlayerStats) SELECT * FROM subQ1 @@ -1289,62 +613,14 @@ UNION ALL SELECT * FROM subQ2; ``` -Another useful role of the `WITH` clause is to break up more complex queries -into a `WITH` `SELECT` statement and `WITH` clauses, where the less desirable -alternative is writing nested table subqueries. If a `WITH` clause contains -multiple subqueries, the subquery names cannot repeat. - -BigQuery supports `WITH` clauses in subqueries, such as table subqueries, -expression subqueries, and so on. - -``` {.codehilite} -WITH q1 AS (my_query) -SELECT * -FROM - (WITH q2 AS (SELECT * FROM q1) SELECT * FROM q2) -``` - -The following are scoping rules for `WITH` clauses: - -- Aliases are scoped so that the aliases introduced in a `WITH` clause are - visible only in the later subqueries in the same `WITH` clause, and in the - query under the `WITH` clause. -- Aliases introduced in the same `WITH` clause must be unique, but the same - alias can be used in multiple `WITH` clauses in the same query. The local - alias overrides any outer aliases anywhere that the local alias is visible. -- Aliased subqueries in a `WITH` clause can never be correlated. No columns - from outside the query are visible. The only names from outside that are - visible are other `WITH` aliases that were introduced earlier in the same - `WITH` clause. - -Here's an example of a statement that uses aliases in `WITH` subqueries: - -``` {.codehilite} -WITH q1 AS (my_query) -SELECT * -FROM - (WITH q2 AS (SELECT * FROM q1), # q1 resolves to my_query - q3 AS (SELECT * FROM q1), # q1 resolves to my_query - q1 AS (SELECT * FROM q1), # q1 (in the query) resolves to my_query - q4 AS (SELECT * FROM q1) # q1 resolves to the WITH subquery - # on the previous line. - SELECT * FROM q1) # q1 resolves to the third inner WITH subquery. -``` - -BigQuery does not support `WITH RECURSIVE`. - -[]{#using_aliases} - ## Aliases {#aliases_2} An alias is a temporary name given to a table, column, or expression present in a query. You can introduce explicit aliases in the `SELECT` list or `FROM` -clause, or BigQuery will infer an implicit alias for some expressions. +clause, or Beam will infer an implicit alias for some expressions. Expressions with neither an explicit nor implicit alias are anonymous and the query cannot reference them by name. -[]{#explicit_alias_syntax} - ### Explicit alias syntax You can introduce explicit aliases in either the `FROM` clause or the `SELECT` @@ -1356,9 +632,9 @@ keyword is optional. Example: -``` {.codehilite} +``` SELECT s.FirstName, s2.SongName -FROM Singers AS s, (SELECT * FROM Songs) AS s2; +FROM Singers AS s JOIN Songs AS s2 ON s.SingerID = s2.SingerID; ``` You can introduce explicit aliases for any expression in the `SELECT` list using @@ -1366,140 +642,25 @@ You can introduce explicit aliases for any expression in the `SELECT` list using Example: -``` {.codehilite} +``` SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname FROM Singers s; ``` -[]{#alias_visibility} - ### Explicit alias visibility After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on -alias visibility are the result of BigQuery's name scoping rules. - -[]{#from_clause_aliases} +alias visibility are the result of Beam's name scoping rules. #### FROM clause aliases -BigQuery processes aliases in a `FROM` clause from left to right, and aliases -are visible only to subsequent path expressions in a `FROM` clause. - -Example: - -Assume the `Singers` table had a `Concerts` column of `ARRAY` type. - -``` {.codehilite} -SELECT FirstName -FROM Singers AS s, s.Concerts; -``` - -Invalid: - -``` {.codehilite} -SELECT FirstName -FROM s.Concerts, Singers AS s; // INVALID. -``` - -`FROM` clause aliases are **not** visible to subqueries in the same `FROM` -clause. Subqueries in a `FROM` clause cannot contain correlated references to -other tables in the same `FROM` clause. - -Invalid: - -``` {.codehilite} -SELECT FirstName -FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s) // INVALID. -``` - -You can use any column name from a table in the `FROM` as an alias anywhere in -the query, with or without qualification with the table name. - -Example: - -``` {.codehilite} -SELECT FirstName, s.ReleaseDate -FROM Singers s WHERE ReleaseDate = 1975; -``` - -If the `FROM` clause contains an explicit alias, you must use the explicit alias -instead of the implicit alias for the remainder of the query (see -[Implicit Aliases](#implicit_aliases)). A table alias is useful for brevity or -to eliminate ambiguity in cases such as self-joins, where the same table is -scanned multiple times during query processing. - -Example: - -``` {.codehilite} -SELECT * FROM Singers as s, Songs as s2 -ORDER BY s.LastName -``` - -Invalid — `ORDER BY` does not use the table alias: - -``` {.codehilite} -SELECT * FROM Singers as s, Songs as s2 -ORDER BY Singers.LastName; // INVALID. -``` - -[]{#select-list_aliases} - -#### SELECT list aliases - -Aliases in the `SELECT` list are **visible only** to the following clauses: - -- `GROUP BY` clause -- `ORDER BY` clause -- `HAVING` clause - -Example: - -``` {.codehilite} -SELECT LastName AS last, SingerID -FROM Singers -ORDER BY last; -``` - -[]{#aliases_clauses} - -### Explicit aliases in GROUP BY, ORDER BY, and HAVING clauses - -These three clauses, `GROUP BY`, `ORDER BY`, and `HAVING`, can refer to only the -following values: - -- Tables in the `FROM` clause and any of their columns. -- Aliases from the `SELECT` list. - -`GROUP BY` and `ORDER BY` can also refer to a third group: - -- Integer literals, which refer to items in the `SELECT` list. The integer `1` - refers to the first item in the `SELECT` list, `2` refers to the second - item, etc. - -Example: - -``` {.codehilite} -SELECT SingerID AS sid, COUNT(Songid) AS s2id -FROM Songs -GROUP BY 1 -ORDER BY 2 DESC; -``` - -The query above is equivalent to: - -``` {.codehilite} -SELECT SingerID AS sid, COUNT(Songid) AS s2id -FROM Songs -GROUP BY sid -ORDER BY s2id DESC; -``` - -[]{#ambiguous_aliases} +Beam processes aliases in a `FROM` clause from left to right, and aliases +are visible only to subsequent `JOIN` clauses. ### Ambiguous aliases -BigQuery provides an error if a name is ambiguous, meaning it can resolve to +Beam provides an error if a name is ambiguous, meaning it can resolve to more than one unique object. Examples: @@ -1507,69 +668,15 @@ Examples: This query contains column names that conflict between tables, since both `Singers` and `Songs` have a column named `SingerID`: -``` {.codehilite} +``` SELECT SingerID FROM Singers, Songs; ``` -This query contains aliases that are ambiguous in the `GROUP BY` clause because -they are duplicated in the `SELECT` list: - -``` {.codehilite} -SELECT FirstName AS name, LastName AS name, -FROM Singers -GROUP BY name; -``` - -Ambiguity between a `FROM` clause column name and a `SELECT` list alias in -`GROUP BY`: - -``` {.codehilite} -SELECT UPPER(LastName) AS LastName -FROM Singers -GROUP BY LastName; -``` - -The query above is ambiguous and will produce an error because `LastName` in the -`GROUP BY` clause could refer to the original column `LastName` in `Singers`, or -it could refer to the alias `AS LastName`, whose value is `UPPER(LastName)`. - -The same rules for ambiguity apply to path expressions. Consider the following -query where `table` has columns `x` and `y`, and column `z` is of type STRUCT -and has fields `v`, `w`, and `x`. - -Example: - -``` {.codehilite} -SELECT x, z AS T -FROM table T -GROUP BY T.x; -``` - -The alias `T` is ambiguous and will produce an error because `T.x` in the `GROUP -BY` clause could refer to either `table.x` or `table.z.x`. - -A name is **not** ambiguous in `GROUP BY`, `ORDER BY` or `HAVING` if it is both -a column name and a `SELECT` list alias, as long as the name resolves to the -same underlying object. - -Example: - -``` {.codehilite} -SELECT LastName, BirthYear AS BirthYear -FROM Singers -GROUP BY BirthYear; -``` - -The alias `BirthYear` is not ambiguous because it resolves to the same -underlying column, `Singers.BirthYear`. - -[]{#implicit_aliases} - ### Implicit aliases In the `SELECT` list, if there is an expression that does not have an explicit -alias, BigQuery assigns an implicit alias according to the following rules. +alias, Beam assigns an implicit alias according to the following rules. There can be multiple columns with the same alias in the `SELECT` list. - For identifiers, the alias is the identifier. For example, `SELECT abc` @@ -1588,274 +695,21 @@ the label cannot be used like an alias. In a `FROM` clause, `from_item`s are not required to have an alias. The following rules apply: -If there is an expression that does not have an explicit alias, BigQuery assigns +If there is an expression that does not have an explicit alias, Beam assigns an implicit alias in these cases: - For identifiers, the alias is the identifier. For example, `FROM abc` implies `AS abc`. - For path expressions, the alias is the last identifier in the path. For example, `FROM abc.def.ghi` implies `AS ghi` -- The column produced using `WITH OFFSET` has the implicit alias `offset`. Table subqueries do not have implicit aliases. `FROM UNNEST(x)` does not have an implicit alias. -[]{#appendix_a_examples_with_sample_data} - -## Appendix A: examples with sample data - -[]{#sample_tables} - -### Sample tables - -The following three tables contain sample data about athletes, their schools, -and the points they score during the season. These tables will be used to -illustrate the behavior of different query clauses. - -Table Roster: - -LastName SchoolID - --------------------------------------------------------------------------------- - -Adams 50 Buchanan 52 Coolidge 52 Davis 51 Eisenhower 77 - -The Roster table includes a list of player names (LastName) and the unique ID -assigned to their school (SchoolID). - -Table PlayerStats: - -LastName OpponentID PointsScored - --------------------------------------------------------------------------------- - -Adams 51 3 Buchanan 77 0 Coolidge 77 1 Adams 52 4 Buchanan 50 13 - -The PlayerStats table includes a list of player names (LastName) and the unique -ID assigned to the opponent they played in a given game (OpponentID) and the -number of points scored by the athlete in that game (PointsScored). - -Table TeamMascot: - -SchoolId Mascot - --------------------------------------------------------------------------------- - -50 Jaguars 51 Knights 52 Lakers 53 Mustangs - -The TeamMascot table includes a list of unique school IDs (SchoolID) and the -mascot for that school (Mascot). - -[]{#join_types_examples} - -### JOIN types {#join-types_1} - -1\) \[INNER\] JOIN - -Example: - -``` {.codehilite} -SELECT * FROM Roster JOIN TeamMascot -ON Roster.SchoolID = TeamMascot.SchoolID; -``` - -Results: - -LastName Roster.SchoolId TeamMascot.SchoolId Mascot - --------------------------------------------------------------------------------- - -Adams 50 50 Jaguars Buchanan 52 52 Lakers Coolidge 52 52 Lakers Davis 51 51 -Knights - -2\) CROSS JOIN - -Example: - -``` {.codehilite} -SELECT * FROM Roster CROSS JOIN TeamMascot -ON Roster.SchoolID = TeamMascot.SchoolID; -``` - -Results: - -LastName Roster.SchoolId TeamMascot.SchoolId Mascot - --------------------------------------------------------------------------------- - -Adams 50 50 Jaguars Adams 50 51 Knights Adams 50 52 Lakers Adams 50 53 Mustangs -Buchanan 52 50 Jaguars Buchanan 52 51 Knights Buchanan 52 52 Lakers Buchanan 52 -53 Mustangs Coolidge 52 50 Jaguars Coolidge 52 51 Knights Coolidge 52 52 Lakers -Coolidge 52 53 Mustangs Davis 51 50 Jaguars Davis 51 51 Knights Davis 51 52 -Lakers Davis 51 53 Mustangs Eisenhower 77 50 Jaguars Eisenhower 77 51 Knights -Eisenhower 77 52 Lakers Eisenhower 77 53 Mustangs - -3\) FULL \[OUTER\] JOIN - -Example: - -``` {.codehilite} -SELECT * FROM Roster FULL JOIN TeamMascot -ON Roster.SchoolID = TeamMascot.SchoolID; -``` - -LastName Roster.SchoolId TeamMascot.SchoolId Mascot - --------------------------------------------------------------------------------- - -Adams 50 50 Jaguars Buchanan 52 52 Lakers Coolidge 52 52 Lakers Davis 51 51 -Knights Eisenhower 77 NULL NULL NULL NULL 53 Mustangs - -4\) LEFT \[OUTER\] JOIN - -Example: - -``` {.codehilite} -SELECT * FROM Roster LEFT JOIN TeamMascot -ON Roster.SchoolID = TeamMascot.SchoolID; -``` - -Results: - -LastName Roster.SchoolId TeamMascot.SchoolId Mascot - --------------------------------------------------------------------------------- - -Adams 50 50 Jaguars Buchanan 52 52 Lakers Coolidge 52 52 Lakers Davis 51 51 -Knights Eisenhower 77 NULL NULL - -5\) RIGHT \[OUTER\] JOIN - -Example: - -``` {.codehilite} -SELECT * FROM Roster RIGHT JOIN TeamMascot -ON Roster.SchoolID = TeamMascot.SchoolID; -``` - -Results: - -LastName Roster.SchoolId TeamMascot.SchoolId Mascot - --------------------------------------------------------------------------------- - -Adams 50 50 Jaguars Davis 51 51 Knights Coolidge 52 52 Lakers Buchanan 52 52 -Lakers NULL NULL 53 Mustangs - -[]{#group_by_clause} - -### GROUP BY clause {#group-by-clause_1} - -Example: - -``` {.codehilite} -SELECT LastName, SUM(PointsScored) -FROM PlayerStats -GROUP BY LastName; -``` - -LastName SUM - --------------------------------------------------------------------------------- - -Adams 7 Buchanan 13 Coolidge 1 - -[]{#set_operators} - -### Set operators {#set-operators_1} - -[]{#union} - -#### UNION {#union_1} - -The `UNION` operator combines the result sets of two or more `SELECT` statements -by pairing columns from the result set of each `SELECT` statement and vertically -concatenating them. - -Example: - -``` {.codehilite} -SELECT Mascot AS X, SchoolID AS Y -FROM TeamMascot -UNION ALL -SELECT LastName, PointsScored -FROM PlayerStats; -``` - -Results: - -X Y - --------------------------------------------------------------------------------- - -Mustangs 50 Knights 51 Lakers 52 Mustangs 53 Adams 3 Buchanan 0 Coolidge 1 Adams -4 Buchanan 13 - -[]{#intersect} - -#### INTERSECT {#intersect_1} - -This query returns the last names that are present in both Roster and -PlayerStats. - -``` {.codehilite} -SELECT LastName -FROM Roster -INTERSECT DISTINCT -SELECT LastName -FROM PlayerStats; -``` - -Results: - -LastName - --------------------------------------------------------------------------------- - -Adams Coolidge Buchanan - -[]{#except} - -#### EXCEPT {#except_1} - -The query below returns last names in Roster that are **not** present in -PlayerStats. - -``` {.codehilite} -SELECT LastName -FROM Roster -EXCEPT DISTINCT -SELECT LastName -FROM PlayerStats; -``` - -Results: - -LastName - --------------------------------------------------------------------------------- - -Eisenhower Davis - -Reversing the order of the `SELECT` statements will return last names in -PlayerStats that are **not** present in Roster: - -``` {.codehilite} -SELECT LastName -FROM PlayerStats -EXCEPT DISTINCT -SELECT LastName -FROM Roster; -``` - -Results: - -``` {.codehilite} -(empty) -``` - -> Portions of this page are modifications based on work created and +> Portions of this page are modifications based on +> [work](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax) +> created and > [shared by Google](https://developers.google.com/terms/site-policies) > and used according to terms described in the [Creative Commons 3.0 > Attribution License](http://creativecommons.org/licenses/by/3.0/).
