This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new d31c6a8eadf0 [MINOR][DOCS] Clean markup in NULL semantics documentation d31c6a8eadf0 is described below commit d31c6a8eadf0b828f57bb174625005c1bf5b0b01 Author: Nicholas Chammas <nicholas.cham...@gmail.com> AuthorDate: Mon Feb 5 10:57:05 2024 +0800 [MINOR][DOCS] Clean markup in NULL semantics documentation ### What changes were proposed in this pull request? - Replace the manual table of contents with an automatically generated one. - Tweak various bits of markup to clean up the presentation. - Remove the unused `style` block. ### Why are the changes needed? Various bits of markup on this page are inconsistent with the rest of our documentation and are not necessary. ### Does this PR introduce _any_ user-facing change? Yes, it changes the TOC and formatting of the NULL semantics page. ### How was this patch tested? I built the docs and reviewed them in my browser. 3.5.0 vs. this branch: <img height="300px" src="https://github.com/apache/spark/assets/1039369/b8dc5bab-69c3-4267-80f4-815927d4afbb" /> <img height="300px" src="https://github.com/apache/spark/assets/1039369/e189bb44-5839-460f-8c4e-5c9c2ba60297" /> ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45014 from nchammas/null-semantics-html. Authored-by: Nicholas Chammas <nicholas.cham...@gmail.com> Signed-off-by: Kent Yao <y...@apache.org> --- docs/sql-ref-null-semantics.md | 85 +++++++++++++++++------------------------- 1 file changed, 35 insertions(+), 50 deletions(-) diff --git a/docs/sql-ref-null-semantics.md b/docs/sql-ref-null-semantics.md index 9128a1d0c966..36493bced767 100644 --- a/docs/sql-ref-null-semantics.md +++ b/docs/sql-ref-null-semantics.md @@ -19,38 +19,22 @@ license: | limitations under the License. --- -### Description - A table consists of a set of rows and each row contains a set of columns. A column is associated with a data type and represents a specific attribute of an entity (for example, `age` is a column of an entity called `person`). Sometimes, the value of a column specific to a row is not known at the time the row comes into existence. -In `SQL`, such values are represented as `NULL`. This section details the +In SQL, such values are represented as `NULL`. This section details the semantics of `NULL` values handling in various operators, expressions and -other `SQL` constructs. - -1. [Null handling in comparison operators](#comp-operators) -2. [Null handling in Logical operators](#logical-operators) -3. [Null handling in Expressions](#expressions) - 1. [Null handling in null-intolerant expressions](#null-intolerant) - 2. [Null handling Expressions that can process null value operands](#can-process-null) - 3. [Null handling in built-in aggregate expressions](#built-in-aggregate) -4. [Null handling in WHERE, HAVING and JOIN conditions](#condition-expressions) -5. [Null handling in GROUP BY and DISTINCT](#aggregate-operator) -6. [Null handling in ORDER BY](#order-by) -7. [Null handling in UNION, INTERSECT, EXCEPT](#set-operators) -8. [Null handling in EXISTS and NOT EXISTS subquery](#exists-not-exists) -9. [Null handling in IN and NOT IN subquery](#in-not-in) - -<style type="text/css"> -.tsclass {font-size:12px;color:#333333;width:40%;border-width: 2px;border-color: #729ea5;border-collapse: collapse;} -.tsclass th {text-align: left;} -</style> +other SQL constructs. + +* This will become a table of contents (this text will be scraped). +{:toc} The following illustrates the schema layout and data of a table named `person`. The data contains `NULL` values in -the `age` column and this table will be used in various examples in the sections below.<br/> -**<u>TABLE: person</u>** +the `age` column and this table will be used in various examples in the sections below. + +**TABLE: person** |Id |Name|Age| |---|----|---| @@ -62,12 +46,12 @@ the `age` column and this table will be used in various examples in the sections |600|Michelle|30| |700|Dan|50| -### Comparison Operators <a name="comp-operators"></a> +### Comparison Operators -Apache spark supports the standard comparison operators such as '>', '>=', '=', '<' and '<='. +Apache spark supports the standard comparison operators such as `>`, `>=`, `=`, `<` and `<=`. The result of these operators is unknown or `NULL` when one of the operands or both the operands are unknown or `NULL`. In order to compare the `NULL` values for equality, Spark provides a null-safe -equal operator ('<=>'), which returns `False` when one of the operand is `NULL` and returns 'True` when +equal operator (`<=>`), which returns `False` when one of the operand is `NULL` and returns `True` when both the operands are `NULL`. The following table illustrates the behaviour of comparison operators when one or both operands are `NULL`: @@ -77,7 +61,8 @@ one or both operands are `NULL`: |Any value|NULL|NULL|NULL|NULL|NULL|NULL|False| |NULL|NULL|NULL|NULL|NULL|NULL|NULL|True| -### Examples + +**Examples** ```sql -- Normal comparison operators return `NULL` when one of the operand is `NULL`. @@ -113,7 +98,7 @@ SELECT NULL <=> NULL; +-----------------+ ``` -### Logical Operators <a name="logical-operators"></a> +### Logical Operators Spark supports standard logical operators such as `AND`, `OR` and `NOT`. These operators take `Boolean` expressions as the arguments and return a `Boolean` value. @@ -132,7 +117,7 @@ The following tables illustrate the behavior of logical operators when one or bo |-------|---| |NULL|NULL| -### Examples +**Examples** ```sql -- Normal comparison operators return `NULL` when one of the operands is `NULL`. @@ -160,7 +145,7 @@ SELECT NOT(null) AS expression_output; +-----------------+ ``` -### Expressions <a name="expressions"></a> +### Expressions The comparison operators and logical operators are treated as expressions in Spark. Other than these two kinds of expressions, Spark supports other form of @@ -170,12 +155,12 @@ in Spark can be broadly classified as : - Expressions that can process `NULL` value operands - The result of these expressions depends on the expression itself. -#### Null Intolerant Expressions <a name="null-intolerant"></a> +#### Null Intolerant Expressions Null intolerant expressions return `NULL` when one or more arguments of expression are `NULL` and most of the expressions fall in this category. -##### Examples +**Examples** ```sql SELECT concat('John', null) AS expression_output; @@ -200,7 +185,7 @@ SELECT to_date(null) AS expression_output; +-----------------+ ``` -#### Expressions That Can Process Null Value Operands <a name="can-process-null"></a> +#### Expressions That Can Process Null Value Operands This class of expressions are designed to handle `NULL` values. The result of the expressions depends on the expression itself. As an example, function expression `isnull` @@ -219,7 +204,7 @@ returns the first non `NULL` value in its list of operands. However, `coalesce` - ATLEASTNNONNULLS - IN -##### Examples +**Examples** ```sql SELECT isnull(null) AS expression_output; @@ -253,7 +238,7 @@ SELECT isnan(null) AS expression_output; +-----------------+ ``` -#### Builtin Aggregate Expressions <a name="built-in-aggregate"></a> +#### Builtin Aggregate Expressions Aggregate functions compute a single result by processing a set of input rows. Below are the rules of how `NULL` values are handled by aggregate functions. @@ -269,7 +254,7 @@ the rules of how `NULL` values are handled by aggregate functions. - ANY - SOME -#### Examples +**Examples** ```sql -- `count(*)` does not skip `NULL` values. @@ -314,14 +299,14 @@ SELECT max(age) FROM person where 1 = 0; +--------+ ``` -### Condition Expressions in WHERE, HAVING and JOIN Clauses <a name="condition-expressions"></a> +### Condition Expressions in WHERE, HAVING and JOIN Clauses `WHERE`, `HAVING` operators filter rows based on the user specified condition. A `JOIN` operator is used to combine rows from two tables based on a join condition. For all the three operators, a condition expression is a boolean expression and can return - <code>True, False or Unknown (NULL)</code>. They are "satisfied" if the result of the condition is `True`. + `True`, `False`, or Unknown (`NULL`). They are "satisfied" if the result of the condition is `True`. -#### Examples +**Examples** ```sql -- Persons whose age is unknown (`NULL`) are filtered out from the result set. @@ -393,14 +378,14 @@ SELECT * FROM person p1, person p2 +--------+----+--------+----+ ``` -### Aggregate Operator (GROUP BY, DISTINCT) <a name="aggregate-operator"></a> +### Aggregate Operator (GROUP BY, DISTINCT) As discussed in the previous section [comparison operator](sql-ref-null-semantics.html#comparison-operators), two `NULL` values are not equal. However, for the purpose of grouping and distinct processing, the two or more values with `NULL data`are grouped together into the same bucket. This behaviour is conformant with SQL standard and with other enterprise database management systems. -#### Examples +**Examples** ```sql -- `NULL` values are put in one bucket in `GROUP BY` processing. @@ -426,13 +411,13 @@ SELECT DISTINCT age FROM person; +----+ ``` -### Sort Operator (ORDER BY Clause) <a name="order-by"></a> +### Sort Operator (ORDER BY Clause) Spark SQL supports null ordering specification in `ORDER BY` clause. Spark processes the `ORDER BY` clause by placing all the `NULL` values at first or at last depending on the null ordering specification. By default, all the `NULL` values are placed at first. -#### Examples +**Examples** ```sql -- `NULL` values are shown at first and other values @@ -481,13 +466,13 @@ SELECT age, name FROM person ORDER BY age DESC NULLS LAST; +----+--------+ ``` -### Set Operators (UNION, INTERSECT, EXCEPT) <a name="set-operators"></a> +### Set Operators (UNION, INTERSECT, EXCEPT) `NULL` values are compared in a null-safe manner for equality in the context of set operations. That means when comparing rows, two `NULL` values are considered equal unlike the regular `EqualTo`(`=`) operator. -#### Examples +**Examples** ```sql CREATE VIEW unknown_age SELECT * FROM person WHERE age IS NULL; @@ -539,7 +524,7 @@ SELECT name, age FROM person +--------+----+ ``` -### EXISTS/NOT EXISTS Subquery <a name="exists-not-exists"></a> +### EXISTS/NOT EXISTS Subquery In Spark, EXISTS and NOT EXISTS expressions are allowed inside a WHERE clause. These are boolean expressions which return either `TRUE` or @@ -552,7 +537,7 @@ These two expressions are not affected by presence of NULL in the result of the subquery. They are normally faster because they can be converted to semijoins / anti-semijoins without special provisions for null awareness. -#### Examples +**Examples** ```sql -- Even if subquery produces rows with `NULL` values, the `EXISTS` expression @@ -593,7 +578,7 @@ SELECT * FROM person WHERE NOT EXISTS (SELECT 1 WHERE 1 = 0); +--------+----+ ``` -### IN/NOT IN Subquery <a name="in-not-in"></a> +### IN/NOT IN Subquery In Spark, `IN` and `NOT IN` expressions are allowed inside a WHERE clause of a query. Unlike the `EXISTS` expression, `IN` expression can return a `TRUE`, @@ -615,7 +600,7 @@ NOT IN always returns UNKNOWN when the list contains `NULL`, regardless of the i This is because IN returns UNKNOWN if the value is not in the list containing `NULL`, and because NOT UNKNOWN is again UNKNOWN. -#### Examples +**Examples** ```sql -- The subquery has only `NULL` value in its result set. Therefore, --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org