This is an automated email from the ASF dual-hosted git repository.
gengliang pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new e31ed5ddb379 [SPARK-51695][SQL][DOCS][FOLLOW-UP] Clarify NULL handling semantics in UNIQUE constraint Javadoc
e31ed5ddb379 is described below
commit e31ed5ddb379a0da49fd2be9042e9036f5266cee
Author: Yan Yan <[email protected]>
AuthorDate: Wed Feb 25 17:04:24 2026 -0800
[SPARK-51695][SQL][DOCS][FOLLOW-UP] Clarify NULL handling semantics in UNIQUE constraint Javadoc
### What changes were proposed in this pull request?
Document NULLS DISTINCT behavior: NULL values are treated as distinct from
each other, so rows with NULLs in unique columns never conflict. Also note that
UNIQUE allows nullable columns (unlike PRIMARY KEY) and that NULLS NOT DISTINCT
is not currently supported.
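
The NULLS DISTINCT rule described above can be illustrated with a small, self-contained sketch. This is not Spark code: the class `NullsDistinctSketch` and its methods `isDuplicate` and `satisfiesUnique` are hypothetical names introduced only for illustration, and unique keys are modeled as plain `Object[]` arrays. Spark itself does not enforce UNIQUE constraints, so the check below only mirrors the definition in the Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of NULLS DISTINCT semantics; not part of Spark.
final class NullsDistinctSketch {

  // Two keys are duplicates only if every column is non-null in both
  // keys and every pair of values is equal.
  static boolean isDuplicate(Object[] a, Object[] b) {
    if (a.length != b.length) {
      return false;
    }
    for (int i = 0; i < a.length; i++) {
      if (a[i] == null || b[i] == null) {
        return false; // a NULL never matches anything, including another NULL
      }
      if (!a[i].equals(b[i])) {
        return false;
      }
    }
    return true;
  }

  // Returns true if the given unique-key values satisfy the constraint
  // definition. Keys containing any NULL are skipped because they can
  // never conflict with another row.
  static boolean satisfiesUnique(List<Object[]> keys) {
    Set<List<Object>> seen = new HashSet<>();
    for (Object[] key : keys) {
      List<Object> k = Arrays.asList(key);
      if (k.contains(null)) {
        continue;
      }
      if (!seen.add(k)) {
        return false; // a fully non-null key was seen before
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Object[]> withNulls = new ArrayList<>();
    withNulls.add(new Object[] {1, null});
    withNulls.add(new Object[] {1, null});
    withNulls.add(new Object[] {1, "a"});
    System.out.println(satisfiesUnique(withNulls));

    List<Object[]> withDuplicate = new ArrayList<>(withNulls);
    withDuplicate.add(new Object[] {1, "a"});
    System.out.println(satisfiesUnique(withDuplicate));
  }
}
```

In this sketch, two rows with the key `(1, NULL)` coexist without conflict, while a second `(1, "a")` row makes the check fail. Under `NULLS NOT DISTINCT`, which is not currently supported, the two `(1, NULL)` rows would be treated as duplicates instead.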
### Why are the changes needed?
To make the Javadoc clearer about the expected behavior of UNIQUE constraints.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
N/A
### Was this patch authored or co-authored using generative AI tooling?
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes #54357 from yyanyy/unique_definition_clarify.
Authored-by: Yan Yan <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 53606f21eb1a4dd47be15a2fc353f1dffa23c58d)
Signed-off-by: Gengliang Wang <[email protected]>
---
.../apache/spark/sql/connector/catalog/constraints/Unique.java | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java
index d983ef656297..d4837932863f 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Unique.java
@@ -28,6 +28,15 @@ import org.apache.spark.sql.connector.expressions.NamedReference;
  * <p>
  * A UNIQUE constraint specifies one or more columns as unique columns. Such constraint is satisfied
  * if and only if no two rows in a table have the same non-null values in the unique columns.
+ * Unlike PRIMARY KEY, UNIQUE allows nullable columns.
+ * <p>
+ * NULL values are treated as distinct from each other (NULLS DISTINCT semantics). Two rows
+ * are considered duplicates only when every column in the unique key has a non-null value and
+ * every value matches. If any column in the unique key is NULL, the row is always considered
+ * unique regardless of other values. In other words, multiple rows with NULL in one or more
+ * unique columns are allowed and do not violate the constraint definition.
+ * <p>
+ * The {@code NULLS NOT DISTINCT} modifier is not currently supported.
  * <p>
  * Spark doesn't enforce UNIQUE constraints but leverages them for query optimization. Each
  * constraint is either valid (the existing data is guaranteed to satisfy the constraint), invalid