This is an automated email from the ASF dual-hosted git repository.
gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 7e82e1bc43e [SPARK-45581] Make SQLSTATE mandatory
7e82e1bc43e is described below
commit 7e82e1bc43e0297c3036d802b3a151d2b93db2f6
Author: srielau <[email protected]>
AuthorDate: Wed Oct 18 11:04:44 2023 -0700
[SPARK-45581] Make SQLSTATE mandatory
### What changes were proposed in this pull request?
We propose to make the SQLSTATE a mandatory field when using error classes in
the new error framework.
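For illustration, a minimal sketch of how a caller raises such an error, using the placeholder `PROBLEM_BECAUSE` class from the README below (the parameter values here are made up). Note that callers never pass the SQLSTATE; it is resolved from `error-classes.json`:

```scala
import org.apache.spark.SparkException

// Sketch only: "PROBLEM_BECAUSE" is the README's placeholder error class.
// With this change its JSON definition must carry a "sqlState" field;
// the caller still supplies only the error class and message parameters.
throw new SparkException(
  errorClass = "PROBLEM_BECAUSE",
  messageParameters = Map("problem" -> "quota exceeded", "cause" -> "too many files"),
  cause = null)
```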
### Why are the changes needed?
Being able to rely on the presence of a SQLSTATE makes it easier to classify
errors and enables tooling that intercepts errors by SQLSTATE, as sketched below.
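A minimal hypothetical sketch of such interception, built on the `getSqlState` accessor that `SparkThrowable` already exposes (the category labels are illustrative, not Spark API):

```scala
import org.apache.spark.SparkThrowable

// Hypothetical classifier: route errors by SQLSTATE class rather than
// by matching on message text.
def classify(e: Throwable): String = e match {
  case t: SparkThrowable if t.getSqlState != null =>
    t.getSqlState.take(2) match { // the 2-character SQLSTATE class value
      case "42"  => "syntax error or access rule violation"
      case "XX"  => "internal error"
      case other => s"SQLSTATE class $other"
    }
  case _ => "no SQLSTATE available"
}
```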
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
A new test was added to SparkThrowableSuite to enforce that every error class
outside the `_LEGACY_ERROR_TEMP` range carries a SQLSTATE.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #43412 from srielau/SPARK-45581-make-SQLSTATEs-mandatory.
Authored-by: srielau <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
---
common/utils/src/main/resources/error/README.md | 26 ++++++++++++++--------
.../src/main/resources/error/error-classes.json | 9 +++++---
.../org/apache/spark/ErrorClassesJSONReader.scala | 8 +++----
.../org/apache/spark/SparkThrowableSuite.scala | 16 ++++++++-----
docs/sql-error-conditions.md | 6 ++---
5 files changed, 41 insertions(+), 24 deletions(-)
diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md
index ac388c29250..8d8529bea56 100644
--- a/common/utils/src/main/resources/error/README.md
+++ b/common/utils/src/main/resources/error/README.md
@@ -1,6 +1,6 @@
# Guidelines
-To throw a standardized user-facing error or exception, developers should specify the error class
+To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
and message parameters rather than an arbitrary error message.
## Usage
@@ -10,7 +10,7 @@ and message parameters rather than an arbitrary error message.
If true, use the error class `INTERNAL_ERROR` and skip to step 4.
2. Check if an appropriate error class already exists in `error-classes.json`.
If true, use the error class and skip to step 4.
-3. Add a new class to `error-classes.json`; keep in mind the invariants below.
+3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below.
4. Check if the exception type already extends `SparkThrowable`.
If true, skip to step 6.
5. Mix `SparkThrowable` into the exception.
@@ -26,9 +26,9 @@ Throw with arbitrary error message:
`error-classes.json`
- "PROBLEM_BECAUSE": {
- "message": ["Problem <problem> because <cause>"],
- "sqlState": "XXXXX"
+ "PROBLEM_BECAUSE" : {
+ "message" : ["Problem <problem> because <cause>"],
+ "sqlState" : "XXXXX"
}
`SparkException.scala`
@@ -70,6 +70,8 @@ Error classes are a succinct, human-readable representation of the error category
Uncategorized errors can be assigned to a legacy error class with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`.
+You should not introduce new uncategorized errors. Instead, convert them to proper errors whenever you encounter them in new code.
+
#### Invariants
- Unique
@@ -79,7 +81,10 @@ Uncategorized errors can be assigned to a legacy error class with the prefix
### Message
Error messages provide a descriptive, human-readable representation of the error.
-The message format accepts string parameters via the C-style printf syntax.
+The message format accepts string parameters via the HTML tag syntax, e.g. <relationName>.
+
+The values passed to the message should not themselves be messages.
+They should be runtime values, keywords, identifiers, or other values that are not translated.
The quality of the error message should match the [guidelines](https://spark.apache.org/error-message-guidelines.html).
@@ -90,21 +95,24 @@ The quality of the error message should match the
### SQLSTATE
-SQLSTATE is an optional portable error identifier across SQL engines.
+SQLSTATE is a mandatory portable error identifier across SQL engines.
SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
Spark prefers to re-use existing SQLSTATEs, preferably ones already used by multiple vendors.
For extensions, Spark claims the 'K**' subclass range.
If a new class is needed, it will also claim the 'K0' class.
+Internal errors should use the 'XX' class. You can subdivide internal errors by component.
+For example, the existing 'XXKD0' is used for internal analyzer errors.
+
#### Invariants
-- Consistent across releases
+- Consistent across releases unless the error is internal.
#### ANSI/ISO standard
The following SQLSTATEs are collated from:
- SQL2016
-- DB2 zOS
+- DB2 zOS/LUW
- PostgreSQL 15
- Oracle 12 (last published)
- SQL Server
diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 7350d0331c4..9e442f4c666 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2048,7 +2048,8 @@
"INVALID_SQL_ARG" : {
"message" : [
"The argument <name> of `sql()` is invalid. Consider to replace it
either by a SQL literal or by collection constructor functions such as `map()`,
`array()`, `struct()`."
- ]
+ ],
+ "sqlState" : "42K08"
},
"INVALID_SQL_SYNTAX" : {
"message" : [
@@ -2370,7 +2371,8 @@
"MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION" : {
"message" : [
"The expression <expr> does not support more than one source."
- ]
+ ],
+ "sqlState" : "42K0E"
},
"MULTI_UDF_INTERFACE_ERROR" : {
"message" : [
@@ -3689,7 +3691,8 @@
"1. use typed Scala UDF APIs(without return type parameter), e.g.
`udf((x: Int) => x)`.",
"2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override
def call(s: String): Integer = s.length() }, IntegerType)`, if input types are
all non primitive.",
"3. set \"spark.sql.legacy.allowUntypedScalaUDF\" to \"true\" and use
this API with caution."
- ]
+ ],
+ "sqlState" : "42K0E"
},
"USER_RAISED_EXCEPTION" : {
"message" : [
diff --git a/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala b/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
index c59e1e376ea..083064bfe23 100644
--- a/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
+++ b/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
@@ -97,9 +97,9 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
val errorClasses = errorClass.split("\\.")
errorClasses match {
case Array(mainClass) => errorInfoMap.contains(mainClass)
-      case Array(mainClass, subClass) => errorInfoMap.get(mainClass).map { info =>
+      case Array(mainClass, subClass) => errorInfoMap.get(mainClass).exists { info =>
        info.subClass.get.contains(subClass)
-      }.getOrElse(false)
+      }
case _ => false
}
}
@@ -130,7 +130,7 @@ private object ErrorClassesJsonReader {
*
* @param sqlState SQLSTATE associated with this class.
* @param subClass SubClass associated with this class.
- * @param message C-style message format compatible with printf.
+   * @param message Message format with optional placeholders (e.g. <parm>).
   *                The error message is constructed by concatenating the lines with newlines.
*/
private case class ErrorInfo(
@@ -145,7 +145,7 @@ private case class ErrorInfo(
/**
* Information associated with an error subclass.
*
- * @param message C-style message format compatible with printf.
+   * @param message Message format with optional placeholders (e.g. <parm>).
   *                The error message is constructed by concatenating the lines with newlines.
*/
private case class ErrorSubInfo(message: Seq[String]) {
diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
index bb9fe79071c..a4120637b69 100644
--- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
@@ -66,15 +66,12 @@ class SparkThrowableSuite extends SparkFunSuite {
def checkIfUnique(ss: Seq[Any]): Unit = {
    val dups = ss.groupBy(identity).mapValues(_.size).filter(_._2 > 1).keys.toSeq
- assert(dups.isEmpty)
+ assert(dups.isEmpty, s"Duplicate error classes: ${dups.mkString(", ")}")
}
def checkCondition(ss: Seq[String], fx: String => Boolean): Unit = {
ss.foreach { s =>
- if (!fx(s)) {
- print(s)
- }
- assert(fx(s))
+ assert(fx(s), s)
}
}
@@ -116,6 +113,15 @@ class SparkThrowableSuite extends SparkFunSuite {
}
}
+ test("SQLSTATE is mandatory") {
+ val errorClassesNoSqlState = errorReader.errorInfoMap.filter {
+ case (error: String, info: ErrorInfo) =>
+ !error.startsWith("_LEGACY_ERROR_TEMP") && info.sqlState.isEmpty
+ }.keys.toSeq
+ assert(errorClassesNoSqlState.isEmpty,
+      s"Error classes without SQLSTATE: ${errorClassesNoSqlState.mkString(", ")}")
+ }
+
test("SQLSTATE invariants") {
val sqlStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState)
    val errorClassReadMe = Utils.getSparkClassLoader.getResource("error/README.md")
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 78904780054..d8f5193c9bc 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1178,7 +1178,7 @@ Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include
### INVALID_SQL_ARG
-SQLSTATE: none assigned
+[SQLSTATE: 42K08](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
The argument `<name>` of `sql()` is invalid. Consider replacing it either by a SQL literal or by collection constructor functions such as `map()`, `array()`, `struct()`.
@@ -1344,7 +1344,7 @@ The query does not include a GROUP BY clause. Add GROUP BY or turn it into the w
### MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION
-SQLSTATE: none assigned
+[SQLSTATE: 42K0E](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
The expression `<expr>` does not support more than one source.
@@ -2191,7 +2191,7 @@ Literals of the type `<unsupportedType>` are not supported. Supported types are
### UNTYPED_SCALA_UDF
-SQLSTATE: none assigned
+[SQLSTATE: 42K0E](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs (without return type parameter), e.g. `udf((x: Int) => x)`.
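With SQLSTATEs now assigned throughout, downstream tests can assert them directly. A minimal sketch, assuming a suite extending SparkFunSuite with its `checkError` helper and a SparkSession `spark`; the triggering statement and the parameter value are hypothetical:

```scala
// Sketch only: the query used to trigger the error is made up, and the
// expected "expr" value depends on the actual expression involved.
checkError(
  exception = intercept[org.apache.spark.sql.AnalysisException] {
    spark.sql("SELECT explode(array(1)) + explode(array(2))") // hypothetical trigger
  },
  errorClass = "MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION",
  sqlState = "42K0E",
  parameters = Map("expr" -> "..."))
```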
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]