[jira] [Commented] (SPARK-46810) Clarify error class terminology

Nicholas Chammas (Jira) Mon, 29 Jan 2024 06:51:04 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811923#comment-17811923
 ]


Nicholas Chammas commented on SPARK-46810:
------------------------------------------

I think Option 3 is a good compromise that lets us continue calling 
{{INCOMPLETE_TYPE_DEFINITION}} an "error class", which perhaps would be the 
least disruptive to Spark developers.

However, for the record, the SQL standard only seems to use the term "class" in 
the context of the 5-character SQLSTATE. Otherwise, the standard uses the term 
"condition" or "exception condition".

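To make that usage concrete, here is a minimal Python sketch of how a 5-character SQLSTATE splits into a 2-character class and a 3-character subclass. The names {{SqlState}} and {{split_sqlstate}} are made up for illustration and are not part of Spark or the standard.

```python
from typing import NamedTuple


class SqlState(NamedTuple):
    sql_class: str  # first two characters, e.g. "42"
    subclass: str   # last three characters, e.g. "K01"


def split_sqlstate(sqlstate: str) -> SqlState:
    """Split a 5-character SQLSTATE into its class and subclass parts."""
    if len(sqlstate) != 5:
        raise ValueError(f"SQLSTATE must be 5 characters, got {sqlstate!r}")
    return SqlState(sql_class=sqlstate[:2], subclass=sqlstate[2:])


# "42K01" is the SQLSTATE shown for INCOMPLETE_TYPE_DEFINITION in this issue.
print(split_sqlstate("42K01"))
```
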
I don't have a copy of the SQL 2016 standard handy. It's not available on ISO's 
website for sale, actually. The only option appears to be to purchase [the SQL 
2023 standard for ~$220|https://www.iso.org/standard/76583.html].

However, there is a copy of the [SQL 1992 standard available 
publicly|https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt]. 

Table 23 on page 619 is relevant:

{code}
         ____________Table_23-SQLSTATE_class_and_subclass_values____________

         _Condition__________________Class_Subcondition_______________Subclass

        | ambiguous cursor name    | 3C  | (no subclass)            | 000  |
        |                          |     |                          |      |
        |                          |     |                          |      |
        | cardinality violation    | 21  | (no subclass)            | 000  |
        |                          |     |                          |      |
        | connection exception     | 08  | (no subclass)            | 000  |
        |                          |     |                          |      |
        |                          |     | connection does not      | 003  |
        |                          |     | exist                    |      |
        |                          |     | connection failure       | 006  |
        |                          |     |                          |      |
        |                          |     | connection name in use   | 002  |
        |                          |     |                          |      |
        |                          |     | SQL-client unable to     | 001  |
        |                          |     | establish SQL-connection |      |
        ...
{code}

I think the standard's terminology maps most closely to Option 1, but if we 
want to go with Option 3, that seems reasonable too. In that case, though, we 
should retire [our use of the term "error 
condition"|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html] so 
that we don't use multiple terms to refer to the same thing.
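
For illustration only, here is a rough Python sketch of how the Option 1 terms from the issue description would line up for {{INCOMPLETE_TYPE_DEFINITION}}. The {{ErrorCondition}} type and its field names are hypothetical and do not correspond to anything in the Spark codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ErrorCondition:
    """Hypothetical model of one entry, using Option 1's terminology."""
    name: str                            # error condition
    error_state: str                     # 5-character SQLSTATE
    sub_conditions: Tuple[str, ...] = () # error sub-conditions

    @property
    def error_class(self) -> str:
        # First two characters of the error state.
        return self.error_state[:2]

    @property
    def error_sub_class(self) -> str:
        # Last three characters of the error state.
        return self.error_state[2:]


# Values taken from the hierarchy listed in the issue description.
incomplete_type_definition = ErrorCondition(
    name="INCOMPLETE_TYPE_DEFINITION",
    error_state="42K01",
    sub_conditions=("ARRAY", "MAP", "STRUCT"),
)

print(incomplete_type_definition.error_class)
print(incomplete_type_definition.error_sub_class)
```

Under Option 3, only the labels would change; the shape of the hierarchy stays the same.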

> Clarify error class terminology
> -------------------------------
>
>                 Key: SPARK-46810
>                 URL: https://issues.apache.org/jira/browse/SPARK-46810
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 4.0.0
>            Reporter: Nicholas Chammas
>            Priority: Minor
>              Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It may also match the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7cbbbba3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a class to a category is low impact and may 
> not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms may not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> —
> Side note: In either case, I believe talking about "42" and "K01" – 
> regardless of what we end up calling them – in front of users is not helpful. 
> I don't think anybody cares what "42" by itself means, or what "K01" by 
> itself means. Accordingly, we should limit how much we talk about these 
> concepts in the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

