[
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-46810:
-----------------------------------
Labels: pull-request-available (was: )
> Clarify error class terminology
> -------------------------------
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, SQL
> Affects Versions: 4.0.0
> Reporter: Nicholas Chammas
> Priority: Minor
> Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to
> get some clarity on that before contributing any potential improvements to
> this part of the documentation.
> Consider
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
> It has several key pieces of hierarchical information that have inconsistent
> names throughout our documentation and codebase:
> * 42
> ** K01
> *** INCOMPLETE_TYPE_DEFINITION
> **** ARRAY
> **** MAP
> **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
> * [Over here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
> we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION
> we call INCOMPLETE_TYPE_DEFINITION itself an "error class". So what exactly
> is a class: 42 or INCOMPLETE_TYPE_DEFINITION?
> * [Over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
> we call K01 the "subclass". But
> [over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
> we call ARRAY, MAP, and STRUCT the subclasses. And on the main page for
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes".
> So what exactly is a subclass?
> * [On this page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
> we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other
> places we refer to it as an "error class".
> I personally like the term "error condition", but since "error class" is
> already used heavily throughout the codebase to refer to something like
> INCOMPLETE_TYPE_DEFINITION, I don't think it's practical to change at this
> point.
> To rationalize the different terms we are using, I propose the following
> terminology, which we should use consistently throughout our code and
> documentation:
> * Error category: 42
> * Error sub-category: K01
> * Error state: 42K01
> * Error class: INCOMPLETE_TYPE_DEFINITION
> * Error sub-classes: ARRAY, MAP, STRUCT
> We should not use "error condition" if one of the above terms more accurately
> describes what we are talking about.
> Side note: With this terminology, I believe talking about error categories
> and sub-categories in front of users is not helpful. I don't think anybody
> cares what "42" by itself means, or what "K01" by itself means. Accordingly,
> we should limit how much we talk about these concepts in the user-facing
> documentation.
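
To make the proposed terminology concrete, here is a minimal Python sketch that splits an error state such as 42K01 into its category and sub-category, and an error name into its class and sub-class. Everything in it is an illustrative assumption: ErrorIdentity and parse_error are hypothetical names, not Spark APIs, and the dotted form "INCOMPLETE_TYPE_DEFINITION.ARRAY" is assumed as the way a sub-class is attached to its class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorIdentity:
    """Hypothetical container for the proposed terms; not a Spark class."""

    state: str                       # error state, e.g. "42K01"
    error_class: str                 # error class, e.g. "INCOMPLETE_TYPE_DEFINITION"
    sub_class: Optional[str] = None  # error sub-class, e.g. "ARRAY"

    @property
    def category(self) -> str:
        # Error category: the first two characters of the state ("42").
        return self.state[:2]

    @property
    def sub_category(self) -> str:
        # Error sub-category: the remaining three characters ("K01").
        return self.state[2:]


def parse_error(state: str, name: str) -> ErrorIdentity:
    """Split a five-character state and a (possibly dotted) error name."""
    if len(state) != 5 or not state.isalnum():
        raise ValueError(f"expected a 5-character alphanumeric state, got {state!r}")
    # Assumption: a sub-class, when present, follows the class after a dot.
    error_class, _, sub_class = name.partition(".")
    return ErrorIdentity(state, error_class, sub_class or None)


if __name__ == "__main__":
    err = parse_error("42K01", "INCOMPLETE_TYPE_DEFINITION.ARRAY")
    print(err.category, err.sub_category, err.error_class, err.sub_class)
```

Under these assumptions the category and sub-category are derived from the state rather than stored separately, which mirrors the side note above: users mostly need the state and the class, not the individual pieces.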