parthchandra opened a new pull request, #3580:
URL: https://github.com/apache/datafusion-comet/pull/3580

   ## Which issue does this PR close?
   
   Closes parts of https://github.com/apache/datafusion-comet/issues/551 
   Closes https://github.com/apache/datafusion-comet/issues/2215 
   Closes https://github.com/apache/datafusion-comet/issues/3375
   
   ## Rationale for this change
   
   With Spark 4.0 (and Spark 3.5 with ANSI mode), Spark produces ANSI-compliant 
error messages that carry an error code and, in many cases, include the original 
SQL query. When we encounter errors in native code, Comet throws a 
SparkException or CometNativeException, neither of which conforms to the 
expected error reporting standard.
   
   ## What changes are included in this PR?
   This PR introduces a framework for reporting ANSI-compliant error messages 
from native code.
   
   Summary of error propagation:
   
     1. Spark-side query context serialization: For every serialized 
expression and aggregate expression, a unique `expr_id` is generated. If the 
expression's origin carries a `QueryContext` (SQL text, line, column, object 
name), it is extracted and attached to the protobuf. This is done for both 
`Expr` and `AggExpr`. 
     2. Native planner (planner.rs): The `PhysicalPlanner` now holds a 
`QueryContextMap`. When planning `Expr` and `AggExpr` nodes, if `expr_id` and 
`query_context` are present, the context is registered in the map. When 
creating physical expressions for Cast, CheckOverflow, ListExtract, SumDecimal, 
AvgDecimal, and binary arithmetic expressions, the relevant `QueryContext` is 
looked up and passed to the constructor.
     3. Native errors: The `SparkError` enum is extended with new variants 
for all the Spark ANSI errors (from 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala).
 A new `SparkErrorWithContext` type wraps a `SparkError` with a `QueryContext`. 
All affected expression implementations look up the context and produce a 
`SparkErrorWithContext` when available. 
   The `SparkError` implementation also gains new `to_json()` and 
`exception_class()` methods for JNI serialization.
     4. JNI boundary (errors.rs -> CometQueryExecutionException): The 
`throw_exception` function now checks for `SparkErrorWithContext` or 
`SparkError` and throws `CometQueryExecutionException`. 
`CometQueryExecutionException` carries the entire `SparkErrorWithContext` as a 
JSON message. On the Scala side, `CometExecIterator` catches this exception and 
calls `SparkErrorConverter.convertToSparkException()` to convert it to the 
appropriate Spark exception. If the JSON message contains the `QueryContext`, 
the resulting exception includes the query; otherwise it does not.
     5. There are two version-specific implementations: one for Spark 3.x 
(falls back to a generic SparkException) and one for Spark 4.0 (calls the 
exact QueryExecutionErrors.* methods).
    
   Notes: Not all expressions have been updated; only those that failed unit 
tests as a result of incorrect error messages have been (Cast, CheckOverflow, 
ListExtract, SumDecimal, AvgDecimal, and binary arithmetic expressions). 
Binary arithmetic expressions are now represented as `CheckedBinaryExpr`, 
which also includes the query context. 
   Most errors in `QueryExecutionErrors` are reproduced as-is on the native 
side. However, some errors, such as `INTERVAL_ARITHMETIC_OVERFLOW`, have one 
version with a user suggestion and one without. In such cases there are two 
variants on the native side. 
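
   The two-variant pattern can be sketched as follows. This is a hedged illustration: the variant names, message wording, and the `render` helper are assumptions modeled on the `INTERVAL_ARITHMETIC_OVERFLOW` example, not the PR's actual code:

   ```rust
   // Illustrative only: one Spark error that exists both with and without
   // a user suggestion becomes two native variants.
   #[derive(Debug)]
   enum SparkError {
       /// Version raised when Spark can suggest a remedy (e.g. try_add).
       IntervalArithmeticOverflowWithSuggestion { message: String, suggestion: String },
       /// Version raised when no suggestion applies.
       IntervalArithmeticOverflow { message: String },
   }

   /// Hypothetical helper that renders the user-facing message.
   fn render(err: &SparkError) -> String {
       match err {
           SparkError::IntervalArithmeticOverflowWithSuggestion { message, suggestion } => {
               format!(
                   "[INTERVAL_ARITHMETIC_OVERFLOW] {message} Use {suggestion} \
                    to tolerate overflow and return NULL instead."
               )
           }
           SparkError::IntervalArithmeticOverflow { message } => {
               format!("[INTERVAL_ARITHMETIC_OVERFLOW] {message}")
           }
       }
   }

   fn main() {
       let err = SparkError::IntervalArithmeticOverflowWithSuggestion {
           message: "Interval value overflows.".into(),
           suggestion: "try_add".into(),
       };
       println!("{}", render(&err));
   }
   ```

   Splitting the error into two variants keeps each native error self-describing, so the JNI layer never has to decide whether a suggestion applies.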
   
   ## How are these changes tested?
   
   New unit tests. Failing tests listed in 
https://github.com/apache/datafusion-comet/issues/551, 
https://github.com/apache/datafusion-comet/issues/2215, 
https://github.com/apache/datafusion-comet/issues/3375
   
   
   _This PR was produced with the generous assistance of Claude Code_ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
