parthchandra opened a new pull request, #3580: URL: https://github.com/apache/datafusion-comet/pull/3580
## Which issue does this PR close?

Closes parts of https://github.com/apache/datafusion-comet/issues/551
Closes https://github.com/apache/datafusion-comet/issues/2215
Closes https://github.com/apache/datafusion-comet/issues/3375

## Rationale for this change

With Spark 4.0 (and Spark 3.5 with ANSI mode), Spark produces ANSI-compliant error messages that carry an error code and, in many cases, include the original SQL query. When we encounter errors in native code, Comet throws a SparkException or CometNativeException that does not conform to this error reporting standard.

## What changes are included in this PR?

This PR introduces a framework to report ANSI-compliant error messages from native code.

Summary of error propagation:

1. Spark-side query context serialization: For every serialized expression and aggregate expression, a unique `expr_id` is generated. If the expression's origin carries a `QueryContext` (SQL text, line, column, object name), it is extracted and attached to the protobuf. This is done for both `Expr` and `AggExpr`.
2. Native planner (planner.rs): The PhysicalPlanner now holds a `QueryContextMap`. When planning `Expr` and `AggExpr` nodes, if `expr_id` and `query_context` are present, the context is registered in the map. When creating physical expressions for Cast, CheckOverflow, ListExtract, SumDecimal, AvgDecimal, and arithmetic binary expressions, the relevant QueryContext is looked up and passed to the constructor.
3. Native errors: The `SparkError` enum is extended with new variants for all the Spark ANSI errors (from https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala). A new `SparkErrorWithContext` type wraps a `SparkError` with a `QueryContext`. All affected expression implementations look up the context and produce a `SparkErrorWithContext` when available.
The `SparkError` implementation also gains `to_json()` and `exception_class()` methods for JNI serialization.
4. JNI boundary (errors.rs -> CometQueryExecutionException): The `throw_exception` function now checks for `SparkErrorWithContext` or `SparkError` and throws `CometQueryExecutionException`, which carries the entire `SparkErrorWithContext` as a JSON message. On the Scala side, `CometExecIterator` catches this exception and calls `SparkErrorConverter.convertToSparkException()` to convert it to the appropriate Spark exception. If the JSON message contains the `QueryContext`, the exception will include the query; otherwise it will not.
5. Version-specific implementations: There are two, one for Spark 3.x (falls back to a generic SparkException) and one for Spark 4.0 (calls the exact QueryExecutionErrors.* methods).

Notes:

Not all expressions have been updated. All the expressions that failed unit tests as a result of incorrect error messages have been updated (Cast, CheckOverflow, ListExtract, SumDecimal, AvgDecimal, and binary arithmetic expressions). Binary arithmetic expressions are now represented as `CheckedBinaryExpr`, which also includes the query context.

Most errors in `QueryExecutionErrors` are reproduced as-is on the native side. However, some errors, such as `INTERVAL_ARITHMETIC_OVERFLOW`, have one version with a user suggestion and one without. In such cases there are two variants on the native side.

## How are these changes tested?

New unit tests. Failing tests are listed in https://github.com/apache/datafusion-comet/issues/551, https://github.com/apache/datafusion-comet/issues/2215, and https://github.com/apache/datafusion-comet/issues/3375.

_This PR was produced with the generous assistance of Claude Code_
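To illustrate step 2 of the flow above, here is a minimal, hypothetical sketch of what a `QueryContextMap` could look like on the planner side. The type and field names (`register`, `lookup`, `sql_text`, etc.) are assumptions for illustration and not the actual Comet API; the real map lives inside `PhysicalPlanner` and is keyed by the `expr_id` from the protobuf.

```rust
use std::collections::HashMap;

// Hypothetical query context captured during Spark-side serialization
// (SQL text plus the position of the originating expression).
#[derive(Debug, Clone, PartialEq)]
struct QueryContext {
    sql_text: String,
    line: u32,
    column: u32,
}

// Hypothetical registry mapping protobuf expression ids to their contexts.
#[derive(Default)]
struct QueryContextMap {
    contexts: HashMap<u64, QueryContext>,
}

impl QueryContextMap {
    // Called while planning an Expr/AggExpr node that carries a context.
    fn register(&mut self, expr_id: u64, ctx: QueryContext) {
        self.contexts.insert(expr_id, ctx);
    }

    // Called when constructing a physical expression (Cast, CheckOverflow, ...)
    // so the expression can attach the context to any runtime error.
    fn lookup(&self, expr_id: u64) -> Option<&QueryContext> {
        self.contexts.get(&expr_id)
    }
}

fn main() {
    let mut map = QueryContextMap::default();
    map.register(
        42,
        QueryContext {
            sql_text: "SELECT CAST(s AS INT) FROM t".to_string(),
            line: 1,
            column: 8,
        },
    );
    // An expression with a registered id gets a context; others get None.
    println!("{:?}", map.lookup(42));
    println!("{:?}", map.lookup(7));
}
```

A plain `HashMap` keyed by `expr_id` keeps the lookup O(1) at physical-expression construction time and leaves expressions without a registered context unaffected.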
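Steps 3 and 4 of the flow above can be sketched as follows. This is a hypothetical simplification, not the actual Comet implementation: the variant names, fields, and hand-rolled JSON are assumptions (the real enum mirrors Spark's QueryExecutionErrors catalog and would more likely serialize via serde). It shows the shape of the idea: an error plus optional context is flattened to JSON, and `exception_class()` tells the JVM side which Spark exception to raise.

```rust
// Hypothetical subset of the SparkError variants described above.
#[derive(Debug, Clone)]
enum SparkError {
    ArithmeticOverflow { from_type: String },
    CastInvalidValue { value: String, to_type: String },
}

// Query context attached by the Spark-side serialization step.
#[derive(Debug, Clone)]
struct QueryContext {
    sql_text: String,
    line: u32,
    column: u32,
}

// Wrapper pairing an error with the context of the expression that raised it.
struct SparkErrorWithContext {
    error: SparkError,
    context: Option<QueryContext>,
}

impl SparkError {
    // Maps each variant to the Spark exception class the JVM side should throw.
    fn exception_class(&self) -> &'static str {
        match self {
            SparkError::ArithmeticOverflow { .. } => {
                "org.apache.spark.SparkArithmeticException"
            }
            SparkError::CastInvalidValue { .. } => {
                "org.apache.spark.SparkNumberFormatException"
            }
        }
    }
}

impl SparkErrorWithContext {
    // Serializes the error plus context to a JSON payload that crosses the
    // JNI boundary as the CometQueryExecutionException message.
    fn to_json(&self) -> String {
        let ctx = match &self.context {
            Some(c) => format!(
                r#"{{"sqlText":"{}","line":{},"column":{}}}"#,
                c.sql_text, c.line, c.column
            ),
            None => "null".to_string(),
        };
        format!(
            r#"{{"errorClass":"{}","context":{}}}"#,
            self.error.exception_class(),
            ctx
        )
    }
}

fn main() {
    let err = SparkErrorWithContext {
        error: SparkError::ArithmeticOverflow { from_type: "int".to_string() },
        context: Some(QueryContext {
            sql_text: "SELECT a + b FROM t".to_string(),
            line: 1,
            column: 8,
        }),
    };
    println!("{}", err.to_json());
}
```

When the `context` is `None` (e.g. the expression had no origin on the Spark side), the JSON still round-trips and the Scala converter simply falls back to an exception without the query text, matching the "otherwise it will not" behavior described in step 4.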
