parthchandra commented on code in PR #3718:
URL: https://github.com/apache/datafusion-comet/pull/3718#discussion_r2956462818
##########
native/core/src/errors.rs:
##########
@@ -474,6 +476,54 @@ fn throw_spark_error_as_json(
)
}
+/// Try to convert a DataFusion "Unable to get field named" error into a SparkError.
+/// DataFusion produces this error when reading Parquet files with duplicate field names
+/// in case-insensitive mode. For example, if a Parquet file has columns "B" and "b",
+/// DataFusion may deduplicate them and report: Unable to get field named "b". Valid
+/// fields: ["A", "B"]. When the requested field has a case-insensitive match among the
+/// valid fields, we convert this to Spark's _LEGACY_ERROR_TEMP_2093 error.
+fn try_convert_duplicate_field_error(error_msg: &str) -> Option<SparkError> {
Review Comment:
Late comment:
You're right, this is overkill. When the need arises (as in this case),
could we simply not convert the `_LEGACY_` errors? Or, more broadly, not do
this for any errors that originate in `QueryExecutionErrors`?
The error framework is important for the errors in
`org.apache.spark.sql.errors.ExecutionErrors` because those are SQL errors and
correspond to pre-defined ANSI error codes. For `QueryExecutionErrors`,
we do not have to be strict.
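
For context, here is a rough sketch of the kind of message parsing the doc
comment in the diff describes. It is not the PR's implementation: the helper
name `find_case_insensitive_matches` is hypothetical, and it assumes the
message has exactly the shape quoted in the doc comment, with field names that
contain no commas, quotes, or brackets.

```rust
/// Hypothetical helper (not the PR's code): parse a message of the form
/// `Unable to get field named "b". Valid fields: ["A", "B"]` and return the
/// requested name plus the valid fields that match it case-insensitively.
fn find_case_insensitive_matches(error_msg: &str) -> Option<(String, Vec<String>)> {
    const PREFIX: &str = "Unable to get field named \"";
    const FIELDS: &str = "Valid fields: [";

    // Extract the requested field name, up to the closing quote.
    let start = error_msg.find(PREFIX)? + PREFIX.len();
    let name_len = error_msg[start..].find('"')?;
    let requested = &error_msg[start..start + name_len];

    // Extract the bracketed list of valid field names.
    let list_start = error_msg.find(FIELDS)? + FIELDS.len();
    let list_len = error_msg[list_start..].find(']')?;
    let list = &error_msg[list_start..list_start + list_len];

    // Keep only the valid fields that equal the requested name ignoring case.
    let matches: Vec<String> = list
        .split(',')
        .map(|f| f.trim().trim_matches('"'))
        .filter(|f| f.to_lowercase() == requested.to_lowercase())
        .map(str::to_string)
        .collect();

    if matches.is_empty() {
        None
    } else {
        Some((requested.to_string(), matches))
    }
}
```

Even in this simplified form, the conversion depends on the exact wording of
the DataFusion message, which is part of why skipping it for the `_LEGACY_`
errors seems reasonable.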
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]