Chen-Yuan-Lai commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2671946008
Hi @eliaperantoni I found that `Diagnostic` information may lost when the error is wrapped in multiple layers. For example, the "wrong number of argument" error of `sum` function is wrapped in three locations: 1. `test_wrong_argument_number` https://github.com/apache/datafusion/blob/469f18be1c594b07e4b235f3404419792ed3c24f/datafusion/common/src/utils/mod.rs#L959-L971 2. `get_valid_types_with_scalar_udf` https://github.com/apache/datafusion/blob/71f9d0c1d69c2404422f9f19bf78f0062c6b2c5e/datafusion/expr/src/type_coercion/functions.rs#L295-L305 3. `get_type` https://github.com/apache/datafusion/blob/71f9d0c1d69c2404422f9f19bf78f0062c6b2c5e/datafusion/expr/src/expr_schema.rs#L156-L177 Once the error macro (ex. `plan_datafusion_err!`, `exec_err!`) is called to wrap the error from the inner layer, a new `DatafusionError` will be reproduced, that is why I can't capture the `Diagnostic` in the unit test. This is the `DatafusionError` in these three locations (use `dbg!`), we can see `Diagnostic` lost in the first layer. ```bash [datafusion/expr/src/type_coercion/functions.rs:304:17] &e = Diagnostic( Diagnostic { kind: Error, message: "Wrong number of arguments for sum function call", span: None, notes: [], helps: [], }, Execution( "sum function requires 1 argument, got 2", ), ) [datafusion/expr/src/expr_schema.rs:166:25] &err = Execution( "Function 'sum' user-defined coercion failed with \"Execution error: sum function requires 1 argument, got 2\"", ) [datafusion/sql/tests/cases/diagnostic.rs:52:17] &err = Plan( "Execution error: Function 'sum' user-defined coercion failed with \"Execution error: sum function requires 1 argument, got 2\" No function matches the given name and argument types 'sum(Int64, Int64)'. You might need to add explicit type casts.\n\tCandidate functions:\n\tsum(UserDefined)", ) ``` I think there are two solutions: 1. modify error macro to attach `Diagnostic` . 2. reattach `Diagnostic` again and again in the error chain. But 1. seems to have a large effect on the codebase, I'm not sure which one is better. I hope to get some suggestions or other best practices, thx so much!! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org