eliaperantoni opened a new pull request, #13664: URL: https://github.com/apache/datafusion/pull/13664
After the PoC that I did in https://github.com/eliaperantoni/datafusion/tree/eper/inline-diagnostics, in this PR I'm attempting to build a more robust solution, with the goal of eventually merging it. Still, so far I've only added diagnostics to the `Cannot coerce arithmetic expression .. to valid types` error to get feedback and approval on the general implementation. That way I can make tweaks quickly, before I go extend this to the rest of the repo. ## Which issue does this PR close? Closes #13662. ## What changes are included in this PR? I introduce the `Diagnostic`, `DiagnosticEntry`, and `DiagnosticEntryKind` types to enrich errors with better user-facing information, mainly by referring to the portions of the SQL code that caused it. A `DiagnosticEntry` is used to refer to one particular portion of SQL code and describe its involvement in the error with a string message and a `kind`. A `Diagnostic` is simply a collection of `DiagnosticEntry`. For example, in `SELECT a_string + an_integer` we'd have: - Entry `kind=Error` `message=Incompatible types in binary expression` Wrapping the whole `a_string + an_integer` expression. - Entry `kind=Note` `message=Is of type Utf8` Wrapping just the left side. - Entry `kind=Note` `message=Is of type Int64` Wrapping just the right side. A new `DataFusionError::Diagnostic` variant is used to attach `Diagnostic` to errors. You can call `with_diagnostic` on any `DataFusionError` to do so. They can later be retrieved by calling `DataFusionError::get_diagnostics`. It is possible to attach multiple `Diagnostic` while an error is being returned along the function call stack. The rationale here is that different functions might have different levels of detail, but they might all have useful diagnostics to add. e.g. the function that computes the output type of an expression has no idea about why it's being called, but calling functions have no idea about the intricacies of the error. They all know different things. I also added a `span: Span` to `Column`. This is the first step of a (probably) long process of enriching the types used during logical planning with the `Span`s from the parsed AST nodes. Added `WithSpan<T>`. The idea is that we want to get `Span` information down to functions called deep in the call stack, but without breaking existing code too much so that we can ensure a smooth and incremental transition to `Diagnostic`-ed errors. `WithSpan<T>` was my attempt to do so because any `T` implements `Into<WithSpan<T>>` by simply attaching `Span::empty` (but of course, if a `Span` is available it should be used instead). This means that any time a function wants to start spawning `Diagnostic`-ed errors, it can change some of its arguments by wrapping their types in `WithSpan<T>` and existing calls will keep on working. `WithSpan<T>` should be used when the argument type is something that can loosely be traced back to a location in the source code, but is not really part of any AST node, or anything that is isomorphic to it. A good example is `DataType` which is taken by `signature(DataType, Operator, DataType) -> DataType` to get the output type of a binary expression. That is the function that spawns the error, and the caller doesn't have enough information to provide a good `Diagnostic`. But at the same time, it would be (in my opinion) cumbersome to add `Span` arguments to the function. So instead, the function can wrap the `DataType`s for the two sides of the expression in `WithSpan<T>`. Some call points can be updated to pass a `Span`-enriched `WithSpan<DataType>`, and all the others will keep on working without changes required. ## Are these changes tested? Not yet, while this a draft to gather feedback. ## Are there any user-facing changes? Yes. The user can call `DataFusionError::get_diagnostics` to get an iterator over all the `Diagnostic` that have been attached to the error chain. Those contain source code locations that relate to the error, and each location comes with a message. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org