eliaperantoni opened a new pull request, #13664:
URL: https://github.com/apache/datafusion/pull/13664

   After the PoC that I did in 
https://github.com/eliaperantoni/datafusion/tree/eper/inline-diagnostics, in 
this PR I'm attempting to build a more robust solution, with the goal of 
eventually merging it.
   
   Still, so far I've only added diagnostics to the `Cannot coerce arithmetic 
expression .. to valid types` error to get feedback and approval on the general 
implementation. That way I can make tweaks quickly, before I go extend this to 
the rest of the repo.
   
   ## Which issue does this PR close?
   
   Closes #13662.
   
   ## What changes are included in this PR?
   
   I introduce the `Diagnostic`, `DiagnosticEntry`, and `DiagnosticEntryKind` 
types to enrich errors with better user-facing information, mainly by referring 
to the portions of the SQL code that caused it.
   
   A `DiagnosticEntry` is used to refer to one particular portion of SQL code 
and describe its involvement in the error with a string message and a `kind`. A 
`Diagnostic` is simply a collection of `DiagnosticEntry`.
   
   For example, in `SELECT a_string + an_integer` we'd have:
   
   - Entry `kind=Error` `message=Incompatible types in binary expression` 
Wrapping the whole `a_string + an_integer` expression.
   - Entry `kind=Note` `message=Is of type Utf8` Wrapping just the left side.
   - Entry `kind=Note` `message=Is of type Int64` Wrapping just the right side.
   
   A new `DataFusionError::Diagnostic` variant is used to attach `Diagnostic` 
to errors. You can call `with_diagnostic` on any `DataFusionError` to do so. 
They can later be retrieved by calling `DataFusionError::get_diagnostics`.
   
   It is possible to attach multiple `Diagnostic` while an error is being 
returned along the function call stack. The rationale here is that different 
functions might have different levels of detail, but they might all have useful 
diagnostics to add. e.g. the function that computes the output type of an 
expression has no idea about why it's being called, but calling functions have 
no idea about the intricacies of the error. They all know different things.
   
   I also added a `span: Span` to `Column`. This is the first step of a 
(probably) long process of enriching the types used during logical planning 
with the `Span`s from the parsed AST nodes.
   
   Added `WithSpan<T>`. The idea is that we want to get `Span` information down 
to functions called deep in the call stack, but without breaking existing code 
too much so that we can ensure a smooth and incremental transition to 
`Diagnostic`-ed errors. `WithSpan<T>` was my attempt to do so because any `T` 
implements `Into<WithSpan<T>>` by simply attaching `Span::empty` (but of 
course, if a `Span` is available it should be used instead). This means that 
any time a function wants to start spawning `Diagnostic`-ed errors, it can 
change some of its arguments by wrapping their types in `WithSpan<T>` and 
existing calls will keep on working.
   
   `WithSpan<T>` should be used when the argument type is something that can 
loosely be traced back to a location in the source code, but is not really part 
of any AST node, or anything that is isomorphic to it. A good example is 
`DataType` which is taken by `signature(DataType, Operator, DataType) -> 
DataType` to get the output type of a binary expression. That is the function 
that spawns the error, and the caller doesn't have enough information to 
provide a good `Diagnostic`. But at the same time, it would be (in my opinion) 
cumbersome to add `Span` arguments to the function. So instead, the function 
can wrap the `DataType`s for the two sides of the expression in `WithSpan<T>`. 
Some call points can be updated to pass a `Span`-enriched `WithSpan<DataType>`, 
and all the others will keep on working without changes required.
   
   ## Are these changes tested?
   
   Not yet, while this a draft to gather feedback.
   
   ## Are there any user-facing changes?
   
   Yes. The user can call `DataFusionError::get_diagnostics` to get an iterator 
over all the `Diagnostic` that have been attached to the error chain. Those 
contain source code locations that relate to the error, and each location comes 
with a message.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to