eliaperantoni opened a new issue, #13676:
URL: https://github.com/apache/datafusion/issues/13676
### Is your feature request related to a problem or challenge?
In the following query there are 4 distinct errors:
```sql
WITH users AS (
SELECT 1 AS id, 'John' AS name
)
SELECT 'id:' + idd, name FROM userss GROUP BY id;
```
1. `userss` doesn't exist
2. `idd` doesn't exist
3. Can't add a string to a number
4. `name` is missing from `GROUP BY`
DataFusion currently reports only one of those errors when you try to execute
the query. After you fix it, you can try again and get the next error.
This can be a bit frustrating for the end user because it requires many
iterations of a (possibly expensive and slow) parsing and planning step.
Furthermore, reporting multiple errors would make it possible to build tooling
such as a language server (LSP) on top of DataFusion.
The desired feature is for DataFusion to report as many errors as possible
in one go.
### Describe the solution you'd like
Programming-language compilers do this quite well, I think. Take rustc
for example: you can get tens of errors in one go and fix them all before
invoking an expensive compilation again.
I think we should take inspiration from the way these compilers do it, e.g.
panic mode and synchronization. See here for an introduction
https://craftinginterpreters.com/parsing-expressions.html#panic-mode-error-recovery.
The way it could work is: when parsing or planning the `SelectItem`s in
a `Select`, we catch any error coming from _one_ of the `SelectItem`s, store it
in a local variable, and proceed with the next item. Then, if there were any
errors, we return all of them together. We could add a
`DataFusionError::Many(Vec<DataFusionError>)` variant to represent this.
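To make the idea concrete, here is a minimal, self-contained sketch of the collection loop. It does not use DataFusion's real types: `PlanError`, `plan_select_item`, and `plan_select_items` are hypothetical stand-ins, and the "planner" only checks column names against a list.

```rust
/// Hypothetical stand-in for `DataFusionError`, including the proposed
/// `Many` variant. This is not DataFusion's actual error type.
#[derive(Debug, PartialEq)]
enum PlanError {
    Plan(String),
    Many(Vec<PlanError>),
}

/// Hypothetical per-item planner: succeeds only for known column names.
fn plan_select_item(item: &str, known_columns: &[&str]) -> Result<String, PlanError> {
    if known_columns.contains(&item) {
        Ok(format!("Column({item})"))
    } else {
        Err(PlanError::Plan(format!("column `{item}` doesn't exist")))
    }
}

/// Plan every item, storing errors instead of returning on the first one.
fn plan_select_items(
    items: &[&str],
    known_columns: &[&str],
) -> Result<Vec<String>, PlanError> {
    let mut planned = Vec::new();
    let mut errors = Vec::new();

    for &item in items {
        match plan_select_item(item, known_columns) {
            Ok(expr) => planned.push(expr),
            // Remember the error and keep going with the next item.
            Err(e) => errors.push(e),
        }
    }

    if errors.is_empty() {
        Ok(planned)
    } else {
        Err(PlanError::Many(errors))
    }
}
```

With known columns `id` and `name`, planning the items `idd`, `name`, `agee` returns a single `PlanError::Many` holding both "doesn't exist" errors rather than stopping at `idd`.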
The same idea of "storing the error for later, synchronising to the next
safe point, and continuing" could also be applied when parsing or planning
different parts of a query (e.g. the CTEs, the `SELECT`, the `WHERE`, the
`ORDER BY`, etc.). After any error in the CTEs section, we can continue with the
`SELECT` and collect the errors there, then move on to the `WHERE`, and so on.
It could also be applied when analysing different `Statement`s.
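The clause-level version could look roughly like the sketch below. Everything in it is hypothetical and only illustrates the control flow: `ErrorSink`, the toy `Query` struct, `check_name`, and `plan_query` are invented for this example, and errors are plain strings rather than `DataFusionError`.

```rust
/// Hypothetical accumulator for errors found while planning one query.
#[derive(Default)]
struct ErrorSink {
    errors: Vec<String>,
}

impl ErrorSink {
    /// Store the error for later and yield `None`, so the caller can move
    /// on to the next clause (the synchronisation point).
    fn recover<T>(&mut self, result: Result<T, String>) -> Option<T> {
        match result {
            Ok(value) => Some(value),
            Err(e) => {
                self.errors.push(e);
                None
            }
        }
    }
}

/// Toy query: one table, projected columns, and an optional filter column.
struct Query<'a> {
    from: &'a str,
    select: Vec<&'a str>,
    filter: Option<&'a str>,
}

/// Toy name resolution against a list of known names.
fn check_name(kind: &str, name: &str, known: &[&str]) -> Result<String, String> {
    if known.contains(&name) {
        Ok(name.to_string())
    } else {
        Err(format!("{kind} `{name}` doesn't exist"))
    }
}

fn plan_query(q: &Query, tables: &[&str], columns: &[&str]) -> Result<String, Vec<String>> {
    let mut sink = ErrorSink::default();

    // Each clause is planned independently; a failure in one clause does
    // not prevent the following clauses from being checked.
    let from = sink.recover(check_name("table", q.from, tables));
    let select: Vec<Option<String>> = q
        .select
        .iter()
        .map(|&c| sink.recover(check_name("column", c, columns)))
        .collect();
    let filter = q
        .filter
        .map(|c| sink.recover(check_name("column", c, columns)));

    if !sink.errors.is_empty() {
        return Err(sink.errors);
    }

    // No errors were recorded, so every `Option` above is `Some`.
    let select: Vec<String> = select.into_iter().flatten().collect();
    let mut plan = format!("SELECT {} FROM {}", select.join(", "), from.unwrap_or_default());
    if let Some(Some(f)) = filter {
        plan.push_str(&format!(" WHERE {f}"));
    }
    Ok(plan)
}
```

A real implementation would also have to decide what later clauses see when an earlier one failed (for example, an unknown table leaves no schema to resolve columns against), which this sketch sidesteps by checking columns against a fixed list.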
### Describe alternatives you've considered
_No response_
### Additional context
This is related to issue #13662 and my PR about diagnostics, #13664. I'd be
open to working on this issue too if contributions are welcome.