eliaperantoni opened a new issue, #13676:
URL: https://github.com/apache/datafusion/issues/13676

   ### Is your feature request related to a problem or challenge?
   
   In the following query there are 4 distinct errors:
   
   ```sql
   WITH users AS (
        SELECT 1 AS id, 'John' AS name
   )
   SELECT 'id:' + idd, name FROM userss GROUP BY id;
   ```
   
   1. `userss` doesn't exist
   2. `idd` doesn't exist
   3. Can't add a string to a number
   4. `name` is missing from `GROUP BY`
   
   DataFusion currently reports only one of those errors when you try to 
execute the query. After you fix it, you can try again and get the next error.
   
   This can be frustrating for the end user because it requires many 
iterations of a (possibly expensive and slow) parsing and planning step. 
Furthermore, reporting multiple errors would make it possible to build tooling 
such as an LSP server on top of DataFusion.
   
   The desired feature is for DataFusion to report as many errors as possible 
in one go.
   
   ### Describe the solution you'd like
   
   The world of programming languages does this quite well, I think. Take 
rustc for example: you can get tens of errors in one go and fix them all before 
invoking an expensive compilation again.
   
   I think we should take inspiration from the way these compilers do it, e.g. 
panic mode and synchronization. See here for an introduction 
https://craftinginterpreters.com/parsing-expressions.html#panic-mode-error-recovery.
   
   The way it could work is: when parsing or planning the `SelectItem`s in 
a `Select`, we catch any error coming from _one_ of the `SelectItem`s, store it 
in a local variable, and proceed with the next. Then, if there were any errors, 
we return all of them together. We could add a 
`DataFusionError::Many(Vec<DataFusionError>)` variant to represent this.
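
   A minimal sketch of that loop, to make the idea concrete. Everything here is 
hypothetical: `PlanError` is a stand-in for `DataFusionError`, and 
`plan_select_item` is a toy planner that only checks whether a column name is 
known. It is not how DataFusion plans select items today.

```rust
/// Hypothetical stand-in for `DataFusionError`, reduced to two variants.
#[derive(Debug, PartialEq)]
enum PlanError {
    /// A single planning error with a message.
    Plan(String),
    /// The proposed variant: several independent errors reported together.
    Many(Vec<PlanError>),
}

/// Toy planner for one select item (assumption: an item is just a column name).
fn plan_select_item(item: &str, known_columns: &[&str]) -> Result<String, PlanError> {
    if known_columns.contains(&item) {
        Ok(format!("col({item})"))
    } else {
        Err(PlanError::Plan(format!("column `{item}` doesn't exist")))
    }
}

/// Plans every item, storing errors instead of returning at the first one.
fn plan_select_items(
    items: &[&str],
    known_columns: &[&str],
) -> Result<Vec<String>, PlanError> {
    let mut planned = Vec::new();
    let mut errors = Vec::new();
    for item in items {
        match plan_select_item(item, known_columns) {
            Ok(expr) => planned.push(expr),
            // Store the error and proceed with the next item.
            Err(e) => errors.push(e),
        }
    }
    if errors.is_empty() {
        Ok(planned)
    } else {
        Err(PlanError::Many(errors))
    }
}
```

With known columns `id` and `name`, planning the items `idd`, `name`, `nme` 
would return a `Many` holding two errors rather than stopping at `idd`.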
   
   The same idea of "storing the error for later, synchronising to the next 
safe point, and continuing" could also be applied across the different parts of 
a query (e.g. the CTEs, the `SELECT`, the `WHERE`, the `ORDER BY`, etc.). After 
any error in the CTEs section, we can continue with the `SELECT` and collect 
the errors there, then move on to the `WHERE`, and so on. It could be applied 
when analysing different `Statement`s as well.
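
   As a rough illustration of collecting errors across clauses, here is a toy 
analyser that treats each clause as a safe point. The `Query` struct, 
`ErrorSink`, and the `check_*` functions are all made up for this sketch and do 
not correspond to existing DataFusion APIs.

```rust
/// Hypothetical, heavily simplified query: plain names per clause.
struct Query<'a> {
    from: &'a str,
    select: Vec<&'a str>,
    group_by: Vec<&'a str>,
}

/// Hypothetical error sink shared by all stages.
#[derive(Default)]
struct ErrorSink {
    errors: Vec<String>,
}

impl ErrorSink {
    /// Stores the error, if any, so the caller can carry on with the next stage.
    fn record(&mut self, result: Result<(), String>) {
        if let Err(e) = result {
            self.errors.push(e);
        }
    }
}

fn check_from(q: &Query, tables: &[&str]) -> Result<(), String> {
    if tables.contains(&q.from) {
        Ok(())
    } else {
        Err(format!("table `{}` doesn't exist", q.from))
    }
}

fn check_column(col: &str, columns: &[&str]) -> Result<(), String> {
    if columns.contains(&col) {
        Ok(())
    } else {
        Err(format!("column `{col}` doesn't exist"))
    }
}

fn check_group_by(q: &Query, columns: &[&str]) -> Result<(), String> {
    // Skip unknown columns: they were already reported by the SELECT stage,
    // and reporting them again here would only add a cascading error.
    let missing = q
        .select
        .iter()
        .filter(|c| columns.contains(*c))
        .find(|c| !q.group_by.contains(*c));
    match missing {
        Some(c) => Err(format!("`{c}` is missing from GROUP BY")),
        None => Ok(()),
    }
}

/// Runs every stage even if an earlier one failed, then reports everything.
fn analyze(q: &Query, tables: &[&str], columns: &[&str]) -> Result<(), Vec<String>> {
    let mut sink = ErrorSink::default();
    sink.record(check_from(q, tables)); // safe point: FROM
    for col in &q.select {
        sink.record(check_column(col, columns)); // safe point: each select item
    }
    sink.record(check_group_by(q, columns)); // safe point: GROUP BY
    if sink.errors.is_empty() {
        Ok(())
    } else {
        Err(sink.errors)
    }
}
```

One design question this raises is how to avoid cascading errors: a stage that 
depends on something an earlier stage failed to resolve should probably skip 
that part instead of reporting it a second time, as `check_group_by` does here.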
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   This is related to issue #13662 and my PR about diagnostics #13664. I'd be 
open to working on this issue too if contributions would be welcome.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

