alamb opened a new issue, #14089:
URL: https://github.com/apache/datafusion/issues/14089

   ### Is your feature request related to a problem or challenge?
   
   There has been significant and long-standing confusion about how to refer to 
columns whose names contain capital letters, most recently:
   - https://github.com/apache/datafusion/issues/13649
   
   I think the root cause is that SQL is largely case-insensitive, but many 
DataFrame-like systems are case-sensitive.
   
   There is a larger question of whether we could have less confusing defaults, 
but I think we could make the error messages even better.
   
   Today, if you have a schema with a field named `userId` (note the capital 
`I`) and you run a query like
   
   ```sql
   SELECT userId FROM foo
   ``` 
   
   you will get a seemingly nonsensical error:
   
   > Schema error: No field named userid. Valid fields are 
telemetry_user_events.event, telemetry_user_events.prompt, 
telemetry_user_events.timestamp, telemetry_user_events.timestamp_utc, 
telemetry_user_events.\"teamId\", telemetry_user_events.\"userId\", 
telemetry_user_events.query.\n 1: No field named userid. Valid fields are 
telemetry_user_events.event, telemetry_user_events.prompt, 
telemetry_user_events.timestamp, telemetry_user_events.timestamp_utc, 
telemetry_user_events.\"teamId\", telemetry_user_events.\"userId\", 
telemetry_user_events.query.
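
The error names `userid` rather than `userId` because, with identifier normalization enabled, an unquoted identifier is folded to lowercase before it is looked up in the schema. The sketch below illustrates that rule only; `normalize_ident` is a hypothetical helper written for this example, not a DataFusion API, and it ignores details such as escaped quotes inside quoted identifiers.

```rust
/// Hypothetical helper (not part of DataFusion): mimics how an identifier
/// is treated when identifier normalization is enabled.
fn normalize_ident(raw: &str) -> String {
    let is_quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
    if is_quoted {
        // A double-quoted identifier keeps its exact spelling; only the
        // surrounding quotes are removed (escaped quotes are not handled).
        raw[1..raw.len() - 1].to_string()
    } else {
        // An unquoted identifier is folded to lowercase.
        raw.to_lowercase()
    }
}
```

Under this rule, `normalize_ident("userId")` yields `userid`, which does not match the schema field, while `normalize_ident("\"userId\"")` yields `userId`, which does.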
   
   
   
   ### Describe the solution you'd like
   
   I would like to improve the error by adding a hint about how to fix it when 
there is a column name that matches the field except for case.
   
   
   In the example above, the hint could be something like:
   
   > note: use double quotes to refer to the "userId" column or set the 
`datafusion.sql_parser.enable_ident_normalization` configuration option
   
   So the whole error would be something like:
   
   > Schema error: No field named userid. You can use double quotes to refer 
to the "userId" column or set the 
`datafusion.sql_parser.enable_ident_normalization` configuration option. 
Valid fields are telemetry_user_events.event, telemetry_user_events.prompt, 
telemetry_user_events.timestamp, telemetry_user_events.timestamp_utc, 
telemetry_user_events.\"teamId\", telemetry_user_events.\"userId\", 
telemetry_user_events.query.\n 1: No field named userid. Valid fields are 
telemetry_user_events.event, telemetry_user_events.prompt, 
telemetry_user_events.timestamp, telemetry_user_events.timestamp_utc, 
telemetry_user_events.\"teamId\", telemetry_user_events.\"userId\", 
telemetry_user_events.query.
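
One way the hint described above might be computed is to look for a valid field that equals the missing name when case is ignored. This is a minimal sketch under stated assumptions, not DataFusion's implementation: `case_mismatch_hint` is a hypothetical function, the valid fields are assumed to arrive as strings of the form shown in the error (`table.column` or `table."Column"`), and the parsing is deliberately naive.

```rust
/// Hypothetical helper (not DataFusion's implementation): returns a hint
/// when some valid field matches `missing` except for letter case.
fn case_mismatch_hint(missing: &str, valid_fields: &[&str]) -> Option<String> {
    valid_fields
        .iter()
        .copied()
        // Keep only the column part of a qualified name such as `table.column`.
        // Naive: a quoted column name that itself contains a dot would be split.
        .map(|field| field.rsplit('.').next().unwrap_or(field))
        // Drop the double quotes shown around case-sensitive names.
        .map(|column| column.trim_matches('"'))
        // A hint is useful only when the names differ solely by case.
        .find(|column| *column != missing && column.eq_ignore_ascii_case(missing))
        .map(|column| {
            format!(
                "You can use double quotes to refer to the \"{}\" column or set the \
                 `datafusion.sql_parser.enable_ident_normalization` configuration option",
                column
            )
        })
}
```

Called with `"userid"` and the field list from the error above, this returns a hint that names `"userId"`; for a name with no case-insensitive match it returns `None`, so the existing message would be left unchanged.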
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   I think this would be super helpful and a nice first issue.



