[GitHub] [arrow-datafusion] rdettai commented on pull request #1063: Optimize count agg expr with null column statistics

GitBox Sat, 16 Oct 2021 17:49:01 -0700


rdettai commented on pull request #1063:
URL: https://github.com/apache/arrow-datafusion/pull/1063#issuecomment-944285349



   I'm not sure this is well defined in the SQL standard, but query engines 
usually give projected columns default names. For instance, the result of 
`SELECT sum(a) FROM table` will be in a column named `SUM(a)` (or something 
like that). 
   
   If you start up the DataFusion CLI as explained in 
`datafusion-cli/README.md`, you will get:
   ```sql,ignore
   > SELECT count(*) FROM foo;
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 1               |
   +-----------------+
   ```
   
   ```sql,ignore
   > SELECT min(a) FROM foo;
   +------------+
   | MIN(foo.a) |
   +------------+
   | 1          |
   +------------+
   ```
   
   So what would you expect the column name to be for `SELECT count(a) FROM 
foo;`?
   
   Also, it would be pretty nice if the response column names were the same 
whether the plan went through an optimization or not 😄 
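
To illustrate keeping names stable across an optimization, here is a minimal sketch. It uses a hypothetical, simplified expression type (`Expr`, `output_name` and `replace_count_with_stat` are made up for this example and are not DataFusion's actual API), and it assumes the unoptimized name would be `COUNT(foo.a)` by analogy with `MIN(foo.a)` above. The idea is that a rule replacing the aggregate with a value taken from statistics could wrap the result in an alias carrying the original name:

```rust
// Hypothetical, simplified expression tree; NOT DataFusion's real `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { table: String, name: String },
    LiteralU64(u64),
    Count(Box<Expr>),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Default output column name, derived from the expression itself.
    /// The exact naming scheme is an assumption made for this sketch.
    fn output_name(&self) -> String {
        match self {
            Expr::Column { table, name } => format!("{}.{}", table, name),
            Expr::LiteralU64(v) => format!("UInt64({})", v),
            Expr::Count(arg) => format!("COUNT({})", arg.output_name()),
            // An alias overrides whatever name the inner expression would get.
            Expr::Alias(_, alias) => alias.clone(),
        }
    }
}

/// Hypothetical rewrite: replace `COUNT(..)` with a count already known from
/// statistics, aliased to the name the original expression would have had.
/// Any other expression is returned unchanged.
fn replace_count_with_stat(expr: &Expr, known_count: u64) -> Expr {
    match expr {
        Expr::Count(_) => Expr::Alias(
            Box::new(Expr::LiteralU64(known_count)),
            expr.output_name(),
        ),
        other => other.clone(),
    }
}
```

With this shape, the rewritten expression differs from the original, but `output_name()` returns the same string for both, so the response schema would not depend on whether the rule fired.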
   
   _Side note_: I checked on Athena (if I remember correctly, you are used to 
that engine), and their choice, for example, is simply to name the column 
`_col0`. 

