alamb opened a new issue, #6758:
URL: https://github.com/apache/arrow-datafusion/issues/6758

   ### Is your feature request related to a problem or challenge?
   
   The way that DataFusion names the output columns can be quite messy. For 
example, the query below produces a column named `"FIRST_VALUE(foo.free) 
PARTITION BY [foo.host] ORDER BY [foo.time DESC NULLS FIRST] RANGE BETWEEN 
UNBOUNDED PRECEDING AND CURRENT ROW + Int64(5) / Int64(12)"` (yes that whole 
thing)
   
   Arrow requires column names to be distinct strings and DataFusion creates 
the name based on the content of the expression. 
   
   Here is the entire query
   
   ```sql
   ❯ create table foo(host varchar, free int, time int) as values ('a', 1, 2), 
('a', 3, 4);
   0 rows in set. Query took 0.003 seconds.
   ❯ select first_value(free) over (partition by host order by time desc) + 5 / 
12 from foo limit 10;
   
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | FIRST_VALUE(foo.free) PARTITION BY [foo.host] ORDER BY [foo.time DESC 
NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + Int64(5) / 
Int64(12) |
   
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | 3                                                                          
                                                                                
 |
   | 3                                                                          
                                                                                
 |
   
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.003 seconds.
   ```
   
   It also means you can't select the same expression without an alias, which 
while annoying is likely not a critical usecase
   
   ```sql
   ❯ select free / 1, free / 1 from foo;
   Error during planning: Projections require unique expression names but the 
expression "foo.free / Int64(1)" at position 0 and "foo.free / Int64(1)" at 
position 1 have the same name. Consider aliasing ("AS") one of them.
   ❯ select free / 1, free / 1 as col2 from foo;
   +---------------------+------+
   | foo.free / Int64(1) | col2 |
   +---------------------+------+
   | 1                   | 1    |
   | 3                   | 3    |
   +---------------------+------+
   ```
   
   ### Describe the solution you'd like
   
   While the output column names can be controlled via adding an explicit 
alias,  I think it would be good to have a more concise way of naming the 
default outputs, especially for window functions
   
   Perhaps something like not adding the `OVER` clause's contents so 
`FIRST_VALUE(foo.free)` in the above example.
   
   If we do this we need to be careful to allow multiple calls to `FIRST_VALUE` 
with different `OVER` clauses in the same query
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   If we make the names more concise, we will probably need a solution like 
https://github.com/apache/arrow-datafusion/issues/6543 to ensure they remain 
unique


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to