[jira] [Updated] (ARROW-10374) [Rust] [DataFusion] Grouping by column position

Andrew Lamb (Jira) Mon, 05 Apr 2021 08:07:09 -0700


     [ https://issues.apache.org/jira/browse/ARROW-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrew Lamb updated ARROW-10374:
--------------------------------
    Description: 
It would be great to support grouping by column position instead of by the 
exact expression. For example:
{code:sql}
SELECT state, COUNT(*) FROM customers GROUP BY 1{code}

As another example, consider this query:

{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by database_name, storage;
+-----------------------------------+---------------------+----------------------+
| database_name                     | storage             | SUM(estimated_bytes) |
+-----------------------------------+---------------------+----------------------+
| 844910ece80be8bc_cac95fa59126cd01 | OpenMutableBuffer   | 109737               |
| 844910ece80be8bc_05d1e95653672000 | OpenMutableBuffer   | 2337719              |
| 844910ece80be8bc_7be09b71c487d5d3 | ClosedMutableBuffer | 799682176            |
+-----------------------------------+---------------------+----------------------+
{code}

The same grouping should be expressible with numbers that refer to positions 
in the select list, here {{GROUP BY 1, 2}}.
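The following is a rough sketch of the intended semantics. It is not DataFusion's planner code. The idea is that a positional GROUP BY item is resolved by substituting the matching 1-based select-list entry before the aggregate is planned. The {{GroupByItem}} enum and the {{resolve_group_by}} function are hypothetical, and expressions are modeled as plain strings for brevity.

```rust
/// One GROUP BY item as written by the user (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
enum GroupByItem {
    /// A positional reference such as `GROUP BY 1` (1-based).
    Position(usize),
    /// Any other grouping expression, kept here as its SQL text.
    Expr(String),
}

/// Hypothetical helper: replace each positional reference with the
/// select-list expression it points to; other expressions pass through.
fn resolve_group_by(
    select_list: &[String],
    group_by: &[GroupByItem],
) -> Result<Vec<String>, String> {
    group_by
        .iter()
        .map(|item| match item {
            GroupByItem::Position(n) => {
                // SQL ordinals are 1-based, so 0 or anything past the end is invalid.
                if *n == 0 || *n > select_list.len() {
                    Err(format!("GROUP BY position {} is not in select list", n))
                } else {
                    Ok(select_list[*n - 1].clone())
                }
            }
            GroupByItem::Expr(e) => Ok(e.clone()),
        })
        .collect()
}
```

For the query above, the select list is {{database_name}}, {{storage}}, {{SUM(estimated_bytes)}}. Under this sketch, {{GROUP BY 1, 2}} resolves to {{database_name, storage}}. A full implementation would likely also need to reject a position that points at an aggregate (position 3 here), which this sketch does not check.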

However, this does not work today in DataFusion:

{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by 1, 2;
Plan("Projection references non-aggregate values")
{code}
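One plausible reading of this error is that {{1}} and {{2}} are treated as literal grouping expressions. This is an assumption; the issue does not show the planner code. Under that reading, the non-aggregate select items {{database_name}} and {{storage}} match no grouping expression. The hypothetical {{validate_projection}} check below models that rule with plain string comparison. It fails for the raw literals and passes once the positions are resolved to column names.

```rust
/// A select-list item in this simplified model (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
enum SelectItem {
    /// A plain (non-aggregate) column reference.
    Column(String),
    /// An aggregate such as `SUM(estimated_bytes)`; exempt from the check.
    Aggregate(String),
}

/// Hypothetical check: every non-aggregate select item must also appear
/// among the grouping expressions, otherwise the projection is rejected.
fn validate_projection(select_list: &[SelectItem], group_exprs: &[String]) -> Result<(), String> {
    for item in select_list {
        if let SelectItem::Column(name) = item {
            // Compare against each grouping expression by its text.
            if !group_exprs.iter().any(|g| g == name) {
                return Err("Projection references non-aggregate values".to_string());
            }
        }
    }
    Ok(())
}
```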


  was:
It would be great to support grouping by column position instead of by the 
exact expression. For example:
{code:sql}
SELECT state, COUNT(*) FROM customers GROUP BY 1{code}

As another example, consider this query:

{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by database_name, storage;
+-----------------------------------+---------------------+----------------------+
| database_name                     | storage             | SUM(estimated_bytes) |
+-----------------------------------+---------------------+----------------------+
| 844910ece80be8bc_cac95fa59126cd01 | OpenMutableBuffer   | 109737               |
| 844910ece80be8bc_5403ba1b2193d41f | OpenMutableBuffer   | 4633541              |
| 844910ece80be8bc_7caaf9613b16e7c2 | OpenMutableBuffer   | 2083736              |
| InfluxData_%5Ftasks               | OpenMutableBuffer   | 824561               |
| 844910ece80be8bc_4bed41a6ff7f0ee0 | OpenMutableBuffer   | 7033460              |
| 844910ece80be8bc_dfeaf11d9d194efd | OpenMutableBuffer   | 528078               |
| 844910ece80be8bc_5403ba1b2193d41f | ClosedMutableBuffer | 21653483             |
| 7f4b06e9112d7bc8_system%5Fusage   | OpenMutableBuffer   | 9564613              |
| 844910ece80be8bc_eaec8df57a81a1e9 | OpenMutableBuffer   | 3902092              |
| 844910ece80be8bc_ea96994e53c36625 | OpenMutableBuffer   | 709                  |
| 844910ece80be8bc_7be09b71c487d5d3 | OpenMutableBuffer   | 6446333              |
| 844910ece80be8bc_3c0bd4c89186ca89 | OpenMutableBuffer   | 169275               |
| 844910ece80be8bc_a44813fabdadbc9d | OpenMutableBuffer   | 86700                |
| 844910ece80be8bc_f34028c22e8b8b37 | OpenMutableBuffer   | 5374                 |
| 844910ece80be8bc_81c1fce8c36339dc | OpenMutableBuffer   | 23501                |
| 844910ece80be8bc_05ca7bca3d2e1000 | OpenMutableBuffer   | 90202                |
| 844910ece80be8bc_5d5a6ade9b665dfc | OpenMutableBuffer   | 636769               |
| 7f4b06e9112d7bc8_system%5Fusage   | ClosedMutableBuffer | 21003010             |
| 844910ece80be8bc_05d1e95653672000 | OpenMutableBuffer   | 2337719              |
| 844910ece80be8bc_7be09b71c487d5d3 | ClosedMutableBuffer | 799682176            |
+-----------------------------------+---------------------+----------------------+
{code}

The same grouping should be expressible with numbers that refer to positions 
in the select list, here {{GROUP BY 1, 2}}.

However, this does not work today in DataFusion:

{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by 1, 2;
Plan("Projection references non-aggregate values")
{code}



> [Rust] [DataFusion] Grouping by column position
> -----------------------------------------------
>
>                 Key: ARROW-10374
>                 URL: https://issues.apache.org/jira/browse/ARROW-10374
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Pavel Tiunov
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
