[
https://issues.apache.org/jira/browse/ARROW-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Lamb updated ARROW-10374:
--------------------------------
Description:
It would be great to support grouping by column position as an alternative to
grouping by exact expression. For example:
{code:sql}
SELECT state, COUNT(*) FROM customers GROUP BY 1
{code}
As a larger example, consider a query like:
{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by
> database_name, storage;
+-----------------------------------+---------------------+----------------------+
| database_name                     | storage             | SUM(estimated_bytes) |
+-----------------------------------+---------------------+----------------------+
| 844910ece80be8bc_cac95fa59126cd01 | OpenMutableBuffer   | 109737               |
| 844910ece80be8bc_05d1e95653672000 | OpenMutableBuffer   | 2337719              |
| 844910ece80be8bc_7be09b71c487d5d3 | ClosedMutableBuffer | 799682176            |
+-----------------------------------+---------------------+----------------------+
{code}
The same query can be expressed using numbers in the GROUP BY clause that refer
to items in the select list by position.
However, this does not work today in DataFusion:
{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by 1, 2;
Plan("Projection references non-aggregate values")
{code}
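One way this could work, as a minimal sketch under stated assumptions: treat an
integer literal in the GROUP BY list as a 1-based index into the select list and
substitute the referenced expression before the aggregate is planned. The `Expr`
enum and the `resolve_positions` function are hypothetical and simplified for
illustration; they are not DataFusion's actual types or planner code.

```rust
/// Hypothetical, simplified expression type (an assumption for this sketch;
/// not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A plain column reference such as `database_name`.
    Column(String),
    /// An integer literal, e.g. the `1` in `GROUP BY 1`.
    Literal(i64),
    /// An aggregate call such as `SUM(estimated_bytes)`.
    Aggregate { func: String, arg: Box<Expr> },
}

/// Replace each integer literal in `group_by` with the select-list item at
/// that 1-based position. Other expressions are passed through unchanged.
pub fn resolve_positions(select: &[Expr], group_by: &[Expr]) -> Result<Vec<Expr>, String> {
    group_by
        .iter()
        .map(|expr| match expr {
            Expr::Literal(pos) => {
                // Positions are 1-based and must fall inside the select list.
                if *pos < 1 || (*pos as usize) > select.len() {
                    return Err(format!(
                        "GROUP BY position {} is not in the select list",
                        pos
                    ));
                }
                match &select[(*pos - 1) as usize] {
                    // Grouping by an aggregate result is rejected in this sketch.
                    Expr::Aggregate { .. } => Err(format!(
                        "GROUP BY position {} refers to an aggregate",
                        pos
                    )),
                    other => Ok(other.clone()),
                }
            }
            other => Ok(other.clone()),
        })
        .collect()
}
```

With the select list `database_name, storage, sum(estimated_bytes)`, resolving
`group by 1, 2` in this sketch yields the same grouping expressions as
`group by database_name, storage`, while positions 0, 3, and 4 are rejected.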
was:
It would be great to support grouping by column position as an alternative to
grouping by exact expression. For example:
{code:sql}
SELECT state, COUNT(*) FROM customers GROUP BY 1
{code}
As a larger example, consider a query like:
{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by
> database_name, storage;
+-----------------------------------+---------------------+----------------------+
| database_name                     | storage             | SUM(estimated_bytes) |
+-----------------------------------+---------------------+----------------------+
| 844910ece80be8bc_cac95fa59126cd01 | OpenMutableBuffer   | 109737               |
| 844910ece80be8bc_5403ba1b2193d41f | OpenMutableBuffer   | 4633541              |
| 844910ece80be8bc_7caaf9613b16e7c2 | OpenMutableBuffer   | 2083736              |
| InfluxData_%5Ftasks               | OpenMutableBuffer   | 824561               |
| 844910ece80be8bc_4bed41a6ff7f0ee0 | OpenMutableBuffer   | 7033460              |
| 844910ece80be8bc_dfeaf11d9d194efd | OpenMutableBuffer   | 528078               |
| 844910ece80be8bc_5403ba1b2193d41f | ClosedMutableBuffer | 21653483             |
| 7f4b06e9112d7bc8_system%5Fusage   | OpenMutableBuffer   | 9564613              |
| 844910ece80be8bc_eaec8df57a81a1e9 | OpenMutableBuffer   | 3902092              |
| 844910ece80be8bc_ea96994e53c36625 | OpenMutableBuffer   | 709                  |
| 844910ece80be8bc_7be09b71c487d5d3 | OpenMutableBuffer   | 6446333              |
| 844910ece80be8bc_3c0bd4c89186ca89 | OpenMutableBuffer   | 169275               |
| 844910ece80be8bc_a44813fabdadbc9d | OpenMutableBuffer   | 86700                |
| 844910ece80be8bc_f34028c22e8b8b37 | OpenMutableBuffer   | 5374                 |
| 844910ece80be8bc_81c1fce8c36339dc | OpenMutableBuffer   | 23501                |
| 844910ece80be8bc_05ca7bca3d2e1000 | OpenMutableBuffer   | 90202                |
| 844910ece80be8bc_5d5a6ade9b665dfc | OpenMutableBuffer   | 636769               |
| 7f4b06e9112d7bc8_system%5Fusage   | ClosedMutableBuffer | 21003010             |
| 844910ece80be8bc_05d1e95653672000 | OpenMutableBuffer   | 2337719              |
| 844910ece80be8bc_7be09b71c487d5d3 | ClosedMutableBuffer | 799682176            |
+-----------------------------------+---------------------+----------------------+
{code}
The same query can be expressed using numbers in the GROUP BY clause that refer
to items in the select list by position.
However, this does not work today in DataFusion:
{code}
> select database_name, storage, sum(estimated_bytes) from chunks group by 1, 2;
Plan("Projection references non-aggregate values")
{code}
> [Rust] [DataFusion] Grouping by column position
> -----------------------------------------------
>
> Key: ARROW-10374
> URL: https://issues.apache.org/jira/browse/ARROW-10374
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust - DataFusion
> Reporter: Pavel Tiunov
> Priority: Major
>
> It would be great to support grouping by column position as an alternative to
> grouping by exact expression. For example:
> {code:sql}
> SELECT state, COUNT(*) FROM customers GROUP BY 1
> {code}
> As a larger example, consider a query like:
> {code}
> > select database_name, storage, sum(estimated_bytes) from chunks group by
> > database_name, storage;
> +-----------------------------------+---------------------+----------------------+
> | database_name                     | storage             | SUM(estimated_bytes) |
> +-----------------------------------+---------------------+----------------------+
> | 844910ece80be8bc_cac95fa59126cd01 | OpenMutableBuffer   | 109737               |
> | 844910ece80be8bc_05d1e95653672000 | OpenMutableBuffer   | 2337719              |
> | 844910ece80be8bc_7be09b71c487d5d3 | ClosedMutableBuffer | 799682176            |
> +-----------------------------------+---------------------+----------------------+
> {code}
> The same query can be expressed using numbers in the GROUP BY clause that refer
> to items in the select list by position.
> However, this does not work today in DataFusion:
> {code}
> > select database_name, storage, sum(estimated_bytes) from chunks group by 1, 2;
> Plan("Projection references non-aggregate values")
> {code}