[ 
https://issues.apache.org/jira/browse/ARROW-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187662#comment-17187662
 ] 

Andrew Lamb commented on ARROW-9520:
------------------------------------

Here is another reproducer (using the aggregate_test_100.csv data from the 
tests):

{code}
    CREATE EXTERNAL TABLE aggregate_test_100 (
        c1  VARCHAR NOT NULL,
        c2  INT NOT NULL,
        c3  SMALLINT NOT NULL,
        c4  SMALLINT NOT NULL,
        c5  INT NOT NULL,
        c6  BIGINT NOT NULL,
        c7  SMALLINT NOT NULL,
        c8  INT NOT NULL,
        c9  BIGINT NOT NULL,
        c10 VARCHAR NOT NULL,
        c11 FLOAT NOT NULL,
        c12 DOUBLE NOT NULL,
        c13 VARCHAR NOT NULL
    )
    STORED AS CSV
    WITH HEADER ROW
    LOCATION 'arrow/testing/data/csv/aggregate_test_100.csv';
{code}

And then run this query:
{code}
> select min(c3) from aggregate_test_100 group by c1;
{code}

h2. Actual behavior: Error

{code}
> select min(c3) from aggregate_test_100 group by c1;
General("Projection references non-aggregate values")
{code}

h2. Expected behavior: Results

{code}
+---------+
| min(c3) |
+---------+
| -101    |
| -95     |
| -99     |
| -117    |
| -117    |
+---------+
{code}


Note, If you include the group key, c1, in the select list, then it does work:
{code}
> select min(c3), c1 from aggregate_test_100 group by c1;
+---------+----+
| min(c3) | c1 |
+---------+----+
| -101    | a  |
| -95     | e  |
| -99     | d  |
| -117    | c  |
| -117    | b  |
+---------+----+
5 row in set. Query took 0 seconds.
{code}

Typically handling this kind of query requires that c1 is brought up in the in 
plan, but is hidden in the  final selection.


> [Rust] [DataFusion] Can't alias an aggregate expression
> -------------------------------------------------------
>
>                 Key: ARROW-9520
>                 URL: https://issues.apache.org/jira/browse/ARROW-9520
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust - DataFusion
>            Reporter: Jorge
>            Priority: Major
>
> The following test (on execute) fails:
> {code}
>     #[test]
>     fn aggregate_with_alias() -> Result<()> {
>         let results = execute("SELECT c1, COUNT(c2) AS count FROM test GROUP 
> BY c1", 4)?;
>         assert_eq!(field_names(batch), vec!["c1", "count"]);
>         let expected = vec!["0,10", "1,10", "2,10", "3,10"];
>         let mut rows = test::format_batch(&batch);
>         rows.sort();
>         assert_eq!(rows, expected);
>         Ok(())
>     }
> {code}
> The root cause is that, in {{sql::planner}}, we interpret {{COUNT(c2) AS 
> count}} as An {{Expr::Alias}}, which fails the {{is_aggregate_expr}} 
> condition, thus being interpreted as grouped expression instead of an 
> aggregated expression. This raises the Error
> {{General("Projection references non-aggregate values")}}
> The planner could interpret the statement above as two steps: an aggregation 
> followed by a projection. Alternatively, we can allow aliases to be valid 
> aggregation expressions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to