Andrew Lamb created ARROW-9677:
----------------------------------
Summary: Aggregate queries that don't include group column in
select list error with "Projection references non-aggregate values"
Key: ARROW-9677
URL: https://issues.apache.org/jira/browse/ARROW-9677
Project: Apache Arrow
Issue Type: Bug
Reporter: Andrew Lamb
Reproducer (using the aggregate_test_100.csv data from the tests):
{code}
CREATE EXTERNAL TABLE aggregate_test_100 (
c1 VARCHAR NOT NULL,
c2 INT NOT NULL,
c3 SMALLINT NOT NULL,
c4 SMALLINT NOT NULL,
c5 INT NOT NULL,
c6 BIGINT NOT NULL,
c7 SMALLINT NOT NULL,
c8 INT NOT NULL,
c9 BIGINT NOT NULL,
c10 VARCHAR NOT NULL,
c11 FLOAT NOT NULL,
c12 DOUBLE NOT NULL,
c13 VARCHAR NOT NULL
)
STORED AS CSV
WITH HEADER ROW
LOCATION 'arrow/testing/data/csv/aggregate_test_100.csv';
{code}
And then run this query:
{code}
> select min(c3) from aggregate_test_100 group by c1;
{code}
h2. Actual behavior is an error
{code}
> select min(c3) from aggregate_test_100 group by c1;
General("Projection references non-aggregate values")
{code}
h3. Expected behavior is actual results
{code}
+---------+
| min(c3) |
+---------+
| -101 |
| -95 |
| -99 |
| -117 |
| -117 |
+---------+
{code}
Note, If you include the group key, c1, in the select list, then it does work:
{code}
> select min(c3), c1 from aggregate_test_100 group by c1;
+---------+----+
| min(c3) | c1 |
+---------+----+
| -101 | a |
| -95 | e |
| -99 | d |
| -117 | c |
| -117 | b |
+---------+----+
5 row in set. Query took 0 seconds.
{code}
Typically handling this kind of query requires that c1 is brought up in the in
plan, but is hidden in the final selection.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)