Maksym Rymar created DRILL-8545:
-----------------------------------
Summary: COLLECT_TO_LIST_VARCHAR function returns incorrect result
when Hash Aggregator operator used
Key: DRILL-8545
URL: https://issues.apache.org/jira/browse/DRILL-8545
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.21.2
Reporter: Maksym Rymar
Assignee: Maksym Rymar
Fix For: 1.23.0
Drill returns an incorrect result for a query with the
{{COLLECT_TO_LIST_VARCHAR}} function and the Hash Aggregator operator used
during execution.
*Query:*
{code:java}
SELECT cars.color,
COLLECT_TO_LIST_VARCHAR(cars.id)
FROM (
VALUES (1, 'Volkswagen', 'RED'),
(2, 'Volvo', 'RED'),
(3, 'Hyundai', 'BLACK'),
(4, 'Toyota', 'BLACK'),
(5, 'Fiat', 'RED'),
(6, 'Pegeout', 'RED'),
(7, 'Cintroen', 'RED'),
(8, 'Audi', 'BLACK'),
(9, 'Porsche', 'BLACK')
) AS cars(id, brand, color)
GROUP BY cars.color {code}
{*}Given result:{*}{*}{*}
{code:java}
+-------+-----------------------+
| color | EXPR$1 |
+-------+-----------------------+
| RED | ["1","2","5","6","7"] |
| BLACK | ["8","9"] |
+-------+-----------------------+{code}
{*}Expected result:{*}{*}{*}
{code:java}
+-------+-----------------------+
| color | EXPR$1 |
+-------+-----------------------+
| BLACK | ["3","4","8","9"] |
| RED | ["1","2","5","6","7"] |
+-------+-----------------------+ {code}
Query plan of the query:
{code:java}
00-00 Screen
00-01 Project(color=[$0], EXPR$1=[$1])
00-02 HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
00-03 Project(color=[$2], id=[$0])
00-04 Values(tuples=[[{ 1, 'Volkswagen', 'RED ' }, { 2, 'Volvo
', 'RED ' }, { 3, 'Hyundai ', 'BLACK' }, { 4, 'Toyota ', 'BLACK' }, { 5,
'Fiat ', 'RED ' }, { 6, 'Pegeout ', 'RED ' }, { 7, 'Cintroen ', 'RED
' }, { 8, 'Audi ', 'BLACK' }, { 9, 'Porsche ', 'BLACK' }]])
{code}
**
The same query executed without the Hash Aggregator operator returns the
correct result:
{code:java}
apache drill> set `planner.enable_hashagg` = false;
+------+---------------------------------+
| ok | summary |
+------+---------------------------------+
| true | planner.enable_hashagg updated. |
+------+---------------------------------+
1 row selected (0.292 seconds)
apache drill> SELECT cars.color,
2..semicolon> COLLECT_TO_LIST_VARCHAR(cars.id)
3..semicolon> FROM (
4..........)> VALUES (1, 'Volkswagen', 'RED'),
5..........)> (2, 'Volvo', 'RED'),
6..........)> (3, 'Hyundai', 'BLACK'),
7..........)> (4, 'Toyota', 'BLACK'),
8..........)> (5, 'Fiat', 'RED'),
9..........)> (6, 'Pegeout', 'RED'),
10.........)> (7, 'Cintroen', 'RED'),
11.........)> (8, 'Audi', 'BLACK'),
12.........)> (9, 'Porsche', 'BLACK')
13.........)> ) AS cars(id, brand, color)
14.semicolon> GROUP BY cars.color;
+-------+-----------------------+
| color | EXPR$1 |
+-------+-----------------------+
| BLACK | ["3","4","8","9"] |
| RED | ["1","2","5","6","7"] |
+-------+-----------------------+
2 rows selected (8.577 seconds) {code}
h2. Steps to reproduce
# Make sure Hash Aggregator is used in the execution plan. It can be forced by
disabling Stream Aggregator operator:
{code:java}
set `planner.enable_streamagg` = false {code}
{code:java}
00-00 Screen
00-01 Project(color=[$0], EXPR$1=[$1])
00-02 HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
00-03 Project(color=[$2], id=[$0])
00-04 Values(tuples=[[{ 1, 'Volkswagen', 'RED ' }, { 2, 'Volvo
', 'RED ' }, { 3, 'Hyundai ', 'BLACK' }, { 4, 'Toyota ', 'BLACK' }, { 5,
'Fiat ', 'RED ' }, { 6, 'Pegeout ', 'RED ' }, { 7, 'Cintroen ', 'RED
' }, { 8, 'Audi ', 'BLACK' }, { 9, 'Porsche ', 'BLACK' }]]) {code}
# Execute a query with aggregation and {{COLLECT_TO_LIST_VARCHAR}} function
used:
{code:java}
SELECT cars.color,
COLLECT_TO_LIST_VARCHAR(cars.id)
FROM (
VALUES (1, 'Volkswagen', 'RED'),
(2, 'Volvo', 'RED'),
(3, 'Hyundai', 'BLACK'),
(4, 'Toyota', 'BLACK'),
(5, 'Fiat', 'RED'),
(6, 'Pegeout', 'RED'),
(7, 'Cintroen', 'RED'),
(8, 'Audi', 'BLACK'),
(9, 'Porsche', 'BLACK')
) AS cars(id, brand, color)
GROUP BY cars.color{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)