[jira] [Created] (DRILL-8545) COLLECT_TO_LIST_VARCHAR function returns incorrect result when Hash Aggregator operator used

Maksym Rymar (Jira) Mon, 23 Mar 2026 10:47:16 -0700

Maksym Rymar created DRILL-8545:
-----------------------------------

             Summary: COLLECT_TO_LIST_VARCHAR function returns incorrect result 
when Hash Aggregator operator used
                 Key: DRILL-8545
                 URL: https://issues.apache.org/jira/browse/DRILL-8545
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.21.2
            Reporter: Maksym Rymar
            Assignee: Maksym Rymar
             Fix For: 1.23.0



Drill returns an incorrect result for a query that uses the 
{{COLLECT_TO_LIST_VARCHAR}} function when the Hash Aggregator operator is used 
during execution. In the example below, the BLACK group is returned without 
the values 3 and 4.

*Query:*
{code:java}
SELECT cars.color,
       COLLECT_TO_LIST_VARCHAR(cars.id)
FROM (
      VALUES (1, 'Volkswagen', 'RED'), 
               (2, 'Volvo', 'RED'), 
               (3, 'Hyundai', 'BLACK'), 
               (4, 'Toyota', 'BLACK'), 
               (5, 'Fiat', 'RED'), 
               (6, 'Pegeout', 'RED'), 
               (7, 'Cintroen', 'RED'), 
               (8, 'Audi', 'BLACK'), 
               (9, 'Porsche', 'BLACK')
      ) AS cars(id, brand, color) 
GROUP BY cars.color {code}
*Given result:*
{code:java}
+-------+-----------------------+
| color |        EXPR$1         |
+-------+-----------------------+
| RED   | ["1","2","5","6","7"] |
| BLACK | ["8","9"]             |
+-------+-----------------------+{code}
 

*Expected result:*
{code:java}
+-------+-----------------------+
| color |        EXPR$1         |
+-------+-----------------------+
| BLACK | ["3","4","8","9"]     |
| RED   | ["1","2","5","6","7"] |
+-------+-----------------------+ {code}
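The expected result above can be cross-checked outside of Drill. The following is a minimal sketch in Python, not Drill's implementation: the helper name {{collect_to_list_varchar}} is hypothetical, and it assumes the function collects each value as a string, in input order, into one list per group key.

```python
from collections import defaultdict

# Rows copied from the VALUES clause of the query: (id, brand, color).
CARS = [
    (1, "Volkswagen", "RED"),
    (2, "Volvo", "RED"),
    (3, "Hyundai", "BLACK"),
    (4, "Toyota", "BLACK"),
    (5, "Fiat", "RED"),
    (6, "Pegeout", "RED"),
    (7, "Cintroen", "RED"),
    (8, "Audi", "BLACK"),
    (9, "Porsche", "BLACK"),
]


def collect_to_list_varchar(rows):
    """Hypothetical reference helper: group ids by color as strings.

    Assumption: values are appended in input order, one list per group.
    """
    groups = defaultdict(list)
    for car_id, _brand, color in rows:
        groups[color].append(str(car_id))
    return dict(groups)


if __name__ == "__main__":
    # Print groups sorted by key so the output is stable.
    for color, ids in sorted(collect_to_list_varchar(CARS).items()):
        print(color, ids)
```

Under these assumptions every id lands in exactly one group, so the nine input rows yield nine collected values in total, whereas the given result contains only seven.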
Query plan:
{code:java}
00-00    Screen
00-01      Project(color=[$0], EXPR$1=[$1])
00-02        HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
00-03          Project(color=[$2], id=[$0])
00-04            Values(tuples=[[{ 1, 'Volkswagen', 'RED  ' }, { 2, 'Volvo     ', 'RED  ' }, { 3, 'Hyundai   ', 'BLACK' }, { 4, 'Toyota    ', 'BLACK' }, { 5, 'Fiat      ', 'RED  ' }, { 6, 'Pegeout   ', 'RED  ' }, { 7, 'Cintroen  ', 'RED  ' }, { 8, 'Audi      ', 'BLACK' }, { 9, 'Porsche   ', 'BLACK' }]])
{code}

The same query executed without the Hash Aggregator operator returns the 
correct result:

 
{code:java}
apache drill> set `planner.enable_hashagg` = false;
+------+---------------------------------+
|  ok  |             summary             |
+------+---------------------------------+
| true | planner.enable_hashagg updated. |
+------+---------------------------------+
1 row selected (0.292 seconds)
apache drill> SELECT cars.color,
2..semicolon>        COLLECT_TO_LIST_VARCHAR(cars.id)
3..semicolon> FROM (
4..........)>       VALUES (1, 'Volkswagen', 'RED'),
5..........)>                (2, 'Volvo', 'RED'),
6..........)>                (3, 'Hyundai', 'BLACK'),
7..........)>                (4, 'Toyota', 'BLACK'),
8..........)>                (5, 'Fiat', 'RED'),
9..........)>                (6, 'Pegeout', 'RED'),
10.........)>                (7, 'Cintroen', 'RED'),
11.........)>                (8, 'Audi', 'BLACK'),
12.........)>                (9, 'Porsche', 'BLACK')
13.........)>       ) AS cars(id, brand, color)
14.semicolon> GROUP BY cars.color;
+-------+-----------------------+
| color |        EXPR$1         |
+-------+-----------------------+
| BLACK | ["3","4","8","9"]     |
| RED   | ["1","2","5","6","7"] |
+-------+-----------------------+
2 rows selected (8.577 seconds) {code}
 

 
h2. Steps to reproduce
 # Make sure the Hash Aggregator is used in the execution plan. It can be forced 
by disabling the Stream Aggregator operator:

{code:java}
set `planner.enable_streamagg` = false; {code}
{code:java}
00-00    Screen
00-01      Project(color=[$0], EXPR$1=[$1])
00-02        HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
00-03          Project(color=[$2], id=[$0])
00-04            Values(tuples=[[{ 1, 'Volkswagen', 'RED  ' }, { 2, 'Volvo     ', 'RED  ' }, { 3, 'Hyundai   ', 'BLACK' }, { 4, 'Toyota    ', 'BLACK' }, { 5, 'Fiat      ', 'RED  ' }, { 6, 'Pegeout   ', 'RED  ' }, { 7, 'Cintroen  ', 'RED  ' }, { 8, 'Audi      ', 'BLACK' }, { 9, 'Porsche   ', 'BLACK' }]]) {code}

 # Execute a query that aggregates with the {{COLLECT_TO_LIST_VARCHAR}} 
function:
{code:java}
SELECT cars.color,
       COLLECT_TO_LIST_VARCHAR(cars.id)
FROM (
      VALUES (1, 'Volkswagen', 'RED'), 
               (2, 'Volvo', 'RED'), 
               (3, 'Hyundai', 'BLACK'), 
               (4, 'Toyota', 'BLACK'), 
               (5, 'Fiat', 'RED'), 
               (6, 'Pegeout', 'RED'), 
               (7, 'Cintroen', 'RED'), 
               (8, 'Audi', 'BLACK'), 
               (9, 'Porsche', 'BLACK')
      ) AS cars(id, brand, color) 
GROUP BY cars.color{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
