manabu nagamine created DRILL-8294:
--------------------------------------

             Summary: ERROR: Hash aggregate does not support schema change
                 Key: DRILL-8294
                 URL: https://issues.apache.org/jira/browse/DRILL-8294
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.20.2
            Reporter: manabu nagamine
         Attachments: data_20220906.zip

I am hitting the following error during the aggregation step:
{code:java}
java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
Prior schema :
BatchSchema [fields=[[`val8` (VARCHAR:REQUIRED)], [`val14` (VARCHAR:REQUIRED)], [`COL41117` (BIGINT:REQUIRED)]], selectionVector=NONE]
New schema :
BatchSchema [fields=[[`val8` (VARCHAR:REQUIRED)], [`val14` (VARCHAR:REQUIRED)], [`COL41117` (BIGINT:REQUIRED)]], selectionVector=NONE]
{code}
The error reports a schema change, but the prior and new schemas printed in the message appear to be identical, so I cannot tell what actually changed.
The failing query:
{code:sql}
select
    val8 COL41134,
    COUNT() COL41117,
    COUNT(DISTINCT val14) COL41121
from
    hdfs.root.`/drill/data/test/*.parquet`
WHERE
    LOG_DATE >= '2022-09-01 00:00:00.000000' and LOG_DATE <= '2022-09-01 23:59:59.000000'
group by
    val8
order by
    COL41117 DESC,
    COL41121 DESC
LIMIT 1000
{code}
The EXPLAIN plan for the failing query:
{code:java}
00-00    Screen
00-01      Project(COL41134=[$0], COL41117=[$1], COL41121=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1000])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$1], sort1=[$2], dir0=[DESC], dir1=[DESC])
00-06                HashAgg(group=[{0}], COL41117=[$SUM0($2)], COL41121=[COUNT($1)])
00-07                  StreamAgg(group=[{0, 1}], COL41117=[$SUM0($2)])
00-08                    StreamAgg(group=[{0, 1}], COL41117=[COUNT()])
00-09                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-10                        Scan(table=[[hdfs, root, /drill/data/test/*.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140001.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140002.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140003.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140004.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140005.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140006.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140007.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140008.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140009.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140010.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140011.parquet]], selectionRoot=hdfs://tstnss-hacluster/drill/data/test, numFiles=11, numRowGroups=11, usedMetadataFile=false, usedMetastore=false, columns=[`val8`, `val14`]]]) {code}
 
When I reduced the number of files in the directory to about a quarter (4 of the 11 parquet files) and retried, the query succeeded. Its EXPLAIN plan, which no longer contains the StreamAgg operators, is the following.
{code:java}
00-00    Screen
00-01      Project(COL41134=[$0], COL41117=[$1], COL41121=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1000])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$1], sort1=[$2], dir0=[DESC], dir1=[DESC])
00-06                HashAgg(group=[{0}], COL41117=[$SUM0($2)], COL41121=[COUNT($1)])
00-07                  HashAgg(group=[{0, 1}], COL41117=[COUNT()])
00-08                    Scan(table=[[hdfs, root, /drill/data/test/*.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140001.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140002.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140003.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140004.parquet]], selectionRoot=hdfs://tstnss-hacluster/drill/data/test, numFiles=4, numRowGroups=4, usedMetadataFile=false, usedMetastore=false, columns=[`val8`, `val14`]]]) {code}
I also found that setting either of the following options to false lets the query succeed.
{code:java}
alter session set `planner.enable_streamagg` = false;
alter session set `planner.force_2phase_aggr` = false; {code}
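In case it helps with triage, the effective values of these two options can be checked through the {{sys.options}} system table, and the session overrides can be undone with ALTER SESSION RESET. This is just a sketch; the exact column set of {{sys.options}} varies between Drill versions, so all columns are selected:
{code:sql}
-- Inspect the two planner options involved in the workaround.
-- The sys.options columns differ between Drill versions, so select all of them.
SELECT *
FROM sys.options
WHERE name IN ('planner.enable_streamagg', 'planner.force_2phase_aggr');

-- Restore the defaults after testing.
ALTER SESSION RESET `planner.enable_streamagg`;
ALTER SESSION RESET `planner.force_2phase_aggr`;
{code}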
The parquet files used for verification are attached (data_20220906.zip).
Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
