manabu nagamine created DRILL-8294:
--------------------------------------

             Summary: ERROR: Hash aggregate does not support schema change
                 Key: DRILL-8294
                 URL: https://issues.apache.org/jira/browse/DRILL-8294
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.20.2
            Reporter: manabu nagamine
         Attachments: data_20220906.zip

I am running into the following error during aggregation.

java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
Prior schema :
BatchSchema [fields=[[`val8` (VARCHAR:REQUIRED)], [`val14` (VARCHAR:REQUIRED)], [`COL41117` (BIGINT:REQUIRED)]], selectionVector=NONE]
New schema :
BatchSchema [fields=[[`val8` (VARCHAR:REQUIRED)], [`val14` (VARCHAR:REQUIRED)], [`COL41117` (BIGINT:REQUIRED)]], selectionVector=NONE]

The message says that schema changes are not supported, but the prior schema and the new schema printed in the error message look identical to me.

The SELECT being executed:

select val8 COL41134, COUNT() COL41117, COUNT(DISTINCT val14) COL41121
from hdfs.root.`/drill/data/test/*.parquet`
WHERE LOG_DATE >= '2022-09-01 00:00:00.000000' and LOG_DATE <= '2022-09-01 23:59:59.000000'
group by val8
order by COL41117 DESC, COL41121 DESC
LIMIT 1000

The EXPLAIN of the query (11 files, fails):
{code:java}
00-00    Screen
00-01      Project(COL41134=[$0], COL41117=[$1], COL41121=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1000])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$1], sort1=[$2], dir0=[DESC], dir1=[DESC])
00-06                HashAgg(group=[{0}], COL41117=[$SUM0($2)], COL41121=[COUNT($1)])
00-07                  StreamAgg(group=[{0, 1}], COL41117=[$SUM0($2)])
00-08                    StreamAgg(group=[{0, 1}], COL41117=[COUNT()])
00-09                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-10                        Scan(table=[[hdfs, root, /drill/data/test/*.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140001.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140002.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140003.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140004.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140005.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140006.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140007.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140008.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140009.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140010.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140011.parquet]], selectionRoot=hdfs://tstnss-hacluster/drill/data/test, numFiles=11, numRowGroups=11, usedMetadataFile=false, usedMetastore=false, columns=[`val8`, `val14`]]])
{code}

I reduced the files in the directory to about a quarter (4 of the 11) and ran the query again. This time it succeeded. Note that the plan now uses HashAgg for the first aggregation phase instead of Sort plus StreamAgg. The EXPLAIN is as follows.
{code:java}
00-00    Screen
00-01      Project(COL41134=[$0], COL41117=[$1], COL41121=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1000])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$1], sort1=[$2], dir0=[DESC], dir1=[DESC])
00-06                HashAgg(group=[{0}], COL41117=[$SUM0($2)], COL41121=[COUNT($1)])
00-07                  HashAgg(group=[{0, 1}], COL41117=[COUNT()])
00-08                    Scan(table=[[hdfs, root, /drill/data/test/*.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140001.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140002.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140003.parquet], ReadEntryWithPath [path=hdfs://tstnss-hacluster/drill/data/test/140004.parquet]], selectionRoot=hdfs://tstnss-hacluster/drill/data/test, numFiles=4, numRowGroups=4, usedMetadataFile=false, usedMetastore=false, columns=[`val8`, `val14`]]])
{code}

I also found that setting either one of the following options to false makes the query succeed.
{code:java}
alter session set `planner.enable_streamagg` = false;
alter session set `planner.force_2phase_aggr` = false;
{code}

A parquet file is attached for verification.

Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)