Bjorn Olsen created HIVE-21016:
----------------------------------

             Summary: Duplicate column name in GROUP BY statement causing Vertex failures
                 Key: HIVE-21016
                 URL: https://issues.apache.org/jira/browse/HIVE-21016
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Bjorn Olsen

Hive queries fail with "Vertex failure" messages when the user submits a query containing duplicate GROUP BY columns. The Hive query parser should detect and reject this scenario with a meaningful error message, rather than executing the query and failing with an obscure one.

To reproduce the issue, choose any table and perform a GROUP BY with a duplicate column name. For example:

    select count(*), party_id from party group by party_id, party_id;

This fails with messages similar to the following:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:390)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:232)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:454)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:381)
    ... 17 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
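
The up-front check requested in this report could look roughly like the sketch below. It is illustrative Python only, not Hive's semantic analyzer: the function names find_duplicate_group_by_columns and check_group_by are hypothetical, and the sketch assumes the GROUP BY list has already been split into plain column names and that names are compared case-insensitively.

```python
from collections import Counter


def find_duplicate_group_by_columns(columns):
    """Return GROUP BY column names that appear more than once, in first-seen order."""
    # Assumption: identifiers compare case-insensitively, and backticks
    # around a name are only quoting, so strip them before comparing.
    normalized = [c.strip().strip("`").lower() for c in columns]
    counts = Counter(normalized)
    seen = set()
    duplicates = []
    for name in normalized:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


def check_group_by(columns):
    """Raise a descriptive error before the query would reach execution."""
    duplicates = find_duplicate_group_by_columns(columns)
    if duplicates:
        raise ValueError(
            "Duplicate column(s) in GROUP BY: " + ", ".join(duplicates)
        )


# The GROUP BY list from the failing query in this report.
print(find_duplicate_group_by_columns(["party_id", "party_id"]))
```

Calling check_group_by(["party_id", "party_id"]) raises a ValueError that names party_id, which is the kind of message a user could act on, unlike the ClassCastException surfaced at runtime.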