[
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bjorn Olsen updated HIVE-21016:
-------------------------------
Description:
Hive queries fail with "Vertex failure" messages when the user submits a query
that lists the same column more than once in GROUP BY. The Hive query parser
should detect and reject such a query with a meaningful error message, rather
than executing it and failing at runtime with an obscure one. In complex
queries the duplicate is hard to spot and costs a lot of debugging effort that
a clear error message would save.
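The kind of check requested above can be sketched outside Hive. The following is a minimal, hypothetical client-side pre-check in Python, not Hive's parser or any Hive API: the function names are invented for illustration, and it assumes the GROUP BY column names have already been extracted from the query as plain strings.

```python
from collections import Counter


def find_duplicate_group_by_columns(columns):
    """Return column names that appear more than once, in first-seen order.

    Names are compared case-insensitively, since Hive column names are
    case-insensitive. (Hypothetical helper, not part of Hive.)
    """
    normalized = [c.strip().lower() for c in columns]
    counts = Counter(normalized)
    seen = set()
    duplicates = []
    for name in normalized:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


def check_group_by(columns):
    """Raise ValueError with a readable message if any column is repeated."""
    duplicates = find_duplicate_group_by_columns(columns)
    if duplicates:
        raise ValueError(
            "Duplicate column(s) in GROUP BY: " + ", ".join(duplicates)
        )
```

Calling check_group_by(["party_id", "party_id"]) raises a ValueError that names party_id, which is the sort of early, readable failure this issue asks Hive itself to produce.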
To reproduce the issue, choose any table and run a GROUP BY that names the same
column twice.
For example:
{{select count(*), party_id from party group by party_id, party_id;}}
Note the duplicate column in the GROUP BY.
This fails with messages similar to the following:
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:390)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:232)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) 0000ffb9-5fb1-3024-922a-10cc313a7c171
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:454)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:381)
    ... 17 more
*Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector*
> Duplicate column name in GROUP BY statement causing Vertex failures
> -------------------------------------------------------------------
>
> Key: HIVE-21016
> URL: https://issues.apache.org/jira/browse/HIVE-21016
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.1
> Reporter: Bjorn Olsen
> Priority: Major