Rajkumar Singh created HIVE-24469:
-------------------------------------
Summary: StatsTask failure while inserting the data into the table
partitioned by timestamp
Key: HIVE-24469
URL: https://issues.apache.org/jira/browse/HIVE-24469
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0
Reporter: Rajkumar Singh
Steps to repro:
{code:java}
CREATE EXTERNAL TABLE `tblsource`(
`x` int,
`y` string)
STORED AS PARQUET;
CREATE EXTERNAL TABLE `tblinsert`(
`x` int)
PARTITIONED BY (
`y` timestamp)
STORED AS PARQUET;
insert into table tblsource values (5,'2020-11-06 00:00:00.000');
insert into table tblinsert partition(y) select * from tblsource distribute by (y);
{code}
The query fails while executing the stats task, and I can see the following exception in HMS:
{code:java}
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:8629) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:8590) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at com.sun.proxy.$Proxy28.set_aggr_stats_for(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18937) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:18921) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) ~[hadoop-common-3.1.1.7.2.0.0-237.jar:?]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
{code}
I think the problem is with a timestamp whose nanoseconds are all zeros. After
inserting the value 2020-11-06 00:00:00.000, Hive calls set_aggr_stats_for
and constructs the SetPartitionsStatsRequest. While constructing the request,
because the nanoseconds are all 0, the Hive FetchOperator converts 2020-11-06
00:00:00.000 to 2020-11-06 00:00:00 (Timestamp.valueOf(string)):
https://github.com/apache/hive/blob/f8aa55f9c8f22c4fd293d9531192f7f46099a420/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L176
On the HMS side,
https://github.com/apache/hive/blob/2ab194d25311e15487ae010b8dd113879ccd501b/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L8626
does not return any partition, because the partition filter expression was
2020-11-06 00:00:00, so the call fails with the IndexOutOfBoundsException
shown above.
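
To illustrate the suspected mismatch, here is a minimal standalone sketch. It does not use Hive classes: the java.time formatter with an optional fraction only approximates how I assume Hive's Timestamp parses and prints values, and findPartitions is a hypothetical stand-in for the metastore lookup that matches partition values as exact strings. The assumption that the partition is stored with the original ".000" suffix is also mine.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class TimestampPartitionMismatch {
    // Optional fraction (0 to 9 digits): ".000" is accepted on parse,
    // but nothing is printed for the fraction when the nanoseconds are zero.
    static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter();

    // Parse the string and print it again, as a valueOf(...).toString() round trip would.
    static String roundTrip(String value) {
        return LocalDateTime.parse(value, FMT).format(FMT);
    }

    // Hypothetical stand-in for the HMS partition lookup: exact string comparison.
    static List<String> findPartitions(List<String> stored, String wanted) {
        return stored.stream().filter(wanted::equals).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String inserted = "2020-11-06 00:00:00.000";
        // Assumption: the partition value is stored exactly as inserted.
        List<String> stored = Collections.singletonList(inserted);

        String requested = roundTrip(inserted);
        System.out.println(requested);                                 // 2020-11-06 00:00:00
        System.out.println(findPartitions(stored, requested).size());  // 0
        // A non-zero fraction survives the round trip, so it would still match.
        System.out.println(roundTrip("2020-11-06 00:00:00.123"));      // 2020-11-06 00:00:00.123
    }
}
```

If this matches what happens in Hive, only timestamps with an all-zero fraction would be affected, which is consistent with the repro above.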