Sungwoo Park created HIVE-29361:
-----------------------------------
Summary: 'analyze table compute statistics for columns' on partitioned Iceberg tables fails with NullPointerException
Key: HIVE-29361
URL: https://issues.apache.org/jira/browse/HIVE-29361
Project: Hive
Issue Type: Bug
Affects Versions: 4.2.0
Reporter: Sungwoo Park

I hit a NullPointerException while updating statistics for TPC-DS datasets stored as Iceberg tables.
Steps to reproduce:
1. Load partitioned Iceberg tables, e.g.:
create table catalog_returns (
  cr_returned_time_sk bigint, cr_item_sk bigint,
  cr_refunded_customer_sk bigint, cr_refunded_cdemo_sk bigint,
  cr_refunded_hdemo_sk bigint, cr_refunded_addr_sk bigint,
  cr_returning_customer_sk bigint, cr_returning_cdemo_sk bigint,
  cr_returning_hdemo_sk bigint, cr_returning_addr_sk bigint,
  cr_call_center_sk bigint, cr_catalog_page_sk bigint,
  cr_ship_mode_sk bigint, cr_warehouse_sk bigint, cr_reason_sk bigint,
  cr_order_number bigint, cr_return_quantity int,
  cr_return_amount double, cr_return_tax double, cr_return_amt_inc_tax double,
  cr_fee double, cr_return_ship_cost double, cr_refunded_cash double,
  cr_reversed_charge double, cr_store_credit double, cr_net_loss double)
partitioned by (cr_returned_date_sk bigint)
STORED BY ICEBERG
stored as orc tblproperties ("orc.compress"="SNAPPY");
insert overwrite table catalog_returns select * from tpcds_bin_partitioned_orc_10000.catalog_returns;
Loading the Iceberg tables works, and simple TPC-DS queries such as query 12 run successfully.
2. Compute statistics by executing, e.g.:
analyze table catalog_returns compute statistics for columns;
The statistics computation itself appears to succeed, but the subsequent stats update fails with the following stack trace:
...
2025-12-07T17:52:57,450 INFO [HiveServer2-Background-Pool: Thread-277] stats.BasicStatsTask: Partition {cr_returned_date_sk=2452924} stats: [numFiles=1, numRows=25, totalSize=6266]
2025-12-07T17:52:57,665 INFO [HiveServer2-Background-Pool: Thread-277] stats.BasicStatsTask: [Warning] could not update stats.Failed with exception Unable to alter partition. java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "vals" is null
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition. java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "vals" is null
    at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:1218)
    at org.apache.hadoop.hive.ql.stats.BasicStatsTask.aggregateStats(BasicStatsTask.java:406)
    at org.apache.hadoop.hive.ql.stats.BasicStatsTask.process(BasicStatsTask.java:108)
    at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:111)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354)
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327)
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244)
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:347)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:191)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:139)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:234)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:354)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: MetaException(message:java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because "vals" is null)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result.read(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_partitions_req(ThriftHiveMetastore.java:4625)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_partitions_req(ThriftHiveMetastore.java:4612)
    at org.apache.hadoop.hive.metastore.client.ThriftHiveMetaStoreClient.alter_partitions(ThriftHiveMetaStoreClient.java:2382)
    at org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper.alter_partitions(MetaStoreClientWrapper.java:530)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_partitions(SessionHiveMetaStoreClient.java:1689)
    at org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper.alter_partitions(MetaStoreClientWrapper.java:530)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.apache.hadoop.hive.metastore.client.SynchronizedMetaStoreClient$SynchronizedHandler.invoke(SynchronizedMetaStoreClient.java:69)
    at jdk.proxy2/jdk.proxy2.$Proxy32.alter_partitions(Unknown Source)
    at org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper.alter_partitions(MetaStoreClientWrapper.java:530)
    at org.apache.hadoop.hive.metastore.client.BaseMetaStoreClient.alter_partitions(BaseMetaStoreClient.java:620)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:232)
    at jdk.proxy2/jdk.proxy2.$Proxy32.alter_partitions(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:1214)
    ... 25 more
Tested with Hive 4.2.0, Tez 0.10.5, and Java 21.
For Iceberg, I used the default values for most configuration keys, with hive.iceberg.stats.source=iceberg.
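The message `Cannot invoke "java.util.List.size()" because "vals" is null` is a JDK "helpful NullPointerException" message (JEP 358, enabled by default since JDK 15). Because it arrives wrapped in a MetaException read from the Thrift `alter_partitions_req` result, the null dereference appears to have happened on the metastore server while altering the partitions. The Java snippet below is only an illustrative sketch of that failure pattern next to a null-guarded variant; `PartitionStub`, `countValuesUnsafe`, and `countValuesSafe` are hypothetical names, not Hive or metastore APIs, and the sketch does not claim to show where the real null list originates.

```java
import java.util.List;

// Illustrative sketch only. PartitionStub, countValuesUnsafe and
// countValuesSafe are hypothetical names, not Hive or metastore APIs.
public class PartitionValsCheck {

    // Hypothetical stand-in for a partition record whose value list may be
    // unset (null), mirroring the "vals" that is null in the reported NPE.
    static final class PartitionStub {
        final List<String> values;

        PartitionStub(List<String> values) {
            this.values = values;
        }
    }

    // Unsafe pattern: calls size() on the list directly, so a null list
    // throws NullPointerException. On JDK 15+ the exception message
    // describes the failed call and, when local-variable debug info is
    // available, names the null variable.
    static int countValuesUnsafe(PartitionStub p) {
        List<String> vals = p.values;
        return vals.size();
    }

    // Null-guarded variant: treats a missing value list as zero values.
    static int countValuesSafe(PartitionStub p) {
        List<String> vals = p.values;
        return vals == null ? 0 : vals.size();
    }

    public static void main(String[] args) {
        PartitionStub ok = new PartitionStub(List.of("2452924"));
        PartitionStub broken = new PartitionStub(null);

        System.out.println(countValuesSafe(ok));     // prints 1
        System.out.println(countValuesSafe(broken)); // prints 0

        try {
            countValuesUnsafe(broken);
        } catch (NullPointerException e) {
            // Exact message text depends on the JDK version and compiler flags.
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

Whether the actual fix belongs in a null guard like this or in populating the partition values before alter_partitions is called is not something this report establishes.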