Sungwoo Park created HIVE-29361:
-----------------------------------

             Summary: 'analyze table compute statistics for columns' on 
partitioned Iceberg tables fails with NullPointerException
                 Key: HIVE-29361
                 URL: https://issues.apache.org/jira/browse/HIVE-29361
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.2.0
            Reporter: Sungwoo Park


I found an error (NullPointerException) while updating statistics of TPC-DS 
datasets stored on Iceberg.

These are steps taken to produce the error:

1. Load partitioned Iceberg tables, e.g.:

create table catalog_returns
( cr_returned_time_sk bigint, cr_item_sk bigint, cr_refunded_customer_sk 
bigint, cr_refunded_cdemo_sk bigint, cr_refunded_hdemo_sk bigint, 
cr_refunded_addr_sk bigint, cr_returning_customer_sk bigint, 
cr_returning_cdemo_sk bigint, cr_returning_hdemo_sk bigint, 
cr_returning_addr_sk bigint, cr_call_center_sk bigint, cr_catalog_page_sk 
bigint, cr_ship_mode_sk bigint, cr_warehouse_sk bigint, cr_reason_sk bigint, 
cr_order_number bigint, cr_return_quantity int, cr_return_amount double, 
cr_return_tax double, cr_return_amt_inc_tax double, cr_fee double, 
cr_return_ship_cost double, cr_refunded_cash double, cr_reversed_charge double, 
cr_store_credit double, cr_net_loss double)
partitioned by (cr_returned_date_sk bigint)
STORED BY ICEBERG
stored as orc tblproperties ("orc.compress"="SNAPPY");
insert overwrite table catalog_returns select * from 
tpcds_bin_partitioned_orc_10000.catalog_returns;

Loading Iceberg tables works okay, and I can run simple TPC-DS queries like 
query 12.

2. Compute statistics by executing, e.g.:

analyze table catalog_returns compute statistics for columns;

Computing statistics seems okay, but updating statistics fails with the 
following stack trace.

...
2025-12-07T17:52:57,450  INFO [HiveServer2-Background-Pool: Thread-277] 
stats.BasicStatsTask: Partition {cr_returned_date_sk=2452924} stats: 
[numFiles=1, numRows=25, totalSize=6266]
2025-12-07T17:52:57,665  INFO [HiveServer2-Background-Pool: Thread-277] 
stats.BasicStatsTask: [Warning] could not update stats.Failed with exception 
Unable to alter partition. java.lang.NullPointerException: Cannot invoke 
"java.util.List.size()" because "vals" is null
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition. 
java.lang.NullPointerException: Cannot invoke "java.util.List.size()" because 
"vals" is null
  at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:1218)
  at 
org.apache.hadoop.hive.ql.stats.BasicStatsTask.aggregateStats(BasicStatsTask.java:406)
  at 
org.apache.hadoop.hive.ql.stats.BasicStatsTask.process(BasicStatsTask.java:108)
  at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:111)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
  at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354)
  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327)
  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244)
  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:347)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:191)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:144)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:139)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
  at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:234)
  at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334)
  at 
java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
  at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
  at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:354)
  at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  at java.base/java.lang.Thread.run(Thread.java:1570)

Caused by: MetaException(message:java.lang.NullPointerException: Cannot invoke 
"java.util.List.size()" because "vals" is null)
  at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
  at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
  at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result.read(ThriftHiveMetastore.java)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
  at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_partitions_req(ThriftHiveMetastore.java:4625)
  at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_partitions_req(ThriftHiveMetastore.java:4612)
  at 
org.apache.hadoop.hive.metastore.client.ThriftHiveMetaStoreClient.alter_partitions(ThriftHiveMetaStoreClient.java:2382)
  at 
org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper.alter_partitions(MetaStoreClientWrapper.java:530)
  at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_partitions(SessionHiveMetaStoreClient.java:1689)
  at 
org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper.alter_partitions(MetaStoreClientWrapper.java:530)
  at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  at 
org.apache.hadoop.hive.metastore.client.SynchronizedMetaStoreClient$SynchronizedHandler.invoke(SynchronizedMetaStoreClient.java:69)
  at jdk.proxy2/jdk.proxy2.$Proxy32.alter_partitions(Unknown Source)
  at 
org.apache.hadoop.hive.metastore.client.MetaStoreClientWrapper.alter_partitions(MetaStoreClientWrapper.java:530)
  at 
org.apache.hadoop.hive.metastore.client.BaseMetaStoreClient.alter_partitions(BaseMetaStoreClient.java:620)
  at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:232)
  at jdk.proxy2/jdk.proxy2.$Proxy32.alter_partitions(Unknown Source)
  at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:1214)
  ... 25 more

Tested with Hive 4.2.0, Tez 0.10.5, Java 21.

For Iceberg, I used default values for most configuration keys (with 
hive.iceberg.stats.source=iceberg).






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to