Stamatis Zampetakis created HIVE-24492: ------------------------------------------
Summary: SharedCache not able to estimate size for location field of TableWrapper
Key: HIVE-24492
URL: https://issues.apache.org/jira/browse/HIVE-24492
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

The following message appears many times in the logs, indicating an error while estimating the size of some field of TableWrapper:
{noformat}
2020-12-04T15:54:18,551 ERROR [CachedStore-CacheUpdateService: Thread-266] cache.SharedCache: Not able to estimate size
java.lang.NullPointerException: null
    at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57) ~[?:1.8.0_261]
    at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38) ~[?:1.8.0_261]
    at java.lang.reflect.Field.get(Field.java:393) ~[?:1.8.0_261]
    at org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:399) ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:386) ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.getTableWrapperSizeWithoutMaps(SharedCache.java:348) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.<init>(SharedCache.java:321) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.SharedCache.createTableWrapper(SharedCache.java:1893) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.SharedCache.populateTableInCache(SharedCache.java:1754) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.CachedStore.prewarm(CachedStore.java:577) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.CachedStore.triggerPreWarm(CachedStore.java:161) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.CachedStore.access$600(CachedStore.java:90) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.run(CachedStore.java:767) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_261]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_261]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_261]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_261]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
{noformat}

The message appears many times when running the TPC-DS perf tests:
{noformat}
mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver
{noformat}

From the stack trace it seems that we cannot estimate the size of a field because it is null. If the value of a field is null then we shouldn't attempt to estimate its size, since that will always lead to an NPE. Furthermore, there is nothing to estimate and we can simply count it as zero.

Looking a bit deeper into this use case, the field which causes the NPE is {{TableWrapper#location}}, which comes from the storage descriptor (SDS table in the metastore). So should this field be null in the first place?
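
The idea of counting a null field as zero can be illustrated with a minimal Java sketch. This is not the Hive code or the eventual patch: the class {{NullSafeSizeSketch}}, the nested {{Wrapper}} type and the {{HEADER_BYTES}} cost model are all hypothetical and exist only for illustration. The one fact it relies on is that {{java.lang.reflect.Field#get}} throws a NullPointerException when asked to read an instance field from a null object, which is consistent with the top frames of the trace above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch; names and cost model are assumptions, not Hive's.
class NullSafeSizeSketch {

    // Hypothetical stand-in for a wrapper whose location may be null.
    static class Wrapper {
        String name;
        String location;

        Wrapper(String name, String location) {
            this.name = name;
            this.location = location;
        }
    }

    // Crude, made-up cost model: fixed header per object, 2 bytes per char.
    static final long HEADER_BYTES = 16;

    static long estimate(Object obj) {
        // The guard: a null value is counted as zero and never reflected on.
        if (obj == null) {
            return 0;
        }
        if (obj instanceof String) {
            return HEADER_BYTES + 2L * ((String) obj).length();
        }
        long size = HEADER_BYTES;
        for (Field field : obj.getClass().getDeclaredFields()) {
            // Skip static, synthetic and primitive fields in this sketch.
            if (Modifier.isStatic(field.getModifiers())
                    || field.isSynthetic()
                    || field.getType().isPrimitive()) {
                continue;
            }
            try {
                field.setAccessible(true);
                // A null field value comes back here as null and is
                // handled by the guard at the top of the recursive call.
                size += estimate(field.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(estimate(new Wrapper("web_site", "hdfs://example/web_site")));
        System.out.println(estimate(new Wrapper("table_stats_view", null)));
    }
}
```

With the guard in place, a wrapper without a location is sized from its remaining fields instead of failing the whole estimate.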
The content of the metastore shows that this happens for technical tables:
{noformat}
version              |
db_version           | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/db_version
funcs                | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/funcs
key_constraints      | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/key_constraints
table_stats_view     |
columns              |
web_site             | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/web_site
inventory_i          | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/inventory_i
partition_stats_view |
wm_resourceplans     | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_resourceplans
wm_triggers          | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_triggers
wm_pools             | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools
wm_pools_to_triggers | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools_to_triggers
wm_mappings          | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_mappings
scheduled_queries    | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_queries
scheduled_executions | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_executions
schemata             |
tables               |
table_privileges     |
column_privileges    |
views                |
scheduled_queries    |
scheduled_executions |
{noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)