[jira] [Updated] (HIVE-24492) SharedCache not able to estimate size for location field of TableWrapper

Stamatis Zampetakis (Jira) Sat, 05 Dec 2020 04:32:06 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stamatis Zampetakis updated HIVE-24492:
---------------------------------------
    Description: 
The following message appears multiple times in the logs, indicating an error 
while estimating the size of some field of TableWrapper:
{noformat}
2020-12-04T15:54:18,551 ERROR [CachedStore-CacheUpdateService: Thread-266] cache.SharedCache: Not able to estimate size
java.lang.NullPointerException: null
        at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57) ~[?:1.8.0_261]
        at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38) ~[?:1.8.0_261]
        at java.lang.reflect.Field.get(Field.java:393) ~[?:1.8.0_261]
        at org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:399) ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:386) ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.getTableWrapperSizeWithoutMaps(SharedCache.java:348) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.<init>(SharedCache.java:321) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.SharedCache.createTableWrapper(SharedCache.java:1893) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.SharedCache.populateTableInCache(SharedCache.java:1754) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.CachedStore.prewarm(CachedStore.java:577) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.CachedStore.triggerPreWarm(CachedStore.java:161) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.CachedStore.access$600(CachedStore.java:90) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.run(CachedStore.java:767) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_261]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_261]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_261]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_261]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]{noformat}
The message appears many times when running the TPC-DS perf tests:
{noformat}
mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver{noformat}
From the stack trace it seems that we cannot estimate the size of a field 
because it is null.

If the value of a field is null, we shouldn't attempt to estimate its size, 
since doing so will always lead to an NPE. Furthermore, there is nothing to 
estimate, so we can simply count it as zero.
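
To illustrate the idea, here is a minimal, self-contained sketch of a null guard around a reflective field read. It is not the code of IncrementalObjectSizeEstimator: the class NullSafeSizeSketch, the nested Wrapper class, and the 2-bytes-per-char cost model are all hypothetical and exist only for this example. The one fact it relies on is that java.lang.reflect.Field#get throws a NullPointerException when it is asked to read an instance field from a null object.

```java
import java.lang.reflect.Field;

// Hypothetical sketch; names and cost model are assumptions, not Hive code.
final class NullSafeSizeSketch {

    // Stand-in for a wrapper object with a nullable location field.
    static final class Wrapper {
        String location;
        Wrapper(String location) { this.location = location; }
    }

    // Placeholder cost model: a null value contributes zero bytes.
    static long sizeOfString(String s) {
        return s == null ? 0L : 2L * s.length();
    }

    // Guarded estimate: never hands a null owner to Field.get.
    static long estimateLocation(Wrapper owner) {
        if (owner == null) {
            return 0L; // nothing to measure, count it as zero
        }
        try {
            Field f = Wrapper.class.getDeclaredField("location");
            f.setAccessible(true);
            return sizeOfString((String) f.get(owner));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Shows the failure mode: reading an instance field from null throws NPE.
    static boolean unguardedReadThrowsNpe() {
        try {
            Field f = Wrapper.class.getDeclaredField("location");
            f.setAccessible(true);
            f.get(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded read throws NPE: " + unguardedReadThrowsNpe());
        System.out.println("null owner: " + estimateLocation(null));
        System.out.println("null location: " + estimateLocation(new Wrapper(null)));
        System.out.println("non-null location: " + estimateLocation(new Wrapper("hdfs://x")));
    }
}
```

Both a null owner and a null field value are counted as zero here, so the caller gets a usable size instead of an exception in the log.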

Looking a bit deeper into this use case, the field which causes the NPE is 
{{TableWrapper#location}}, which comes from the storage descriptor (SDS table in 
metastore). So should this field be null in the first place?

The content of the metastore shows that this happens for technical tables such 
as version, schemata, tables, table_privileges, etc:
{noformat}
 version                   | 
 db_version                | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/db_version
 funcs                     | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/funcs
 key_constraints           | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/key_constraints
 table_stats_view          | 
 columns                   | 
 web_site                  | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/web_site
 inventory_i               | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/inventory_i
 partition_stats_view      | 
 wm_resourceplans          | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_resourceplans
 wm_triggers               | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_triggers
 wm_pools                  | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools
 wm_pools_to_triggers      | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools_to_triggers
 wm_mappings               | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_mappings
 scheduled_queries         | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_queries
 scheduled_executions      | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_executions
 schemata                  | 
 tables                    | 
 table_privileges          | 
 column_privileges         | 
 views                     | 
 scheduled_queries         | 
{noformat}
 but I didn't investigate how we can end up with this situation.

 

  was:
The following message appears various times in the logs indicating an error on 
estimating the size of some field of TableWrapper:
{noformat}
2020-12-04T15:54:18,551 ERROR [CachedStore-CacheUpdateService: Thread-266] 
cache.SharedCache: Not able to estimate size
java.lang.NullPointerException: null
        at 
sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57) 
~[?:1.8.0_261]
        at 
sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38)
 ~[?:1.8.0_261]
        at java.lang.reflect.Field.get(Field.java:393) ~[?:1.8.0_261]
        at 
org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:399)
 ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:386)
 ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.getTableWrapperSizeWithoutMaps(SharedCache.java:348)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.<init>(SharedCache.java:321)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.SharedCache.createTableWrapper(SharedCache.java:1893)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.SharedCache.populateTableInCache(SharedCache.java:1754)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.CachedStore.prewarm(CachedStore.java:577)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.CachedStore.triggerPreWarm(CachedStore.java:161)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.CachedStore.access$600(CachedStore.java:90)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.run(CachedStore.java:767)
 [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_261]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[?:1.8.0_261]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_261]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [?:1.8.0_261]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_261]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_261]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]{noformat}
The message appears many times when running the TPC-DS perf tests:
{noformat}
mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver{noformat}
From the stack trace it seems that we cannot estimate the size of a field 
cause it is null.

If the value of a field is null then we shouldn't attempt to estimate the size 
since it will always lead to a NPE. Furthermore, there is no need to estimate 
and we can simply count it as zero.

Looking a bit deeper in this use-case the field which causes the NPE is 
{{TableWrapper#location}} which comes from the storage descriptor (SDS table in 
metastore). So should this field be null in the first place?

The content of the metastore shows that this happens for technical tables:

{noformat}
version                   | 
 db_version                | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/db_version
 funcs                     | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/funcs
 key_constraints           | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/key_constraints
 table_stats_view          | 
 columns                   | 
 web_site                  | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/web_site
 inventory_i               | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/inventory_i
 partition_stats_view      | 
 wm_resourceplans          | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_resourceplans
 wm_triggers               | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_triggers
 wm_pools                  | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools
 wm_pools_to_triggers      | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools_to_triggers
 wm_mappings               | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_mappings
 scheduled_queries         | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_queries
 scheduled_executions      | 
hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_executions
 schemata                  | 
 tables                    | 
 table_privileges          | 
 column_privileges         | 
 views                     | 
 scheduled_queries         | 
 scheduled_executions
{noformat}


 

 


> SharedCache not able to estimate size for location field of TableWrapper
> ------------------------------------------------------------------------
>
>                 Key: HIVE-24492
>                 URL: https://issues.apache.org/jira/browse/HIVE-24492
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> The following message appears various times in the logs indicating an error 
> on estimating the size of some field of TableWrapper:
> {noformat}
> 2020-12-04T15:54:18,551 ERROR [CachedStore-CacheUpdateService: Thread-266] 
> cache.SharedCache: Not able to estimate size
> java.lang.NullPointerException: null
>         at 
> sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57)
>  ~[?:1.8.0_261]
>         at 
> sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38)
>  ~[?:1.8.0_261]
>         at java.lang.reflect.Field.get(Field.java:393) ~[?:1.8.0_261]
>         at 
> org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:399)
>  ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:386)
>  ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.getTableWrapperSizeWithoutMaps(SharedCache.java:348)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.<init>(SharedCache.java:321)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.SharedCache.createTableWrapper(SharedCache.java:1893)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.SharedCache.populateTableInCache(SharedCache.java:1754)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.CachedStore.prewarm(CachedStore.java:577)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.CachedStore.triggerPreWarm(CachedStore.java:161)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.CachedStore.access$600(CachedStore.java:90)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.run(CachedStore.java:767)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_261]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [?:1.8.0_261]
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_261]
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [?:1.8.0_261]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_261]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_261]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]{noformat}
> The message appears many times when running the TPC-DS perf tests:
> {noformat}
> mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver{noformat}
> From the stack trace it seems that we cannot estimate the size of a field 
> cause it is null.
> If the value of a field is null then we shouldn't attempt to estimate the 
> size since it will always lead to a NPE. Furthermore, there is no need to 
> estimate and we can simply count it as zero.
> Looking a bit deeper in this use-case the field which causes the NPE is 
> {{TableWrapper#location}} which comes from the storage descriptor (SDS table 
> in metastore). So should this field be null in the first place?
> The content of the metastore shows that this happens for technical tables 
> such as version, schemata, tables, table_privileges, etc:
> {noformat}
> version                   | 
>  db_version                | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/db_version
>  funcs                     | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/funcs
>  key_constraints           | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/key_constraints
>  table_stats_view          | 
>  columns                   | 
>  web_site                  | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/web_site
>  inventory_i               | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/inventory_i
>  partition_stats_view      | 
>  wm_resourceplans          | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_resourceplans
>  wm_triggers               | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_triggers
>  wm_pools                  | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools
>  wm_pools_to_triggers      | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools_to_triggers
>  wm_mappings               | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_mappings
>  scheduled_queries         | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_queries
>  scheduled_executions      | 
> hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_executions
>  schemata                  | 
>  tables                    | 
>  table_privileges          | 
>  column_privileges         | 
>  views                     | 
>  scheduled_queries         | 
> {noformat}
>  but I didn't investigate how we can end up with this situation.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
