[https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815016]
zhoukang commented on SPARK-26568:
----------------------------------
It is the Hive client we use in Spark that causes this problem [~srowen]
> Too many partitions may cause thriftServer frequently Full GC
> -------------------------------------------------------------
>
> Key: SPARK-26568
> URL: https://issues.apache.org/jira/browse/SPARK-26568
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: zhoukang
> Priority: Major
>
> The reason is that: first, we have a table with many partitions (possibly
> several hundred); second, we run some concurrent queries against it. Then the
> long-running Thrift Server may encounter an OOM issue.
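> For illustration, a workload like the following sketch can trigger the
> problem (the JDBC URL, table name, and partition column are hypothetical,
> not taken from this report):
> {code:java}
> // Minimal sketch: several concurrent JDBC clients querying a heavily
> // partitioned table through a long-running Thrift Server. Host, table,
> // and partition column below are hypothetical placeholders.
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class ConcurrentPartitionQueries {
>   public static void main(String[] args) throws Exception {
>     Class.forName("org.apache.hive.jdbc.HiveDriver"); // Hive JDBC driver
>     for (int i = 0; i < 10; i++) {
>       new Thread(() -> {
>         try (Connection conn = DriverManager.getConnection(
>                  "jdbc:hive2://thriftserver-host:10000/default", "user", "");
>              Statement stmt = conn.createStatement();
>              // Each query makes the server list metadata for every
>              // matching partition of the table.
>              ResultSet rs = stmt.executeQuery(
>                  "SELECT count(*) FROM partitioned_table WHERE dt >= '2019-01-01'")) {
>           rs.next();
>         } catch (Exception e) {
>           e.printStackTrace();
>         }
>       }).start();
>     }
>   }
> }
> {code}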
> Here is a case; the call stack of the OOM thread:
> {code:java}
> pool-34-thread-10
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor.<init>(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V (StorageDescriptor.java:240)
>   at org.apache.hadoop.hive.metastore.api.Partition.<init>(Lorg/apache/hadoop/hive/metastore/api/Partition;)V (Partition.java:216)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition; (HiveMetaStoreClient.java:1343)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List; (HiveMetaStoreClient.java:1409)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List; (HiveMetaStoreClient.java:1397)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; (HiveMetaStoreClient.java:914)
>   at sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:606)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (RetryingMetaStoreClient.java:90)
>   at com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; (Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List; (Hive.java:1967)
>   at sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:606)
>   at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq; (HiveShim.scala:602)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq; (HiveClientImpl.scala:608)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object; (HiveClientImpl.scala:606)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object; (HiveClientImpl.scala:321)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V (HiveClientImpl.scala:264)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object; (HiveClientImpl.scala:263)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object; (HiveClientImpl.scala:307)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;Lscala/collection/Seq;)Lscala/collection/Seq; (HiveClientImpl.scala:606)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Lscala/collection/Seq; (HiveExternalCatalog.scala:1017)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Ljava/lang/Object; (HiveExternalCatalog.scala:1000)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(Lscala/Function0;)Ljava/lang/Object; (HiveExternalCatalog.scala:100)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Lscala/collection/Seq;)Lscala/collection/Seq; (HiveExternalCatalog.scala:1000)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lscala/collection/Seq;)Lscala/collection/Seq; (SessionCatalog.scala:803)
>   at org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(Lscala/collection/Seq;)Lorg/apache/spark/sql/execution/datasources/InMemoryFileIndex; (CatalogFileIndex.scala:67)
>   at org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(Lorg/apache/spark/sql/catalyst/plans/logical/LogicalPlan;Lscala/Function1;)Ljava/lang/Object; (PruneFileSourcePartitions.scala:59)
>   at ...
> {code}
> The memory issue is caused by
> {code}
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy
> {code}
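> The per-partition copy in the trace behaves roughly like the sketch below
> (not the actual Hive source; the copy constructors are the Thrift-generated
> ones named in the stack above):
> {code:java}
> // Rough sketch of the deepCopyPartitions pattern from the stack trace:
> // every Partition returned by listPartitionsByFilter is cloned via its
> // Thrift-generated copy constructor, which also clones the nested
> // StorageDescriptor. Hundreds of partitions per query, times several
> // concurrent queries, yields a high allocation rate and frequent Full GC.
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.hive.metastore.api.Partition;
>
> class DeepCopySketch {
>   static List<Partition> deepCopyPartitions(List<Partition> src) {
>     List<Partition> copy = new ArrayList<>(src.size());
>     for (Partition p : src) {
>       // Allocates a new Partition plus its StorageDescriptor, column
>       // list, SerDe info, and parameter maps.
>       copy.add(new Partition(p));
>     }
>     return copy;
>   }
> }
> {code}
> Each copy is short-lived but not small, since a Partition carries its full
> StorageDescriptor, so avoiding or reducing these copies for large partition
> lists would cut the allocation rate proportionally.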