[
https://issues.apache.org/jira/browse/HIVE-24404?focusedWorklogId=834806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-834806
]
ASF GitHub Bot logged work on HIVE-24404:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Dec/22 14:38
Start Date: 20/Dec/22 14:38
Worklog Time Spent: 10m
Work Description: abstractdog commented on PR #1685:
URL: https://github.com/apache/hive/pull/1685#issuecomment-1359465994
started a PR against master: https://github.com/apache/hive/pull/3883
Issue Time Tracking
-------------------
Worklog Id: (was: 834806)
Time Spent: 1h 20m (was: 1h 10m)
> Hive getUserName close db makes client operations lost metaStoreClient
> connection
> ---------------------------------------------------------------------------------
>
> Key: HIVE-24404
> URL: https://issues.apache.org/jira/browse/HIVE-24404
> Project: Hive
> Issue Type: Bug
> Components: Clients
> Affects Versions: 2.3.7, 3.1.3, 4.0.0-alpha-1
> Environment: os: centos 7
> spark: 3.0.1
> hive: 2.3.7
> Reporter: Lichuanliang
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When I use Spark to execute a drop-partition SQL statement, I always get a
> lost metastore connection warning.
> Spark SQL:
> {code:java}
> alter table mydb.some_table drop if exists partition(dt = '2020-11-12',hh = '17');
> {code}
> Execution log:
> {code:java}
> 20/11/12 19:37:57 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> 20/11/12 19:37:57 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> 20/11/12 19:37:57 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. listPartitionsWithAuthInfo
> org.apache.thrift.transport.TTransportException: Cannot write to null outputStream
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:142)
>     at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:185)
>     at org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:116)
>     at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:70)
>     at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_partitions_ps_with_auth(ThriftHiveMetastore.java:2562)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_ps_with_auth(ThriftHiveMetastore.java:2549)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsWithAuthInfo(HiveMetaStoreClient.java:1209)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>     at com.sun.proxy.$Proxy32.listPartitionsWithAuthInfo(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
>     at com.sun.proxy.$Proxy32.listPartitionsWithAuthInfo(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2555)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2581)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$dropPartitions$2(HiveClientImpl.scala:628)
>     at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
>     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
>     at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
>     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$dropPartitions$1(HiveClientImpl.scala:622)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.dropPartitions(HiveClientImpl.scala:617)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$dropPartitions$1(HiveExternalCatalog.scala:1018)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1015)
>     at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.dropPartitions(ExternalCatalogWithListener.scala:211)
>     at org.apache.spark.sql.catalyst.catalog.SessionCatalog.dropPartitions(SessionCatalog.scala:988)
>     at org.apache.spark.sql.execution.command.AlterTableDropPartitionCommand.run(ddl.scala:581)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
>     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
>     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:64)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:490)
>     at scala.collection.Iterator.foreach(Iterator.scala:941)
>     at scala.collection.Iterator.foreach$(Iterator.scala:941)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>     at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>     at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>     at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:208)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Response code
> Time taken: 4.192 seconds
> {code}
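> The problem arises in the getPartitions method: the metastore client
> returned by getMSC() is closed shortly afterwards by the getUserName() call.
> Java evaluates the call target getMSC() before the arguments, so the client
> reference is obtained first. getUserName() then closes the current metastore
> client during setAuth, which closes the underlying Thrift transport, and
> listPartitionsWithAuthInfo is finally invoked on the closed client.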
> {code:java}
> public List<Partition> getPartitions(Table tbl, Map<String, String> partialPartSpec,
>     short limit)
>     throws HiveException {
>   if (!tbl.isPartitioned()) {
>     throw new HiveException(ErrorMsg.TABLE_NOT_PARTITIONED, tbl.getTableName());
>   }
>   List<String> partialPvals = MetaStoreUtils.getPvals(tbl.getPartCols(), partialPartSpec);
>   List<org.apache.hadoop.hive.metastore.api.Partition> partitions = null;
>   try {
>     partitions = getMSC().listPartitionsWithAuthInfo(tbl.getDbName(), tbl.getTableName(),
>         partialPvals, limit, getUserName(), getGroupNames());
>   } catch (Exception e) {
>     throw new HiveException(e);
>   }
>   List<Partition> qlPartitions = new ArrayList<Partition>();
>   for (org.apache.hadoop.hive.metastore.api.Partition p : partitions) {
>     qlPartitions.add(new Partition(tbl, p));
>   }
>   return qlPartitions;
> }
> {code}
> Someone else has already reported the same issue for an older version of Spark:
> https://issues.apache.org/jira/browse/SPARK-29409
>
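The interaction described in the quoted report can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: FakeClient and FakeHive are hypothetical stand-ins, not Hive classes, and getUserName() is modelled purely on the report's description (it closes the cached client as a side effect). The reordered() variant shows one possible way to avoid the problem by resolving the arguments before fetching the client; the actual fix in the linked PRs may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class EvalOrderSketch {
    /** Hypothetical stand-in for a metastore client with a closable transport. */
    static class FakeClient {
        private boolean open = true;

        void close() {
            open = false;
        }

        String listPartitions(String user) {
            if (!open) {
                // Mirrors the message in the reported stack trace.
                throw new IllegalStateException("Cannot write to null outputStream");
            }
            return "partitions for " + user;
        }
    }

    /** Hypothetical stand-in for an object that caches a single client. */
    static class FakeHive {
        private FakeClient client;
        final List<String> events = new ArrayList<>();

        FakeClient getMSC() {
            if (client == null) {
                client = new FakeClient();
                events.add("open");
            }
            return client;
        }

        // Assumption taken from the report: resolving the user name closes the cached client.
        String getUserName() {
            if (client != null) {
                client.close();
                client = null;
                events.add("close");
            }
            return "alice";
        }

        // Same shape as the reported call: the target getMSC() is evaluated
        // before the argument getUserName(), so the call lands on a closed client.
        String buggy() {
            return getMSC().listPartitions(getUserName());
        }

        // Resolve the arguments first, then fetch the client.
        String reordered() {
            String user = getUserName();
            return getMSC().listPartitions(user);
        }
    }

    public static void main(String[] args) {
        FakeHive hive = new FakeHive();
        try {
            hive.buggy();
        } catch (IllegalStateException e) {
            System.out.println("buggy: " + e.getMessage());
        }
        System.out.println("events: " + hive.events);
        System.out.println("reordered: " + new FakeHive().reordered());
    }
}
```

In this model the buggy call records an "open" followed by a "close" before the request is sent, which matches the order of events the report describes for getPartitions.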
--
This message was sent by Atlassian Jira
(v8.20.10#820010)