[ https://issues.apache.org/jira/browse/SPARK-39485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39485:
------------------------------------

    Assignee:     (was: Apache Spark)

> When fetching hiveMetastoreJars from path, IsolatedClientLoader should get 
> hive settings from origLoader.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39485
>                 URL: https://issues.apache.org/jira/browse/SPARK-39485
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: SeongHoon Ku
>            Priority: Major
>
> Hi all,
> I built a Spark application that runs with YARN cluster deploy mode, 
> spark.sql.hive.metastore.jars set to "path", and Hive metastore version 2.3.2.
> "spark.yarn.dist.files" was also set so that the driver could refer to the 
> hive-related xml files in cluster mode:
> {code}
> spark.yarn.dist.files viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml
> {code}
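> For clarity, the relevant metastore settings looked roughly like the following 
> (the spark.sql.hive.metastore.jars.path value here is only an illustrative 
> placeholder, not the exact path from my cluster):
> {code}
> spark.submit.deployMode              cluster
> spark.sql.hive.metastore.version     2.3.2
> spark.sql.hive.metastore.jars        path
> spark.sql.hive.metastore.jars.path   viewfs:///app/hive-2.3.2/lib/*.jar
> {code}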
> The application failed with the following error:
> {code}
> 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.ExceptionInInitializerError: null
>       at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
>       at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150)
>       at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
>       at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45)
>       at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60)
>       at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118)
>       at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118)
>       at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298)
>       at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205)
>       at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
>       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>       at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
>       at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
>       at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
>       at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
>       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
>       at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
>       at buildExample.helloworld$.main(helloworld.scala:16)
>       at buildExample.helloworld.main(helloworld.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:737)
> Caused by: java.lang.ExceptionInInitializerError
>       at org.apache.spark.sql.hive.client.HiveClientImpl$.newHiveConf(HiveClientImpl.scala:1257)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:168)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:130)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:313)
>       at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:496)
>       at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
>       at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
>       ... 47 more
> Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
>       at java.io.File.<init>(File.java:421)
>       at org.apache.hadoop.hive.conf.HiveConf.findConfigFile(HiveConf.java:177)
>       at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:145)
>       ... 62 more
> )
> {code}
> When IsolatedClientLoader is initialized and builds its new classloader, the 
> URL information for resources such as hive-site.xml that was visible to the 
> original loader is lost, and HiveConf.findConfigFile fails with the error above.
> This works fine in client mode because HIVE_HOME and HIVE_CONF_DIR are set on 
> the host where the Spark driver runs, so hive setting files such as 
> hive-site.xml are found. But in cluster mode, the only way to find the hive 
> configuration files is to pick them up from the YARN NodeManager usercache, 
> where they are distributed via "spark.yarn.dist.files".
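> As a rough idea (not a finished patch), IsolatedClientLoader could look up the 
> hive config files through the original loader (origLoader) and append their 
> locations to the URLs of the isolated classloader it builds from the 
> "path"-based metastore jars. The helper names below are made up for 
> illustration only:
> {code}
> import java.io.File
> import java.net.URL
> 
> // Find hive config files on the original (application) classloader and return
> // directory URLs that a URLClassLoader can search, so that HiveConf's static
> // initializer can still resolve hive-site.xml inside the isolated loader.
> def hiveConfigDirsFrom(origLoader: ClassLoader): Seq[URL] = {
>   val configFiles = Seq("hive-site.xml", "hivemetastore-site.xml", "hiveserver2-site.xml")
>   configFiles.flatMap { name =>
>     Option(origLoader.getResource(name)).collect {
>       // only local "file:" resources can be re-exposed as a directory entry
>       case url if url.getProtocol == "file" =>
>         new File(url.toURI).getParentFile.toURI.toURL
>     }
>   }.distinct
> }
> 
> // Hypothetical call site: append the config directories to the "path"-based
> // metastore jar URLs before constructing the isolated classloader.
> def isolatedLoaderUrls(metastoreJarUrls: Seq[URL], origLoader: ClassLoader): Array[URL] =
>   (metastoreJarUrls ++ hiveConfigDirsFrom(origLoader)).toArray
> {code}
> With something like this, HiveConf's static initializer could still resolve 
> hive-site.xml even though the isolated loader is built only from the metastore 
> jar paths.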
> I don't know if this is the right approach, but I created an issue.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
