[ https://issues.apache.org/jira/browse/SPARK-39513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556142#comment-17556142 ]
Timothy C. Arland commented on SPARK-39513:
-------------------------------------------
[~yumwang] I can reproduce this with YARN as the scheduler as well, just using
spark-defaults.conf and simply running 'spark-shell'. This works correctly in
Spark 3.2.1. Here is the cluster's spark-defaults.conf:
{code:java}
spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.master=yarn
spark.submit.deployMode=client
spark.ui.killEnabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.driver.memory=1g
spark.executor.memory=1g
spark.sql.hive.metastore.jars=path
spark.sql.hive.metastore.jars.path=file:///opt/TDH/hive/lib/*.jar
spark.sql.hive.metastore.version=3.1.2
spark.hadoop.hive.metastore.uris=thrift://callisto:9083
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://callisto:8020/tmp/spark/applicationHistory
spark.history.fs.logDirectory=hdfs://callisto:8020/tmp/spark/applicationHistory
spark.yarn.historyServer.address=http://callisto:18080
spark.yarn.jars=local:/opt/TDH/spark/jars/*
spark.driver.extraLibraryPath=$HADOOP_HOME/lib/native
spark.executor.extraLibraryPath=$HADOOP_HOME/lib/native
spark.yarn.am.extraLibraryPath=$HADOOP_HOME/lib/native
spark.hadoop.mapreduce.application.classpath=
spark.hadoop.yarn.application.classpath=
{code}
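Since spark.sql.hive.metastore.jars=path is set above, every jar matching
file:///opt/TDH/hive/lib/*.jar lands on Spark's isolated metastore classloader.
A quick way to spot jars in that directory that bundle their own Scala classes
(a diagnostic sketch of mine, not part of the original report; only the
directory path is taken from the config above):
{code:scala}
// Diagnostic sketch (not from the original report): list jars under the
// metastore jars path that carry scala/* entries. Any jar here compiled
// against a different Scala version than the Spark build is a suspect
// for the scala/Serializable error below.
import java.io.File
import java.util.jar.JarFile

val jarsDir = new File("/opt/TDH/hive/lib")  // path from spark-defaults.conf above

for (f <- jarsDir.listFiles if f.getName.endsWith(".jar")) {
  val jar = new JarFile(f)
  try {
    // A jar that contains scala/* entries ships its own Scala classes.
    if (jar.stream().anyMatch(_.getName.startsWith("scala/")))
      println(f.getName)
  } finally jar.close()
}
{code}
Here is the failing spark-shell session with these defaults: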
{noformat}
Spark context Web UI available at http://callisto.<mydomain>.net:4040
Spark context available as 'sc' (master = yarn, app id = application_1655681960811_0002).
Spark session available as 'spark'.
scala> spark.catalog.listDatabases().show()
2022-06-19T17:06:04,935 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/opt/TDH/hive/conf/hive-site.xml
Hive Session ID = 0b3f641c-d862-4e78-93da-88e6d872d328
2022-06-19T17:06:05,104 INFO [main] SessionState - Hive Session ID = 0b3f641c-d862-4e78-93da-88e6d872d328
2022-06-19T17:06:05,265 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Trying to connect to metastore with URI thrift://callisto.charltontechnology.net:9083
2022-06-19T17:06:05,290 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Opened a connection to metastore, current connections: 1
2022-06-19T17:06:05,315 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Connected to metastore.
2022-06-19T17:06:05,315 INFO [main] org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=tca (auth:SIMPLE) retries=1 delay=1 lifetime=0
java.lang.NoClassDefFoundError: scala/Serializable
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
{noformat}
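The failure inside ClassLoader.defineClass means a class being loaded
references scala/Serializable, which is absent from this build's classpath.
Two quick checks from the same spark-shell (again my own diagnostic, using
only standard JDK/Scala API):
{code:scala}
// Which Scala version is this Spark build running?
println(scala.util.Properties.versionString)

// Does scala/Serializable.class exist on the classpath? It does on
// Scala 2.12 (a real trait), but on 2.13 scala.Serializable is only a
// deprecated type alias for java.io.Serializable, so no class file is
// emitted and this lookup fails -- the same missing class the
// HiveMetaStoreClient call trips over above.
try {
  Class.forName("scala.Serializable")
  println("scala.Serializable resolves on this classpath")
} catch {
  case _: ClassNotFoundException =>
    println("scala.Serializable is missing (Scala 2.13-style classpath)")
}
{code}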
> HiveMetastore serializable exception with Spark 3.3.0
> -----------------------------------------------------
>
> Key: SPARK-39513
> URL: https://issues.apache.org/jira/browse/SPARK-39513
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, SQL
> Affects Versions: 3.3.0
> Reporter: Timothy C. Arland
> Priority: Major
>
> Running this command line with the provided configuration works fine in Spark
> 3.2.1:
> {noformat}
> ${SPARK_HOME}/bin/spark-shell \
>   --master k8s://https://10.96.0.1:443 \
>   --deploy-mode client \
>   --name spark-shell \
>   --conf spark.scheduler.minRegisteredResourcesRatio=1 \
>   --conf spark.executor.instances=8 \
>   --conf spark.executor.cores=1 \
>   --conf spark.executor.limit.cores=2 \
>   --conf spark.executor.memory=4g \
>   --conf spark.driver.cores=1 \
>   --conf spark.driver.limit.cores=1 \
>   --conf spark.driver.memory=1g \
>   --conf spark.kubernetes.container.image=<myimagerepo>/spark:v3.2.1-thebe-2206.10 \
>   --conf spark.kubernetes.driver.pod.name=spark-shell-driver-jglxk \
>   --conf spark.kubernetes.namespace=spark \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.hadoop.fs.s3a.endpoint=http://10.96.45.95:80 \
>   --conf spark.hadoop.fs.s3a.access.key=<myaccesskey> \
>   --conf spark.hadoop.fs.s3a.secret.key=<mysecretkey> \
>   --conf spark.hadoop.fs.s3a.path.style.access=true \
>   --conf spark.hadoop.fs.s3a.block.size=512M \
>   --conf spark.hadoop.fs.s3a.committer.magic.enabled=false \
>   --conf spark.hadoop.fs.s3a.committer.name=directory \
>   --conf spark.hadoop.fs.s3a.committer.staging.abort.pending.uploads=true \
>   --conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=append \
>   --conf spark.hadoop.fs.s3a.committer.staging.tmp.path=/tmp/staging \
>   --conf spark.hadoop.fs.s3a.committer.staging.unique-filenames=true \
>   --conf spark.hadoop.fs.s3a.committer.threads=2048 \
>   --conf spark.hadoop.fs.s3a.connection.establish.timeout=5000 \
>   --conf spark.hadoop.fs.s3a.connection.maximum=8192 \
>   --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
>   --conf spark.hadoop.fs.s3a.connection.timeout=200000 \
>   --conf spark.hadoop.fs.s3a.fast.upload.active.blocks=2048 \
>   --conf spark.hadoop.fs.s3a.fast.upload.buffer=disk \
>   --conf spark.hadoop.fs.s3a.fast.upload=true \
>   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
>   --conf spark.hadoop.fs.s3a.max.total.tasks=2048 \
>   --conf spark.hadoop.fs.s3a.multipart.size=512M \
>   --conf spark.hadoop.fs.s3a.multipart.threshold=512M \
>   --conf spark.hadoop.fs.s3a.socket.recv.buffer=65536 \
>   --conf spark.hadoop.fs.s3a.socket.send.buffer=65536 \
>   --conf spark.hadoop.fs.s3a.threads.max=2048 \
>   --conf spark.eventLog.dir=s3a://spark/spark-logs \
>   --conf spark.eventLog.enabled=true \
>   --conf spark.hadoop.metastore.catalog.default=hive \
>   --conf spark.sql.warehouse.dir=s3a://hive/warehouse \
>   --conf spark.sql.hive.metastore.dir=s3a://hive/warehouse \
>   --conf spark.sql.hive.metastore.version=3.1.2 \
>   --conf spark.sql.hive.metastore.jars=path \
>   --conf spark.sql.hive.metastore.jars.path=file:///opt/hive/lib/*.jar \
>   --conf spark.hadoop.hive.metastore.uris=thrift://hive-metastore.hive.svc.cluster.local:9083
> {noformat}
>
> With the Spark 3.3.0 image, the same command yields the following exception
> when running spark.catalog functions, e.g. spark.catalog.listDatabases().show():
>
> {noformat}
> 2022-06-19T00:57:13,687 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Opened a connection to metastore, current connections: 1
> 2022-06-19T00:57:13,701 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Connected to metastore.
> 2022-06-19T00:57:13,701 INFO [main] org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=root (auth:SIMPLE) retries=1 delay=1 lifetime=0
> java.lang.NoClassDefFoundError: scala/Serializable
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(Unknown Source)
>     at java.base/java.security.SecureClassLoader.defineClass(Unknown Source)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(Unknown Source)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(Unknown Source)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(Unknown Source)
>     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
>     at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
>     at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
>     at org.apache.spark.sql.catalyst.analysis.RewriteDeleteFromTable$.apply(RewriteDeleteFromTable.scala:39)
>     at org.apache.spark.sql.catalyst.analysis.RewriteDeleteFromTable$.apply(RewriteDeleteFromTable.scala:37)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>     at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:169)
>     at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:165)
>     at scala.collection.immutable.List.foldLeft(List.scala:79)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>     at scala.collection.immutable.List.foreach(List.scala:333)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:227)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:223)
>     at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:172)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:223)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:187)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
>     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:208)
>     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207)
>     at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
>     at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
>     at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
>     at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
>     at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
>     at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
>     at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
>     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
> {noformat}
>
> Not sure what I am missing between the two versions. I am using Spark 3.3.0
> with Hadoop 3.3.2 and Hive 3.1.3.
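A note on what differs "between the two versions" (an editorial inference, not
from the report itself): the failing trace shows scala.collection.LinearSeqOps
frames, which exist only in the Scala 2.13 library, so the 3.3.0 image appears
to be a Scala 2.13 build, while 3.2.1 distributions default to Scala 2.12. In
2.12, scala.Serializable is a real trait that every case class automatically
extends; in 2.13 it is only a deprecated type alias, so no
scala/Serializable.class ships at all:
{code:scala}
// Compiled with Scala 2.12, even this one-liner emits bytecode that
// references scala/Serializable (case classes extend Product with
// scala.Serializable there):
case class Probe(x: Int)

// Loading that 2.12-compiled class file on a Scala 2.13 classpath fails
// in ClassLoader.defineClass, because 2.13's scala-library no longer
// contains scala/Serializable.class -- producing exactly:
//   java.lang.NoClassDefFoundError: scala/Serializable
//     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
{code}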