brysd opened a new issue, #5496: URL: https://github.com/apache/hudi/issues/5496
**Spark submit fails immediately with hudi-spark3.2-bundle_2.12:0.11.0 and Kerberos authentication**

Executing the following in our environment results in the error shown in the stacktrace below:

```shell
/usr/bin/spark3-submit \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --num-executors 4 \
  --principal [email protected] \
  --keytab vdp2.keytab \
  test_hudi_schema_evolution.py
```

Code in the Python script:

```python
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType

spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
    .getOrCreate()
```

Perhaps we need some extra configuration and this is related to Kerberos authentication. In the logs, however, we can see that we are authenticated correctly.

**To Reproduce**

Not sure how easy this is to reproduce. We apply Kerberos authentication through a keytab file, as you can see in the spark3-submit command, but basically we never get past the initial `getOrCreate` of the session.

**Expected behavior**

No exceptions.

**Environment Description**

* Hudi version : 0.11.0
* Spark version : 3.2
* Hive version : 3.1.3000
* Hadoop version : 3.1.1.7
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

Running Kerberos authentication with a keytab file.

**Stacktrace**

Exception thrown:

```shell
Traceback (most recent call last):
  File "/home/dbrys1/test_hudi_schema_evolution.py", line 22, in <module>
    spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 392, in getOrCreate
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 147, in __init__
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1574, in __call__
  File "/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier
	at org.apache.hudi.org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier.readFields(AuthenticationTokenIdentifier.java:142)
	at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:192)
	at org.apache.hadoop.security.token.Token.identifierToString(Token.java:444)
	at org.apache.hadoop.security.token.Token.toString(Token.java:464)
	at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.$anonfun$obtainDelegationTokens$2(HBaseDelegationTokenProvider.scala:52)
	at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
	at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
	at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.logInfo(HBaseDelegationTokenProvider.scala:34)
	at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:52)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:226)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:224)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainTokensAndScheduleRenewal(HadoopDelegationTokenManager.scala:224)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$updateTokensTask(HadoopDelegationTokenManager.scala:198)
	at org.apache.spark.deploy.security.HadoopDelegationTokenManager.start(HadoopDelegationTokenManager.scala:123)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1(CoarseGrainedSchedulerBackend.scala:552)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1$adapted(CoarseGrainedSchedulerBackend.scala:549)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:549)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:48)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 47 more
```
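The stacktrace shows that Spark's `HBaseDelegationTokenProvider` fails while decoding the HBase delegation token, because the protobuf class relocated under `org.apache.hudi.` cannot be loaded from the bundle. The sketch below is not from the original report: it assumes the job does not actually need HBase delegation tokens, in which case disabling Spark's HBase token provider (via the standard `spark.security.credentials.hbase.enabled` setting) may avoid this code path entirely. The jar path used to check for the relocated class is illustrative and depends on where `--packages` caches the bundle.

```shell
# Hypothetical diagnostics/workaround -- not part of the original report.

# 1) Check whether the relocated protobuf class is actually present in the
#    bundle jar (path below is illustrative; --packages usually caches jars
#    under ~/.ivy2/jars):
unzip -l ~/.ivy2/jars/org.apache.hudi_hudi-spark3.2-bundle_2.12-0.11.0.jar \
  | grep 'hbase/protobuf/generated/AuthenticationProtos'

# 2) If HBase delegation tokens are not needed, disable Spark's HBase token
#    provider so the failing decode is never attempted:
/usr/bin/spark3-submit \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
  --conf "spark.security.credentials.hbase.enabled=false" \
  --num-executors 4 \
  --principal [email protected] \
  --keytab vdp2.keytab \
  test_hudi_schema_evolution.py
```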
