brysd opened a new issue, #5496:
URL: https://github.com/apache/hudi/issues/5496

   **Spark submit fails immediately with hudi-spark3.2-bundle_2.12:0.11.0 and Kerberos authentication**
   
   Executing the following command in our environment results in the error shown in the stacktrace below:
   ``` shell
   /usr/bin/spark3-submit \
     --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
     --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
     --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
     --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
     --num-executors 4 \
     --principal [email protected] \
     --keytab vdp2.keytab \
     test_hudi_schema_evolution.py
   ```
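   
   A possible workaround (an assumption on our side, not yet verified) is to disable Spark's HBase delegation-token provider, since the stacktrace below shows the failure happening while that provider logs the token it obtained:
   
   ``` shell
   # Same submit as above, plus spark.security.credentials.hbase.enabled=false
   # to skip Spark's HBase delegation-token provider entirely (assumption: this
   # job does not need HBase tokens).
   /usr/bin/spark3-submit \
     --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
     --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
     --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
     --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
     --conf "spark.security.credentials.hbase.enabled=false" \
     --num-executors 4 --principal [email protected] --keytab vdp2.keytab \
     test_hudi_schema_evolution.py
   ```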
   
   Code in the Python script:
   
   ``` python
   import pyspark
   
   from pyspark.sql import SparkSession
   from pyspark.sql.types import StructType, StructField, StringType, 
IntegerType, BooleanType
   
   spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
       .getOrCreate()
   
   ```
   
   Perhaps some extra configuration is needed and this is related to Kerberos authentication. In the logs, however, we can see that we are authenticated correctly.
   
   
   **To Reproduce**
   
   Not sure how easy this is to reproduce. We also apply Kerberos authentication through a keytab file, as you can see in the spark3-submit command, but basically we never get past the initial `SparkSession` `getOrCreate()` call.
   
   
   **Expected behavior**
   
   No exceptions.
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Spark version : 3.2
   
   * Hive version : 3.1.3000
   
   * Hadoop version : 3.1.1.7
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Running Kerberos authentication with a keytab file.
   
   **Stacktrace**
   
   Exception thrown:
   
   ``` shell
   Traceback (most recent call last):
     File "/home/dbrys1/test_hudi_schema_evolution.py", line 22, in <module>
       spark = SparkSession.builder.appName('testHudiSchemaEvolution') \
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/sql/session.py",
 line 228, in getOrCreate
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py",
 line 392, in getOrCreate
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py",
 line 147, in __init__
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py",
 line 209, in _do_init
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/pyspark.zip/pyspark/context.py",
 line 329, in _initialize_context
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py",
 line 1574, in __call__
     File 
"/opt/cloudera/parcels/SPARK3-3.2.0.3.2.7170.0-49-1.p0.18822714/lib/spark3/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py",
 line 328, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling 
None.org.apache.spark.api.java.JavaSparkContext.
   : java.lang.NoClassDefFoundError: 
org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier
           at 
org.apache.hudi.org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier.readFields(AuthenticationTokenIdentifier.java:142)
           at 
org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:192)
           at 
org.apache.hadoop.security.token.Token.identifierToString(Token.java:444)
           at org.apache.hadoop.security.token.Token.toString(Token.java:464)
           at 
org.apache.spark.deploy.security.HBaseDelegationTokenProvider.$anonfun$obtainDelegationTokens$2(HBaseDelegationTokenProvider.scala:52)
           at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
           at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
           at 
org.apache.spark.deploy.security.HBaseDelegationTokenProvider.logInfo(HBaseDelegationTokenProvider.scala:34)
           at 
org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:52)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$2(HadoopDelegationTokenManager.scala:164)
           at 
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
           at scala.collection.Iterator.foreach(Iterator.scala:941)
           at scala.collection.Iterator.foreach$(Iterator.scala:941)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
           at 
scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
           at 
scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
           at 
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
           at 
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$obtainDelegationTokens(HadoopDelegationTokenManager.scala:162)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:226)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anon$4.run(HadoopDelegationTokenManager.scala:224)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainTokensAndScheduleRenewal(HadoopDelegationTokenManager.scala:224)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.org$apache$spark$deploy$security$HadoopDelegationTokenManager$$updateTokensTask(HadoopDelegationTokenManager.scala:198)
           at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.start(HadoopDelegationTokenManager.scala:123)
           at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1(CoarseGrainedSchedulerBackend.scala:552)
           at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.$anonfun$start$1$adapted(CoarseGrainedSchedulerBackend.scala:549)
           at scala.Option.foreach(Option.scala:407)
           at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.start(CoarseGrainedSchedulerBackend.scala:549)
           at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:48)
           at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
           at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
           at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
           at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
           at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:238)
           at 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
           at 
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
           at 
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
           at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: 
org.apache.hudi.org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$TokenIdentifier
           at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
           ... 47 more
   ```
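   
   Since the root cause is a `ClassNotFoundException` for a class relocated under `org.apache.hudi`, one quick sanity check is to look inside the downloaded bundle jar for the missing class file. Below is a hypothetical diagnostic sketch; the jar path is an assumption and depends on where `--packages` cached the artifact:
   
   ``` python
   # Hypothetical check: does the bundle jar actually contain the shaded
   # protobuf class that the stacktrace reports as missing?
   import os
   import zipfile

   # Assumed location of the jar fetched by --packages; adjust as needed.
   BUNDLE_JAR = os.path.expanduser(
       "~/.ivy2/jars/org.apache.hudi_hudi-spark3.2-bundle_2.12-0.11.0.jar"
   )
   MISSING_CLASS = (
       "org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/"
       "AuthenticationProtos$TokenIdentifier.class"
   )

   def class_in_jar(jar_path: str, class_entry: str) -> bool:
       """Return True if the given .class file is an entry in the jar."""
       with zipfile.ZipFile(jar_path) as jar:
           return class_entry in jar.namelist()

   if os.path.exists(BUNDLE_JAR):
       print(class_in_jar(BUNDLE_JAR, MISSING_CLASS))
   else:
       print(f"bundle jar not found at {BUNDLE_JAR}")
   ```
   
   If this prints `False` for an existing jar, the class was excluded from the shaded bundle rather than misconfigured on our side.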
   
   

