TranHuyTiep opened a new issue, #8340:
URL: https://github.com/apache/hudi/issues/8340

   
   **Describe the problem you faced**
   - I submit a Spark job on Kubernetes via the Spark Operator; reading a table through the Hudi library throws `cannot assign instance of java.lang.invoke.SerializedLambda`.
   - Reading the underlying Parquet files directly works fine.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   * Spark version : 3.3.1 (image: apache/spark-py:v3.3.1)
   * Running on Docker/Kubernetes : yes, Spark Operator (sparkoperator.k8s.io/v1beta2)
   * Java version :
     openjdk version "11.0.16"
     OpenJDK Runtime Environment 18.9 (build 11.0.16+8)
     OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing)
   * Packages : SPARK_PACKAGES=org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1
   
   **Additional context**
   - SparkApplication manifest:

   ```yaml
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: Demo
     namespace: default
   spec:
     type: Python
     pythonVersion: "3"
     mode: cluster
     image: apache/spark-py:v3.3.1
     imagePullPolicy: Always
     mainApplicationFile: local:///opt/spark/work-dir/demo.py
     sparkVersion: "3.3.1"
     restartPolicy:
       type: OnFailure
       onFailureRetries: 3
       onFailureRetryInterval: 10
       onSubmissionFailureRetries: 5
       onSubmissionFailureRetryInterval: 20
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "1024m"
       labels:
         version: 3.3.1
       serviceAccount: spark
       envFrom:
         - configMapRef:
             name: spark-configmap
     executor:
       cores: 1
       instances: 2
       memory: "1024m"
       labels:
         version: 3.3.1
       envFrom:
         - configMapRef:
             name: spark-configmap
     sparkConf:
       spark.jars.packages: "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1"
       spark.rdd.compress: "true"
       spark.serializer: "org.apache.spark.serializer.KryoSerializer"
       spark.sql.shuffle.partitions: "100"
   ```
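   For context (an assumption about this setup, not a verified fix): with `spark.jars.packages` in cluster mode, the driver and executor pods can end up resolving the Hudi classes through different classloaders, which is a common trigger for this kind of `SerializedLambda` ClassCastException. One alternative sketch is to bake the jars into the image and declare them under `spec.deps.jars` instead (the paths below are hypothetical):

   ```yaml
   # Sketch only: assumes the bundle jars were copied into the image
   # at /opt/spark/jars-extra/ during the Docker build.
   deps:
     jars:
       - local:///opt/spark/jars-extra/hudi-spark3.3-bundle_2.12-0.13.0.jar
       - local:///opt/spark/jars-extra/spark-avro_2.12-3.3.1.jar
   ```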
   
   - The job fails with the following error:

   ```
   INFO DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:137, took 0.609365 s
   Traceback (most recent call last):
     File "/opt/spark/work-dir/demo.py", line 73, in <module>
       df_load.show(10)
     File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 606, in show
     File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
     File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
     File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o70.showString.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.244.0.45 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
   ```
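   This ClassCastException generally means a serialized lambda was deserialized by an executor whose classloader resolves the defining class differently (or not at all), i.e. a driver/executor classpath mismatch rather than a Hudi bug per se. A hedged mitigation sketch, assuming the jars are present in the image at a fixed (hypothetical) path, is to pin both sides to the same classpath in `sparkConf`:

   ```yaml
   # Sketch, not verified against this deployment: point driver and executors
   # at the same on-image jar directory so both load identical Hudi classes.
   spark.driver.extraClassPath: "/opt/spark/jars-extra/*"
   spark.executor.extraClassPath: "/opt/spark/jars-extra/*"
   ```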
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
