TranHuyTiep opened a new issue, #8340:
URL: https://github.com/apache/hudi/issues/8340
**Describe the problem you faced**
- When I submit a Spark job on Kubernetes via the Spark operator and read a table with the Hudi library, the job fails with `cannot assign instance of java.lang.invoke.SerializedLambda`.
- Reading the underlying Parquet files directly works fine.
**Environment Description**
* Hudi version : 0.13.0
* Spark version : 3.3.1 (image `apache/spark-py:v3.3.1`)
* Running on Kubernetes via the Spark operator : `sparkoperator.k8s.io/v1beta2`
* Java version :
  openjdk version "11.0.16"
  OpenJDK Runtime Environment 18.9 (build 11.0.16+8)
  OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing)
* `SPARK_PACKAGES=org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1`
**Additional context**
- SparkApplication YAML:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: Demo
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: apache/spark-py:v3.3.1
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/work-dir/demo.py
  sparkVersion: "3.3.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "1024m"
    labels:
      version: 3.3.1
    serviceAccount: spark
    envFrom:
      - configMapRef:
          name: spark-configmap
  executor:
    cores: 1
    instances: 2
    memory: "1024m"
    labels:
      version: 3.3.1
    envFrom:
      - configMapRef:
          name: spark-configmap
  sparkConf:
    spark.jars.packages: "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.1"
    spark.rdd.compress: "true"
    spark.serializer: "org.apache.spark.serializer.KryoSerializer"
    spark.sql.shuffle.partitions: "100"
```
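This kind of `SerializedLambda` `ClassCastException` often points to the Hudi bundle jar not being on the executor classpath (or being loaded by a different classloader than on the driver). A possible alternative worth trying, sketched here and untested in this setup, is to declare the dependencies through the operator's `spec.deps.packages` field (part of the v1beta2 `SparkApplication` CRD) instead of `spark.jars.packages`, or to bake the jars into the image itself:

```yaml
# Hypothetical fragment of the same SparkApplication spec; versions copied
# from the report above. deps.packages is resolved for driver and executors
# by the operator, in place of the sparkConf spark.jars.packages entry.
spec:
  deps:
    packages:
      - org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0
      - org.apache.spark:spark-avro_2.12:3.3.1
```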
- The job throws this error:

```
INFO DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:137, took 0.609365 s
Traceback (most recent call last):
  File "/opt/spark/work-dir/demo.py", line 73, in <module>
    df_load.show(10)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 606, in show
  File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
  File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o70.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.244.0.45 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
```