meatheadmike commented on issue #12108: URL: https://github.com/apache/hudi/issues/12108#issuecomment-2416526133
This is what I thought initially too. So then I deployed a bone-stock Spark image and used the `--packages` flag:

```shell
export SPARK_CONF_DIR=/opt/spark/conf
export SPARK_VERSION=3.5   # or 3.4, 3.3, 3.2
export HUDI_VERSION=1.0.0-beta2

/opt/spark/bin/pyspark \
  --packages \
org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:$HUDI_VERSION,\
com.amazonaws:aws-java-sdk-bundle:1.12.770,\
org.apache.hadoop:hadoop-common:3.3.4,\
org.apache.hadoop:hadoop-client:3.3.4,\
org.apache.hadoop:hadoop-aws:3.3.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf spark.driver.extraJavaOptions='-Divy.cache.dir=/tmp -Divy.home=/tmp'
```

This should have eliminated any potential jar conflicts, no? Mind you, this was done exclusively on the reader side of things, as I assume that is where the problem lies.
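For reference, here is a small hypothetical Python helper (not part of Hudi or Spark) that assembles the same comma-separated `--packages` coordinate list as the invocation above. The point it illustrates is that the three Hadoop artifacts (`hadoop-common`, `hadoop-client`, `hadoop-aws`) are pinned to one shared version so they can't drift apart and reintroduce a jar conflict:

```python
# Hypothetical helper: builds the Maven coordinate string passed to
# `pyspark --packages`, keeping the Hadoop artifact versions in lockstep.
SPARK_VERSION = "3.5"        # or 3.4, 3.3, 3.2
HUDI_VERSION = "1.0.0-beta2"
HADOOP_VERSION = "3.3.4"     # shared by hadoop-common, -client, and -aws


def hudi_packages(spark_version: str, hudi_version: str, hadoop_version: str) -> str:
    """Return the comma-separated Maven coordinates for --packages."""
    coords = [
        f"org.apache.hudi:hudi-spark{spark_version}-bundle_2.12:{hudi_version}",
        "com.amazonaws:aws-java-sdk-bundle:1.12.770",
        f"org.apache.hadoop:hadoop-common:{hadoop_version}",
        f"org.apache.hadoop:hadoop-client:{hadoop_version}",
        f"org.apache.hadoop:hadoop-aws:{hadoop_version}",
    ]
    return ",".join(coords)


print(hudi_packages(SPARK_VERSION, HUDI_VERSION, HADOOP_VERSION))
```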
